Compile path variables - Githubissues

jmarton commented 6 years ago

Path variable assignments need to be handled in the compiler, like the following excerpt from a MATCH clause:

p = shortestPath((fun)-[*]->(`call`))

This is currently leading to IncompleteCompilationException.

brncsk commented 4 years ago

@szarnyasg Your slides from last year's Budapest DB Meetup indicate that ingraph supports paths but this issue seems to contradict that. Can you please elaborate on the current state of paths in ingraph? Thanks!

szarnyasg commented 4 years ago

Hello @brncsk, there are two systems here: the ingraph's incremental in-memory engine and the Cypher-to-(Postgre)SQL transpiler presented in the meetup slides. Unfortunately, neither of them received much attention during this year as we were focused on improving the LDBC benchmark suites. Anyways:

As stated in this issue, shortest paths are not supported in the ingraph compiler or in its runtime. (Note that shortestPath in Cypher is the unweighted shortest path problem so it can be computed using e.g. a bidirectional BFS.)
Plain paths, i.e. (a)-[:E]->(b) work in ingraph engine albeit are not as efficient as I'd like - they will not work on large graphs with high-degree nodes.
The DB meetup slides discuss the transpiler which uses the same compiler as ingraph. This code is on the cypher-to-sql - but this also didn't receive much attention in recent months.

What sort of system are you looking for?

There are now a couple of industry systems, the Neo4j (the O.G.), Oracle PGX and AgensGraph.
If you need to compute path queries incrementally, there are no declarative languages for that as of yet. I'd suggest looking into Frank McSherry's work on Differential Dataflow. He solved a complex LDBC query in a blogpost. This is super efficient but you'll need to learn the computation model and write Rust code.
If you need a relational database with graph query support, the GRFusion tool (system paper, demo paper) could be interesting. But keep in mind that this is a prototype.

brncsk commented 4 years ago

@szarnyasg Thank you for the prompt and in-depth reply! As a matter of fact, we are in the process of migrating away from AgensGraph (its OSS version was deemed unsuitable for production use) and towards vanilla PostgreSQL, so we're mostly interested in the cypher-to-sql transpiler.

The goal is to maintain (at least part of) the developer experience provided by the Cypher+SQL mix in AgensGraph. Our data is pretty low volume (n*1e4 low-degree nodes) at the moment, with queries mostly focusing on matching variable length paths¹ instead of solving shortest path problems. One important challenge that remains is that we need path variables for accessing members of the matched paths (covered by #175, citing this issue as its blocker).

Am I correct in assuming that cypher-to-sql does not currently support this use case? If so: how would you rate the technical difficulties of implementing this feature?

¹ One such query would be:

    MATCH p=(s:A)-[:contains*0..]->(i:B)
    WHERE id(s) = ?
    RETURN DISTINCT UNNEST(nodes(p))

szarnyasg commented 4 years ago

We've continuously had trouble with extracting good performance from transpiler SQL queries. The reasons behind this are manyfold (Postgres' optimization fence, too many subqueries, the difficulty of cardinality estimation for graphs, etc.). We will pick up this line of work in 2020 but that's definitely not something you can build on. What's worse, some of these limitations are fundamental when using classical relational databases. Such a problem is the lack of multiway (n-ary) joins which makes evaluating cyclic queries (and, consequently, dense subgraph queries) very expensive. Fixing these isn't possible without touching the code of the DBMS. (See the GRFusion paper which added graph features to VoltDB.)

How about Neo4j and Oracle PGX?

Neo4j and Cypher should work well for this scale of data set.
Oracle Labs' PGX prototype uses the PGQL query language which is somewhere between Cypher and SQL, i.e. the developer experience you're looking for. Of course, license restrictions may apply here.

ftsrg / ingraph

Compile path variables #301