ftsrg / ingraph

Incremental view maintenance for openCypher graph queries.
http://docs.inf.mit.bme.hu/ingraph/
Eclipse Public License 1.0
47 stars 10 forks source link

Compile path variables #301

Open jmarton opened 6 years ago

jmarton commented 6 years ago

Path variable assignments need to be handled in the compiler, like the following excerpt from a MATCH clause:

p = shortestPath((fun)-[*]->(`call`))

This is currently leading to IncompleteCompilationException.

brncsk commented 4 years ago

@szarnyasg Your slides from last year's Budapest DB Meetup indicate that ingraph supports paths but this issue seems to contradict that. Can you please elaborate on the current state of paths in ingraph? Thanks!

szarnyasg commented 4 years ago

Hello @brncsk, there are two systems here: the ingraph's incremental in-memory engine and the Cypher-to-(Postgre)SQL transpiler presented in the meetup slides. Unfortunately, neither of them received much attention during this year as we were focused on improving the LDBC benchmark suites. Anyways:

What sort of system are you looking for?

brncsk commented 4 years ago

@szarnyasg Thank you for the prompt and in-depth reply! As a matter of fact, we are in the process of migrating away from AgensGraph (its OSS version was deemed unsuitable for production use) and towards vanilla PostgreSQL, so we're mostly interested in the cypher-to-sql transpiler.

The goal is to maintain (at least part of) the developer experience provided by the Cypher+SQL mix in AgensGraph. Our data is pretty low volume (n*1e4 low-degree nodes) at the moment, with queries mostly focusing on matching variable length paths1 instead of solving shortest path problems. One important challenge that remains is that we need path variables for accessing members of the matched paths (covered by #175, citing this issue as its blocker).

Am I correct in assuming that cypher-to-sql does not currently support this use case? If so: how would you rate the technical difficulties of implementing this feature?

1 One such query would be:

    MATCH p=(s:A)-[:contains*0..]->(i:B)
    WHERE id(s) = ?
    RETURN DISTINCT UNNEST(nodes(p))
szarnyasg commented 4 years ago

We've continuously had trouble with extracting good performance from transpiler SQL queries. The reasons behind this are manyfold (Postgres' optimization fence, too many subqueries, the difficulty of cardinality estimation for graphs, etc.). We will pick up this line of work in 2020 but that's definitely not something you can build on. What's worse, some of these limitations are fundamental when using classical relational databases. Such a problem is the lack of multiway (n-ary) joins which makes evaluating cyclic queries (and, consequently, dense subgraph queries) very expensive. Fixing these isn't possible without touching the code of the DBMS. (See the GRFusion paper which added graph features to VoltDB.)

How about Neo4j and Oracle PGX?