ArangoDB-Community / arangodb-tinkerpop-provider

An implementation of the Tinkerpop OLTP Provider for ArangoDB
Apache License 2.0
83 stars 16 forks source link

Generated AQL queries aren't optimal #43

Open fdominik opened 5 years ago

fdominik commented 5 years ago

The driver doesn't generate optimal AQL queries.

For example the easiest Gremlin query: g.V().hasLabel("person").next() In a graph, which has 3 vertex collections

Generates following AQL Query:

(FOR v in UNION( 
  (FOR x1 IN @@col1 RETURN x1),
  (FOR x2 IN @@col2 RETURN x2),
  (FOR x3 IN @@col3 RETURN x3  )
)
RETURN v
)

against db, with bind vars: {@col1=documents, @col2=person, @col3=prison} This will return ALL vertices in ArangoDB (which might be a significantly unperformant for huge collections), and then the hasLabel("person") is parsed in Java.

However it could be easily transformed to:

FOR v IN person
return v

Where only vertices from the requested collection are returned.

However it all depends if we are able to modify how gremlin executes the Steps (e.g. some Custom Traversal?) or if we are able to get the query into Arango driver... EDIT: it should e possible using: Traversal Strategies: A TraversalStrategy can be used to alter a traversal prior to its execution. A typical example is converting a pattern of g.V().has('name','marko') into a global index lookup for all vertices with name "marko". In this way, a O(|V|) lookup becomes an O(log(|V|)). Please review TinkerGraphStepStrategy for ideas.

fdominik commented 5 years ago

So I did a first attempt to implement ProviderTraversalStrategy, being inspired by https://github.com/apache/tinkerpop/blob/master/tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/process/traversal/strategy/optimization/TinkerGraphStepStrategy.java and by https://github.com/JanusGraph/janusgraph/blame/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphStepStrategy.java

However I haven't fount, where the code to build the Arango query should be placed. Need to postpone my work on this issue on later, so someone can continue in the mean time...

it is at: https://github.com/fdominik/arangodb-tinkerpop-provider/commits/feature/43 contributor access on request.

arcanefoam commented 5 years ago

Hi, I think that for this type of optimization we would need to implement the OLAP api of Tinkerpop... or be smarter about how Graph::vertices retrieves all vertices from the db. On the later, my current line of thought is moving towards something similar to what you need to do when paginating results for a web app and some clever use of caches...

arcanefoam commented 5 years ago

I just understood the TraversalStrategy idea. Will look into it.