JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org
Other
5.31k stars 1.17k forks source link

P.within on an indexed property does not use the index #643

Open aelred opened 7 years ago

aelred commented 7 years ago

For example, the Gremlin traversal:

g.V().has("INDEXED_PROPERTY", P.within("foo", "bar"));

takes time linear in the length of the graph, whereas:

g.V().has("INDEXED_PROPERTY", "foo");

is very fast.

It seems like the former query could use the index to improve performance.

robertdale commented 7 years ago

Can you show the output of g.V().has("INDEXED_PROPERTY", P.within("foo", "bar")).profile()?

aelred commented 7 years ago

Sure!

Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[VALUE_STRING.within([foo, ba...                     1           1           8.248   100.00
  optimization                                                                                 1.487
  backend-query                                                        1                       3.558
                                            >TOTAL                     -           -           8.248        -

In this case I'm using VALUE_STRING as the property (since it's indexed on the graph I have on hand).

And for completeness, when just doing has("VALUE_STRING", "foo"):

Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[VALUE_STRING.eq(foo)])                              1           1           0.435   100.00
  optimization                                                                                 0.207
  backend-query                                                        1                       0.066
                                            >TOTAL                     -           -           0.435        -]

These are both ran against a graph with no edges and 100,000 vertices each with a random VALUE_STRING property (one of which being foo).

davidclement90 commented 7 years ago

Do you use composite or mixed index ?

aelred commented 7 years ago

We're using a composite index.

robertdale commented 7 years ago

The 0.1 profile isn't too helpful. This may already be fixed in 0.2. Here you can see index being used (at least on an indexOnly on label).

gremlin> g.V().has('Foo','fooId', within('1004260','1026042')).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(Foo), fooId.wi...                     2           2           1.501   100.00
    \_condition=(~label = Foo AND (fooId = 1004260 OR fooId = 1026042))
    \_isFitted=true
    \_query=multiKSQ[2]@2147483647
    \_index=Foo.fooId
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.259
                                            >TOTAL                     -           -           1.501        -
gremlin> 
robertdale commented 7 years ago

Just created an index without indexOnly on label and the query uses the new index.

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[fooId.within([1004260, 102...                     2           2           1.394   100.00
    \_condition=((fooId = 1004260 OR fooId = 1026042))
    \_isFitted=true
    \_query=multiKSQ[2]@2147483647
    \_index=mmmmmIdx
    \_orders=[]
    \_isOrdered=true
  optimization                                                                                 0.427
  backend-query                                                        2                       0.777
    \_query=mmmmmIdx:multiKSQ[2]@2147483647
                                            >TOTAL                     -           -           1.394        -