Introduce MaxBatchSize for MultiQueries

rngcntr commented 3 years ago

MultiQueries are useful to group backend queries instead of sending them one-by-one and suffering a round-trip penalty for each query. This, however, can lead to a huge overhead for queries with defined limit(). As an example, consider the following profile:

gremlin> g.V(41136352).in().in().in().limit(5000).count().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
GraphStep(vertex,[41136352])                                           1           1           3.125     0.10
JanusGraphVertexStep(IN,vertex)                                      973         973          14.248     0.46
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=false
    \_vertices=1
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@8009c75e
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                 0.034
  backend-query                                                      973                      10.705
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@8009c75e
JanusGraphVertexStep(IN,vertex)                                    17514       17514         242.895     7.80
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=false
    \_vertices=973
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@8009c75e
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                 0.044
  backend-query                                                    18467                     198.595
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@8009c75e
JanusGraphVertexStep(IN,edge)                                       5002        5002        2839.340    91.15
    \_condition=(EDGE AND visibility:normal)
    \_isFitted=false
    \_vertices=17514
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@9ee6f
    \_limit=5000
    \_orders=[]
    \_isOrdered=true
    \_multi=true
  optimization                                                                                 0.038
  backend-query                                                   124086                    2788.163
    \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@9ee6f
    \_limit=10000
RangeGlobalStep(0,5000)                                             5000        5000           1.881     0.06
CountGlobalStep                                                        1           1           1.936     0.06
                                            >TOTAL                     -           -        3115.181        -

The profiled query performs three in() steps and checks if the result contains at least 5000 vertices. The first in() step yields a result set of 973 adjacent vertices. The second in() step takes all of these 973 vertices and constructs a MultiQuery against the backend, which returns 17514 vertices. Lastly, the third in() step sends these to the backend in an extremely large MultiQuery which then returns 124000 vertices, while only 5000 of those are actually needed.

Without MultiQueries, a Gremlin query like this would be able to terminate early without querying the backend for all the adjacent vertices, but would suffer a latency penalty for each small query. To combine the best of both worlds, a configurable batch size for MultiQueries would help to avoid loading way more entries than needed, while keeping the latency penalty low.

The question remains, whether the max size should be configured per JanusGraph instance, per query or even per step. A nice solution would be to use the barrier size of preceeding NoOpBarrierSteps to determine the batch size of the following VertexStep. Sadly, the NoOpBarrierStep does not make its maxBarrierSize public.

li-boxuan commented 3 years ago

Having such a config seems to be useful. Per janusgraph instance makes sense to me because we may generally don't want to fire super huge queries to storage backends.

rngcntr commented 3 years ago

As I said, an option would be to make the limit depend on previous NoOpBarrierSteps. They have a default limit of 2500 so we could make a global setting to either use this value for MultiQueries or ignore it (just as it is right now). I will prepare a PR for TinkerPop to make the maxBarrierSize of NoOpBarrierSteps public so that we can use it here.

rngcntr commented 3 years ago

Tinkerpop Issue #2532

porunov commented 3 years ago

@rngcntr Should we close this issue? Looks like it was fixed by your PR.

JanusGraph / janusgraph

Introduce MaxBatchSize for MultiQueries #2514