Open hos-omar opened 5 years ago
Thanks for bringing it out @hos-omar . I also verified the same thing against other two backend configuration as well, and the results aren't that promising. I tested against following setup/envs: I also followed your steps to reproduce, and scaled up the data size in graph to 500k vertices & edges approximately.
1: Cassandra + ES (Standalone) 2: HBase + Solr (Cluster)
I got following results with Cassandra + ES backend:
gremlin> g.E().hasLabel('develops').has('since', gt(2005)).order().by('since', incr).and(inV().has('name', 'gremlin')).limit(10).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(develops), since.g... 4550 4550 2844.016 95.26
\_condition=(~label = develops AND since > 2005)
\_isFitted=true
\_query=[(since > 2005)]:by-since-mixed
\_index=by-since-mixed
\_orders=[]
\_isOrdered=true
\_index_impl=search
optimization 0.036
optimization 0.800
backend-query 0.000
\_query=by-since-mixed:[(since > 2005)]:by-since-mixed
backend-query 4550 571.202
\_query=by-since-mixed:[(since > 2005)]:by-since-mixed
AndStep([[EdgeVertexStep(IN), ProfileStep, HasS... 4548 4548 99.069 3.32
EdgeVertexStep(IN) 4550 4550 4.813
HasStep([name.eq(gremlin)]) 80.334
OrderGlobalStep([[value(since), incr]]) 11 11 42.504 1.42
RangeGlobalStep(0,10) 10 10 0.070 0.00
>TOTAL - - 2985.661 -
I also tried the second query you posted:
gremlin> g.E().hasLabel('develops').has('since', gt(2005)).order().by('since', incr).limit(Integer.MAX_VALUE).and(inV().has('name', 'gremlin')).limit(10).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(develops), since.g... 11 11 9.192 91.09
\_condition=(~label = develops AND since > 2005)
\_isFitted=true
\_query=[(since > 2005)][ASC(since)]:by-since-mixed
\_index=by-since-mixed
\_orders=[ASC(since)]
\_isOrdered=true
\_index_impl=search
optimization 0.031
optimization 0.815
backend-query 0.000
\_query=by-since-mixed:[(since > 2005)][ASC(since)]:by-since-mixed
AndStep([[EdgeVertexStep(IN), ProfileStep, HasS... 11 11 0.759 7.52
EdgeVertexStep(IN) 11 11 0.011
HasStep([name.eq(gremlin)]) 0.326
RangeGlobalStep(0,10) 10 10 0.139 1.38
>TOTAL - - 10.091 -
I don't know what would be your use case, and why would you want to have an orderBy()
in between query whereas you can apply orderBy()
at the very end just before limit(10)
and that will give you same result. So I tried the same using orderBy()
at the end, and the results weren't promising either:
gremlin> g.E().hasLabel('develops').has('since', gt(2005)).and(inV().has('name', 'gremlin')).order().by('since', incr).limit(10).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(develops), since.g... 4550 4550 2644.216 94.88
\_condition=(~label = develops AND since > 2005)
\_isFitted=true
\_query=[(since > 2005)]:by-since-mixed
\_index=by-since-mixed
\_orders=[]
\_isOrdered=true
\_index_impl=search
optimization 0.027
optimization 0.693
backend-query 0.000
\_query=by-since-mixed:[(since > 2005)]:by-since-mixed
backend-query 4550 629.469
\_query=by-since-mixed:[(since > 2005)]:by-since-mixed
AndStep([[EdgeVertexStep(IN), ProfileStep, HasS... 4548 4548 95.752 3.44
EdgeVertexStep(IN) 4550 4550 4.821
HasStep([name.eq(gremlin)]) 76.830
OrderGlobalStep([[value(since), incr]]) 11 11 46.997 1.69
RangeGlobalStep(0,10) 10 10 0.079 0.00
>TOTAL - - 2787.045 -
The results are more or the less similar to 1st scenario.
If we check against the HBase + Solr backends in clustered mode, here I added around 1 million nodes and edges and following are results in this scenario:
gremlin> g.E().hasLabel('develops').has('since', gt(2005)).order().by('since', incr).limit(Integer.MAX_VALUE).and(inV().has('name', 'gremlin')).limit(10).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(develops), since.g... 11 11 605.897 93.40
\_condition=(~label = develops AND since > 2005)
\_isFitted=true
\_query=[(since_i > 2005)][ASC(since_i)]:by-since-mixed
\_index=by-since-mixed
\_orders=[ASC(since)]
\_isOrdered=true
\_index_impl=search
optimization 0.050
optimization 36.490
backend-query 0.000
\_query=by-since-mixed:[(since_i > 2005)][ASC(since_i)]:by-since-mixed
AndStep([[EdgeVertexStep(IN), ProfileStep, HasS... 11 11 41.824 6.45
EdgeVertexStep(IN) 11 11 0.597
HasStep([name.eq(gremlin)]) 37.734
RangeGlobalStep(0,10) 10 10 0.990 0.15
>TOTAL - - 648.712 -
As for rest of queries, which wasn't using index:
gremlin> g.E().hasLabel('develops').has('since', gt(2005)).and(inV().has('name', 'gremlin')).order().by('since', incr).limit(10).profile()
gremlin> // and
gremlin> g.E().hasLabel('develops').has('since', gt(2005)).order().by('since', incr).and(inV().has('name', 'gremlin')).limit(10).profile()
They didn't even complete after waiting for more than 10 mins or so. I got multiple warning of Zookeeper client got disconnected (Which usually means that Zookeeper gets OOM due to huge number of records its processing in memory - 1million, can be rectified by increasing JVM mem), so had to kill the queries.
Since we are seeing repetitive similar behavior across different sources, it makes me believe this is an issue from JanusGraph side, and not due to the way query is serialized and de-serialized for ES backend.
I can self assign this issue to me, and start preparing a patch for the same, but I'm not sure if it will go into 0.3.x line of release or 0.4.x.
@pluradj : Can u confirm if any new fixes are planned to be merged into 0.3.x branch or will be going into 0.4.x? If 0.3.2 EoL for 0.3.x series of release of is another 0.3.3 planned? Also, I've dropped you an email, please have a look.
Can u confirm if any new fixes are planned to be merged into 0.3.x branch or will be going into 0.4.x? If 0.3.2 EoL for 0.3.x series of release of is another 0.3.3 planned?
I can answer this question as well. The last version in 0.3.x
series will be 0.3.3
release. After that no more contributions will be accepted to v0.3
branch. Active branches will be v0.4
branch and master
branch (for v0.5.x
series).
The version 0.3.3
isn't released yet. I think it will be released at the end of this month or the beginning of October.
See bugfixes which are already merged into v0.3
branch but not yet released here: https://github.com/JanusGraph/janusgraph/milestone/14?closed=1
Edge based queries aren't generally recommended. JanusGraph prefers vertex-centric queries so you can take advantage of vertex-centric indices which would better handle the ordering your query is trying to do.
I believe the limit()
step is acting as a barrier()
step which allows the optimizer to use the index. You can verify and compare the execution plans of your different queries by using the explain()
step.
Edge based queries aren't generally recommended. JanusGraph prefers vertex-centric queries so you can take advantage of vertex-centric indices which would better handle the ordering your query is trying to do.
I believe the
limit()
step is acting as abarrier()
step which allows the optimizer to use the index. You can verify and compare the execution plans of your different queries by using theexplain()
step.
Thanks a lot Jason for checking this. If we used VCIs in this use case wouldn't that require the traversal to iterate over all person vertices that have that edge, scan the VCI index of each of them, then order all matching edges to get the first N? seems it can be slower, but I need to give it a try.
Also in the above 2 queries, the index was used in both of them. But in the 1st query (non-optimized) the traversal decided not to search elastic search with order. This caused the traversal to have to load all matches from ES then sort them which is inefficient. I think in general if the traversal is already ordering the results, it should be always efficient to search ES with that order. This allows the results to be streamed to the subsequent steps with lower latency.
Interested to know your thoughts.
When using an elasticsearch mixed index on an orderable edge property (Integer) and ordering by it, the order is not used in the index query when it's followed by
and(...)
and the traversal ends up iterating all edges (to order them?) which slows down the query drastically. In the profile results I see\_orders=[]
Interestingly if I just add a
limit(Integer.MAX_VALUE)
(semantically useless) after theorder().by()
steps, it causes the index query to search with order (as should have happened in the first query too) which significantly improves the query performance (82x times faster). In the profile results I see\_orders=[ASC(since)]
Setup
Steps to Reproduce
Profile Results