JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org
Other
5.28k stars 1.17k forks source link

Add batch query support for drop step [tp-tests] #4456

Open porunov opened 4 months ago

porunov commented 4 months ago

Benchmarks:

CQLMultiQueryDropBenchmark.dropVertices                                                                                       N/A           true                          N/A              5000  avgt    5     98.069 ±   24.496  ms/op
CQLMultiQueryDropBenchmark.dropVertices                                                                                       N/A          false                          N/A              5000  avgt    5   1451.917 ±   90.617  ms/op
CQLMultiQueryDropBenchmark.dropVerticesGremlinQuery                                                                           N/A           true                          N/A              5000  avgt    5    106.444 ±   38.290  ms/op
CQLMultiQueryDropBenchmark.dropVerticesGremlinQuery                                                                           N/A          false                          N/A              5000  avgt    5   1452.070 ±  130.414  ms/op

Conclusion: same as with direct usage of multi-query the usage of Gremlin drop() step benefits from re-using multi-query capabilities. Performance of drop query processing improved more than 13 times.

TODO:

Related https://github.com/apache/tinkerpop/pull/2609 Related https://github.com/apache/tinkerpop/pull/2612


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

For all changes:

For code changes:

For documentation related changes:

porunov commented 4 months ago

Currently I'm stuck on the following two TinkerPop tests:

Error:  Failures: 
Error:  org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropTest$Traversals.g_E_propertiesXweightX_drop
Error:    Run 1: DropTest$Traversals>DropTest.g_E_propertiesXweightX_drop:98->DropTest.lambda$g_E_propertiesXweightX_drop$1:98 expected:<0> but was:<1>
Error:    Run 2: DropTest$Traversals>DropTest.g_E_propertiesXweightX_drop:98->DropTest.lambda$g_E_propertiesXweightX_drop$1:98 expected:<0> but was:<1>
[INFO] 
Error:  org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropTest$Traversals.g_V_properties_propertiesXstartTimeX_drop
Error:    Run 1: DropTest$Traversals>DropTest.g_V_properties_propertiesXstartTimeX_drop:109->DropTest.lambda$g_V_properties_propertiesXstartTimeX_drop$3:109 expected:<0> but was:<1>
Error:    Run 2: DropTest$Traversals>DropTest.g_V_properties_propertiesXstartTimeX_drop:109->DropTest.lambda$g_V_properties_propertiesXstartTimeX_drop$3:109 expected:<0> but was:<2>

The first test checks that drop() works for Edge properties. The second one checks that drop() works for meta properties. For some reason those two TinkerPop tests are failing with this PR. However, I verified that the removal operation works in this PR by adding testMetaPropertiesDrop and testEdgePropertiesDrop tests (similarly as in those TinkerPop tests). It can be seen that these added tests are passing. For now the reason for these two TinkerPop tests to fail is unknown to me. Additional pair of eyes would me much appreciated.

TinkerPop tests for inmemory database can be triggered with:

mvn clean install --projects janusgraph-inmemory -DskipTests=true --batch-mode --also-make

mvn verify --projects janusgraph-inmemory -Dtest.skip.tp=false -Dtest=org.janusgraph.blueprints.inmemory.process.InMemoryMultiQueryJanusGraphProcessTest -pl janusgraph-inmemory
porunov commented 4 months ago

One of the observations is that if I change the default barrier step size from the default 2500 to 1 (query.batch.limited-size=1) then all tests passes. I.e. it means that the fact that we store those properties in the barrier before we actually remove them changes the result of those two tests.

ntisseyre commented 4 months ago

Currently I'm stuck on the following two TinkerPop tests:

Error:  Failures: 
Error:  org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropTest$Traversals.g_E_propertiesXweightX_drop
Error:    Run 1: DropTest$Traversals>DropTest.g_E_propertiesXweightX_drop:98->DropTest.lambda$g_E_propertiesXweightX_drop$1:98 expected:<0> but was:<1>
Error:    Run 2: DropTest$Traversals>DropTest.g_E_propertiesXweightX_drop:98->DropTest.lambda$g_E_propertiesXweightX_drop$1:98 expected:<0> but was:<1>
[INFO] 
Error:  org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropTest$Traversals.g_V_properties_propertiesXstartTimeX_drop
Error:    Run 1: DropTest$Traversals>DropTest.g_V_properties_propertiesXstartTimeX_drop:109->DropTest.lambda$g_V_properties_propertiesXstartTimeX_drop$3:109 expected:<0> but was:<1>
Error:    Run 2: DropTest$Traversals>DropTest.g_V_properties_propertiesXstartTimeX_drop:109->DropTest.lambda$g_V_properties_propertiesXstartTimeX_drop$3:109 expected:<0> but was:<2>

The first test checks that drop() works for Edge properties. The second one checks that drop() works for meta properties. For some reason those two TinkerPop tests are failing with this PR. However, I verified that the removal operation works in this PR by adding testMetaPropertiesDrop and testEdgePropertiesDrop tests (similarly as in those TinkerPop tests). It can be seen that these added tests are passing. For now the reason for these two TinkerPop tests to fail is unknown to me. Additional pair of eyes would me much appreciated.

TinkerPop tests for inmemory database can be triggered with:

mvn clean install --projects janusgraph-inmemory -DskipTests=true --batch-mode --also-make

mvn verify --projects janusgraph-inmemory -Dtest.skip.tp=false -Dtest=org.janusgraph.blueprints.inmemory.process.InMemoryMultiQueryJanusGraphProcessTest -pl janusgraph-inmemory

I noticed, I am not sure if it is related, but you might have missed adding JanusGraphDropStep here: https://github.com/JanusGraph/janusgraph/blob/a2e3521d28de37ca01d08166777d77c0943a9db5/janusgraph-backend-testutils/src/main/java/org/janusgraph/graphdb/database/LazyLoadGraphTest.java#L40

porunov commented 4 months ago

I noticed, I am not sure if it is related, but you might have missed adding JanusGraphDropStep here:

JanusGraphDropStep is replaced as part of JanusGraphLocalQueryOptimizerStrategy. So I believe it should be fine. I feels that those two TinkerPop tests don't like that I'm storing properties for removal in the barrier step for some reason. However, I can't figure out "why" yet.

porunov commented 4 months ago

After a bit of debugging I found the root cause of the problem. When removing Edge properties I see that each edge property is wrapped into B_O_Traverser which is passed to NoOpBarrier step. That traverser is added to the TraverserSet here. After I loaded the same graph as in TinkerPop tests locally I noticed that 4 out of 6 edge properties were removed, but 2 didn't because they were considered the "same" properties as were already added into the TraverserSet (thus their Traverser's merged). The root cause is that relation is not checked on equality for SimpleJanusGraphProperty. This bug was introduced 9 years ago here: https://github.com/JanusGraph/janusgraph/commit/795d7b0131c1e62fdf354e0a8e4718d6627f1418

The fix should be a simple one and I will add a separate test case with barrier() + drop() of edge properties with the same key and value.

porunov commented 4 months ago

After a bit of debugging I found the root cause of the problem. When removing Edge properties I see that each edge property is wrapped into B_O_Traverser which is passed to NoOpBarrier step. That traverser is added to the TraverserSet here. After I loaded the same graph as in TinkerPop tests locally I noticed that 4 out of 6 edge properties were removed, but 2 didn't because they were considered the "same" properties as were already added into the TraverserSet (thus their Traverser's merged). The root cause is that relation is not checked on equality for SimpleJanusGraphProperty. This bug was introduced 9 years ago here: 795d7b0

The fix should be a simple one and I will add a separate test case with barrier() + drop() of edge properties with the same key and value.

Huh. Seems I was a bit wrong. If I fix that drop tests then dedup tests start to fail. Hmm. Now I'm confused where the fix should happen.

porunov commented 4 months ago

Seems I'm missing how barrier step suppose to work. It could be that we actually don't have any bug which I described above and it's actually intended to be like that (at least I verified that the behavior is consistent between TinkerGraph and JanusGraph, so I guess we are fine there). However, I think it makes sense to double check with the TinkerPop community regrading barrier and dedup behavior. Thus, I opened a thread here: https://discord.com/channels/838910279550238720/1241219665376706603

If that is confirmed to be the intended behavior then I will try to think about a bit different solution.

porunov commented 4 months ago

One of the potential solutions is to make out own NoOpBarrierStep which skips anything except Vertex and insert that barrier step as part of multi-query optimization instead of the standard NoOpBarrierStep. However, there are many places in code logic where we need to check and sometimes cast a step to NoOpBarrierStep. It would make sense to extend NoOpBarrierStep and overwrite part of the logic, but NoOpBarrierStep is currently final and the necessary properties are private which makes things a bit challenging. Potentially, when the following PR is merged and the next TinkerPop version is released it should be easier to extend logic: https://github.com/apache/tinkerpop/pull/2612

porunov commented 4 months ago

Both TinkerPop PRs are now merged (https://github.com/apache/tinkerpop/pull/2612 and https://github.com/apache/tinkerpop/pull/2609). This will allow us to extend NoOpBarrierStep to skip everything except vertices. Thus should be also an improvement by itself because there is no reason for multi-query strategy to inject barrier steps which trigger batching for anything except edges because we multi-query step registers Vertices only as for now, and thus anything else is just a potential waist of operations. When we replace it with our own barrier step it will allow us:

Now we need to wait for TinkerPop 3.7.3 to proceed without doing major hacks.

porunov commented 4 months ago

The PR is now ready for the review. All tests passed as can be seen from my branch (here).

Notes: