eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License
367 stars 164 forks source link

Timeout while evaluating FedX-Query #4785

Open keiligch opened 1 year ago

keiligch commented 1 year ago

Current Behavior

When using a FedX-Repository: Specific query and graph (see repo linked in "Steps To Repoduce") and ((FedXTupleQuery) query).evaluate().stream().collect(Collectors.toList()); (or ((FedXTupleQuery) query).evaluate().hasNext();) throws org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout caused by java.util.concurrent.TimeoutException

Expected Behavior

providing a correct query result with a List<BindingSet>

Steps To Reproduce

  1. clone FedX-Query-Test-Repo

  2. run Main.main()

  3. get Exception at line 101

  4. (delete lines 8-48 in db.ttl, then the bug will not occur)

Version

4.3.6

Are you interested in contributing a solution yourself?

None

Anything else?

    at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
    at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
    at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
    at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
    at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
    at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
    Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
        ... 13 more
        Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
            ... 13 more
            Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                ... 13 more
                Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                    at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
                    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
                    ... 3 more
                    Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                        at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
                        at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
                        at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
                        at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
                        at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
                        at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
                        at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
                        at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
                        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
                        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
                        ... 3 more
                        Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                            ... 13 more
                            Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
                                at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
                                ... 3 more
                                Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                    at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
                                    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
                                    at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
                                    at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
                                    at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
                                    at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
                                    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
                                    at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
                                    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
                                    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
                                    ... 3 more
                                    Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                        ... 13 more
                                        Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                            ... 13 more
                                            Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                                ... 13 more
                                                Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                                    ... 13 more
                                                    Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                                        ... 13 more
                                                        Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                                            at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
                                                            at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
                                                            ... 3 more
                                                            Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
                                                                at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
                                                                at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
                                                                at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
                                                                at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
                                                                at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
                                                                at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
                                                                at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
                                                                at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
                                                                at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
                                                                at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
                                                                ... 3 more
                                                            Caused by: java.util.concurrent.TimeoutException
                                                                at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                                                at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                                                at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                                                at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                                                ... 11 more
                                                        Caused by: java.util.concurrent.TimeoutException
                                                            at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                                            at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                                            at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                                            at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                                            ... 3 more
                                                    Caused by: java.util.concurrent.TimeoutException
                                                        at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                                        at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                                        at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                                        at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                                        ... 11 more
                                                Caused by: java.util.concurrent.TimeoutException
                                                    at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                                    at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                                    at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                                    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                                    ... 11 more
                                            Caused by: java.util.concurrent.TimeoutException
                                                at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                                at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                                at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                                at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                                ... 11 more
                                        Caused by: java.util.concurrent.TimeoutException
                                            at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                            at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                            at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                            at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                            ... 11 more
                                    Caused by: java.util.concurrent.TimeoutException
                                        at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                        at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                        at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                        at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                        ... 11 more
                                Caused by: java.util.concurrent.TimeoutException
                                    at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                    at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                    at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                    ... 11 more
                            Caused by: java.util.concurrent.TimeoutException
                                at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                                at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                                at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                                at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                                ... 3 more
                        Caused by: java.util.concurrent.TimeoutException
                            at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                            at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                            at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                            at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                            ... 11 more
                    Caused by: java.util.concurrent.TimeoutException
                        at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                        at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                        at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                        at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                        ... 11 more
                Caused by: java.util.concurrent.TimeoutException
                    at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                    at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                    at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                    ... 3 more
            Caused by: java.util.concurrent.TimeoutException
                at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
                at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
                at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
                at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
                ... 11 more
        Caused by: java.util.concurrent.TimeoutException
            at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
            at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
            at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
            at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
            ... 11 more
    Caused by: java.util.concurrent.TimeoutException
        at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
        at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
        at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
        at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
        ... 11 more
Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
    at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
    at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
    at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
    ... 11 more

Process finished with exit code 1
hmottestad commented 1 year ago

Did you try adjusting the timeout?

https://rdf4j.org/documentation/programming/federation/

keiligch commented 1 year ago

Yes, but 120 seconds timeout seems more than sufficient for the small graph and relatively simple query to evaluate. The other queries were evaluated in fractions of a second.

hmottestad commented 1 year ago

Does it still timeout when setting it to 0?

keiligch commented 1 year ago

Yes, it throws after about 20s the TimeoutException and the QueryInterruptedException

kenwenzel commented 1 year ago

Does ist also happen with NativeStore or LmdbStore?

keiligch commented 1 year ago

Yes, with both. No difference

hmottestad commented 1 year ago

Try setting the timeout directly on the Query object. Like this query.setMaxExecutionTime(0);.

Can you also update your example code to show how you are setting the timeout on the FedX repo?

keiligch commented 1 year ago

I had the timeout setting as you wrote on the Query object and did not change it globally. Also the example code repo is updated

keiligch commented 1 year ago

were you able to reproduce with my example code?

hmottestad commented 1 year ago

I haven't had a chance to test it out yet. Might be a little while before I have time.

kenwenzel commented 1 year ago

I've created a test to show that the issue only happens against remote endpoints. If EndpointClassification.Local is used then the queries are working as expected.

I also added some "debug" output to ControlledWorkerUnion but I'm not sure if this helps to fix the problem.

@aschwarte10 do you have some time to look into this issue? I think it is critical as it leads to deadlocks when used with remote endpoints.

aschwarte10 commented 1 year ago

Thanks for reporting the issue. It looks like you have stumbled indeed into a deadlock scenario (which fortunately at least recovers through timeout exceptions)

Unfortunately I only have very limited time available to work on RDF4J.

However, I tried to do a bit of an assessment, and have some findings to nail it down (but do not yet have an understanding of the cause). I have also pushed a PR to reproduce it with just 40 triples, so ideally this allows to debug it easier.

Some insights to share:

The issue occurs in a join with an NUnion node, where the statements of the union themselves are relevant at multiple sources.

An example FedX query plan looks like this:

QueryRoot
   Projection
      ProjectionElemList
         ProjectionElem "person"
         ProjectionElem "label"
      NJoin
         ExclusiveStatement
            Var (name=person)
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_e1df31e0_uri, value=http://xmlns.com/foaf/0.1/Person, anonymous)
            StatementSource (id=repo1, type=REMOTE)
         NUnion
            StatementSourcePattern
               Var (name=person)
               Var (name=_const_9285ccfc_uri, value=http://www.w3.org/2000/01/rdf-schema#label, anonymous)
               Var (name=label)
               StatementSource (id=repo1, type=REMOTE)
               StatementSource (id=repo2, type=REMOTE)
            StatementSourcePattern
               Var (name=person)
               Var (name=_const_c9f3cb8c_uri, value=http://www.w3.org/2004/02/skos/core#prefLabel, anonymous)
               Var (name=label)
               StatementSource (id=repo1, type=REMOTE)
               StatementSource (id=repo2, type=REMOTE)

Note: due to the NUnion the join is executed in ControlledWorkerJoin. (i.e. no vectored / bind join evaluation).

It looks like the scheduler executing the unions is populated in some order that causes the deadlock, i.e. the results cannot be piped through. Might also be the main executor that blocks.

Note: the design idea is that everything scheduled in the UNION scheduler can executed and does not have dependencies, while the join scheduler may have dependencies and block.

If somebody finds the time to look further, please go ahead. I will try to look further once my time allows