Open keiligch opened 1 year ago
Did you try adjusting the timeout?
Yes, but 120 seconds timeout seems more than sufficient for the small graph and relatively simple query to evaluate. The other queries were evaluated in fractions of a second.
Does it still timeout when setting it to 0?
Yes, it throws after about 20s the TimeoutException and the QueryInterruptedException
Does ist also happen with NativeStore or LmdbStore?
Yes, with both. No difference
Try setting the timeout directly on the Query object. Like this query.setMaxExecutionTime(0);
.
Can you also update your example code to show how you are setting the timeout on the FedX repo?
I had the timeout setting as you wrote on the Query object and did not change it globally. Also the example code repo is updated
were you able to reproduce with my example code?
I haven't had a chance to test it out yet. Might be a little while before I have time.
I've created a test to show that the issue only happens against remote endpoints. If EndpointClassification.Local
is used then the queries are working as expected.
I also added some "debug" output to ControlledWorkerUnion
but I'm not sure if this helps to fix the problem.
@aschwarte10 do you have some time to look into this issue? I think it is critical as it leads to deadlocks when used with remote endpoints.
Thanks for reporting the issue. It looks like you have stumbled indeed into a deadlock scenario (which fortunately at least recovers through timeout exceptions)
Unfortunately I only have very limited time available to work on RDF4J.
However, I tried to do a bit of an assessment, and have some findings to nail it down (but do not yet have an understanding of the cause). I have also pushed a PR to reproduce it with just 40 triples, so ideally this allows to debug it easier.
Some insights to share:
The issue occurs in a join with an NUnion node, where the statements of the union themselves are relevant at multiple sources.
An example FedX query plan looks like this:
QueryRoot
Projection
ProjectionElemList
ProjectionElem "person"
ProjectionElem "label"
NJoin
ExclusiveStatement
Var (name=person)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_e1df31e0_uri, value=http://xmlns.com/foaf/0.1/Person, anonymous)
StatementSource (id=repo1, type=REMOTE)
NUnion
StatementSourcePattern
Var (name=person)
Var (name=_const_9285ccfc_uri, value=http://www.w3.org/2000/01/rdf-schema#label, anonymous)
Var (name=label)
StatementSource (id=repo1, type=REMOTE)
StatementSource (id=repo2, type=REMOTE)
StatementSourcePattern
Var (name=person)
Var (name=_const_c9f3cb8c_uri, value=http://www.w3.org/2004/02/skos/core#prefLabel, anonymous)
Var (name=label)
StatementSource (id=repo1, type=REMOTE)
StatementSource (id=repo2, type=REMOTE)
Note: due to the NUnion the join is executed in ControlledWorkerJoin
. (i.e. no vectored / bind join evaluation).
It looks like the scheduler executing the unions is populated in some order that causes the deadlock, i.e. the results cannot be piped through. Might also be the main executor that blocks.
Note: the design idea is that everything scheduled in the UNION scheduler can executed and does not have dependencies, while the join scheduler may have dependencies and block.
If somebody finds the time to look further, please go ahead. I will try to look further once my time allows
Current Behavior
When using a FedX-Repository: Specific query and graph (see repo linked in "Steps To Repoduce") and
((FedXTupleQuery) query).evaluate().stream().collect(Collectors.toList());
(or((FedXTupleQuery) query).evaluate().hasNext();
) throwsorg.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout
caused byjava.util.concurrent.TimeoutException
Expected Behavior
providing a correct query result with a
List<BindingSet>
Steps To Reproduce
clone FedX-Query-Test-Repo
run
Main.main()
get Exception at line 101
(delete lines 8-48 in
db.ttl
, then the bug will not occur)Version
4.3.6
Are you interested in contributing a solution yourself?
None
Anything else?