Open daltontc opened 3 years ago
Thanks for the report, and the code demonstrating the issue - very thorough. I'll try and reproduce this locally tonight, but it definitely looks suspicious from what you describe. I'm not immediately aware of any fix in 3.0.0 that might have caused this, though of course we did do several changes in the native store.
The problem appears to be related to "dangling" iterators in the code. Specifically, when commenting out either the line conn.getStatements(subcont, RDF.TYPE, someClass);
or conn.getStatements(subcont, RDF.TYPE, someClass).hasNext();
the code does produce the expected result. Both lines introduce unclosed orphaned iterators. What is odd is that one orphaned iterator seems to be fine. Only when both those lines are active does the discrepancy in the output occur.
It also appears to be necessary, minimally, to call the hasNext()
on the second iterator.
Further finds: the way the clear(graph)
operation is handled appears to influence this. If we replace with a clear()
(without a named graph param) the output becomes correct again.
Independently of anything else, a simple workaround is to make sure that you close your dangling RepositoryResult iterations properly.
As far as I've been able to trace this down the cause seems to be in how UnionSailSource
handles flushing of deprecated contexts. It may be somewhat related to https://github.com/eclipse/rdf4j/issues/1795.
A quick fix is likely not easy - we should probably investigate getting rid of UnionSailSource completely and instead having inferred/explicit flag as first class citizens.
Is this something that you would foresee fixing in an (relatively soon) upcoming release?
Is this something that you would foresee fixing in an (relatively soon) upcoming release?
Honestly, not really, for various reasons:
So, yes, we will fix it, but probably not soon.
@hmottestad does this have your interest by any chance? I know you have previously looked into getting rid of UnionSailSource for SHACL-related reasons, this is another data point in favor of that. I'm just not sure where this sits on your priority list.
The union sail issues were related to reasoning. Some of the tests in the ShaclSail merely exacerbated the issue.
Does this issue here use reasoning?
Also, my understanding from the code is that it uses implicit transactions and leaves resources open after committing those transactions. So there is some leak between transactions. Is this possible to reproduce between connections too? Eg. if you leave resources open after committing a transaction on one connection, does this impact what the other connections sees?
The union sail issues were related to reasoning. Some of the tests in the ShaclSail merely exacerbated the issue.
It may not be the exact same root cause, but I pinged you because I know you had an interest in looking at removing the need for UnionSailSource and instead feeding the inferred
parameter straight in.
Does this issue here use reasoning?
It does not, but the UnionSailSource is also in play when no actual reasoning is involved - as long as your queries ask to include inferred triples it will use the union at some point.
In this case, as I described above, the problem seems to be a combination of how a clear operation on a context is handled combined with the branching that goes on due to there being several open/dangling iterations. As far as I've been able to debug the problem, it has to do with how UnionSailSource handles a flush operation: it flushes one branch but not the other (by design) but in this specific case that leads to some updates being dropped/out of sync.
I am seeing inconsistent results in getStatements when trying to retrieve graphs using the context IRI. Based on the order of certain operations, when I retrieve a graph, the returning results differ in size. Subsequent attempts have result that vary as well.
This is only happening when I use a NativeStore. When I use a MemoryStore, things behave as expected. This also only started happening starting in RDF4j 3.0.0 and is still occurring in 3.7.1. It did not occur in 2.5.5.
I created a branch on this test project to demonstrate what is happening. https://github.com/daltontc/rdf4jTest/tree/bug/inconsistent_getStatements
Here is the relevant part. Something about the order of retrieving statements, clearing a graph, adding statements, retrieving statements and doing a
hasNext()
appears to be what triggers the inconsistent results. Did something change in the RepositoryConnection between 2.5.5 and 3.0.0?Results below. You can see that when the statements are retrieved by the graph IRI then the results are not correct, despite the correct triples still being in the store as see when retrieving the statements by the subject.