eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License
361 stars 163 forks source link

Inconsistent getStatements Results Starting In RDF4j 3.0 in NativeStore Only #3156

Open daltontc opened 3 years ago

daltontc commented 3 years ago

I am seeing inconsistent results in getStatements when trying to retrieve graphs using the context IRI. Based on the order of certain operations, when I retrieve a graph, the returning results differ in size. Subsequent attempts have result that vary as well.

This is only happening when I use a NativeStore. When I use a MemoryStore, things behave as expected. This also only started happening starting in RDF4j 3.0.0 and is still occurring in 3.7.1. It did not occur in 2.5.5.

I created a branch on this test project to demonstrate what is happening. https://github.com/daltontc/rdf4jTest/tree/bug/inconsistent_getStatements

Here is the relevant part. Something about the order of retrieving statements, clearing a graph, adding statements, retrieving statements and doing a hasNext() appears to be what triggers the inconsistent results. Did something change in the RepositoryConnection between 2.5.5 and 3.0.0?

try (RepositoryConnection conn = repo.getConnection()) {
            Model startModel = new LinkedHashModel();
            startModel.add(subcont, RDF.TYPE, someClass);
            startModel.add(subcont, RDF.TYPE, VALUE_FACTORY.createIRI("urn:someParentClass"));

            conn.add(startModel, subcont);

            System.out.println("startModel: " + startModel.size());
            printModel(startModel);

            conn.getStatements(subcont, RDF.TYPE, someClass);

            conn.clear(subcont);
            conn.add(startModel, subcont);

            conn.getStatements(subcont, RDF.TYPE, someClass).hasNext();

            Model modelFromContext = QueryResults.asModel(conn.getStatements(null, null, null, subcont));
            System.out.println("modelFromContext: " + modelFromContext.size());
            printModel(modelFromContext);

            Model modelFromSubject = QueryResults.asModel(conn.getStatements(subcont, null, null));
            System.out.println("modelFromSubject: " + modelFromSubject.size());
            printModel(modelFromSubject);

            Model modelFromContext2 = QueryResults.asModel(conn.getStatements(null, null, null, subcont));
            System.out.println("modelFromContext2: " + modelFromContext2.size());
            printModel(modelFromContext2);

            Model modelFromSubject2 = QueryResults.asModel(conn.getStatements(subcont, null, null));
            System.out.println("modelFromSubject2: " + modelFromSubject2.size());
            printModel(modelFromSubject2);
        }

Results below. You can see that when the statements are retrieved by the graph IRI then the results are not correct, despite the correct triples still being in the store as see when retrieving the statements by the subject.

startModel: 2
        (urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:someClass) [null]
        (urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:someParentClass) [null]
modelFromContext: 1
        (urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:someParentClass, urn:test) [urn:test]
modelFromSubject: 2
        (urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:someClass, urn:test) [urn:test]
        (urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:someParentClass, urn:test) [urn:test]
modelFromContext2: 0
modelFromSubject2: 2
        (urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:someClass, urn:test) [urn:test]
        (urn:test, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, urn:someParentClass, urn:test) [urn:test]
abrokenjester commented 3 years ago

Thanks for the report, and the code demonstrating the issue - very thorough. I'll try and reproduce this locally tonight, but it definitely looks suspicious from what you describe. I'm not immediately aware of any fix in 3.0.0 that might have caused this, though of course we did do several changes in the native store.

abrokenjester commented 3 years ago

The problem appears to be related to "dangling" iterators in the code. Specifically, when commenting out either the line conn.getStatements(subcont, RDF.TYPE, someClass); or conn.getStatements(subcont, RDF.TYPE, someClass).hasNext(); the code does produce the expected result. Both lines introduce unclosed orphaned iterators. What is odd is that one orphaned iterator seems to be fine. Only when both those lines are active does the discrepancy in the output occur.

It also appears to be necessary, minimally, to call the hasNext() on the second iterator.

Further finds: the way the clear(graph) operation is handled appears to influence this. If we replace with a clear() (without a named graph param) the output becomes correct again.

abrokenjester commented 3 years ago

Independently of anything else, a simple workaround is to make sure that you close your dangling RepositoryResult iterations properly.

abrokenjester commented 3 years ago

As far as I've been able to trace this down the cause seems to be in how UnionSailSource handles flushing of deprecated contexts. It may be somewhat related to https://github.com/eclipse/rdf4j/issues/1795.

A quick fix is likely not easy - we should probably investigate getting rid of UnionSailSource completely and instead having inferred/explicit flag as first class citizens.

daltontc commented 3 years ago

Is this something that you would foresee fixing in an (relatively soon) upcoming release?

abrokenjester commented 3 years ago

Is this something that you would foresee fixing in an (relatively soon) upcoming release?

Honestly, not really, for various reasons:

  1. It is an edge case in the sense that it can only be caused by a very specific set of circumstances involving various orphaned/dangling open connections and iterations. There is an easy workaround fix, which is to follow recommended usage patterns, making sure your result iterations are always consumed/closed correctly.
  2. as mentioned above, a fix would involve quite a substantial rewrite of a large part of the SailSource codebase. While this would be a good idea to do (for the purpose of fixing this bug as well as several other issues), it will be hard to find the time needed.

So, yes, we will fix it, but probably not soon.

abrokenjester commented 3 years ago

@hmottestad does this have your interest by any chance? I know you have previously looked into getting rid of UnionSailSource for SHACL-related reasons, this is another data point in favor of that. I'm just not sure where this sits on your priority list.

hmottestad commented 3 years ago

The union sail issues were related to reasoning. Some of the tests in the ShaclSail merely exacerbated the issue.

Does this issue here use reasoning?

hmottestad commented 3 years ago

Also, my understanding from the code is that it uses implicit transactions and leaves resources open after committing those transactions. So there is some leak between transactions. Is this possible to reproduce between connections too? Eg. if you leave resources open after committing a transaction on one connection, does this impact what the other connections sees?

abrokenjester commented 3 years ago

The union sail issues were related to reasoning. Some of the tests in the ShaclSail merely exacerbated the issue.

It may not be the exact same root cause, but I pinged you because I know you had an interest in looking at removing the need for UnionSailSource and instead feeding the inferred parameter straight in.

Does this issue here use reasoning?

It does not, but the UnionSailSource is also in play when no actual reasoning is involved - as long as your queries ask to include inferred triples it will use the union at some point.

In this case, as I described above, the problem seems to be a combination of how a clear operation on a context is handled combined with the branching that goes on due to there being several open/dangling iterations. As far as I've been able to debug the problem, it has to do with how UnionSailSource handles a flush operation: it flushes one branch but not the other (by design) but in this specific case that leads to some updates being dropped/out of sync.