apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0

Inconsistent default graph handling in RIOT writers #2578

Closed Ostrzyciel closed 3 months ago

Ostrzyciel commented 4 months ago

Version

5.0.0

What happened?

To be honest, I don't know whether this is a bug or I'm just horribly confused. I wrote a deserializer for Jena that outputs quads with null in the graph term position for the default graph. When I then tried to serialize such a dataset using RIOT's serializers, I noticed that serialization fails completely for JSON-LD 1.1, while it works for the others.

I dug deeper and found out that Jena has 3 ways to represent the default graph: null, Quad.defaultGraphIRI, and Quad.defaultGraphNodeGenerated. I read the javadoc of the latter two and honestly, I still don't fully understand what the difference is. I then checked how the different RIOT writer implementations handle the different flavors of default graphs.

The code is here (Apache 2.0): https://github.com/Ostrzyciel/jena-default-graphs/blob/main/jena-default-graphs/src/main/java/org/example/Main.java

Output is attached below.

N-Quads, RDF-PROTO, RDF-THRIFT serialize/deserialize the default graph correctly in all three variants. JSON-LD 1.1 fails with a null pointer exception for the null variant (easy fix). TriG fails silently and just outputs nothing for the null variant (also likely to be an easy fix).

All parsers output Quad.defaultGraphIRI which I guess makes it the "right" choice when implementing parsers.

What I don't know is whether this is a bug at all or I'm just confused. If this is a bug, I can happily fix the null handling in TriG and JSON-LD.

Relevant output and stacktrace

Empty output for null graph in TriG/pretty
Failed to write null graph in JSON-LD-11/pretty
org.apache.jena.shared.JenaException: Exception while writing JSON-LD 1.1
    at org.apache.jena.riot.writer.JsonLD11Writer.write$(JsonLD11Writer.java:123)
    at org.apache.jena.riot.writer.JsonLD11Writer.write(JsonLD11Writer.java:73)
    at org.apache.jena.riot.RDFWriter.write$(RDFWriter.java:261)
    at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:219)
    at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:158)
    at org.apache.jena.riot.RDFWriterBuilder.output(RDFWriterBuilder.java:207)
    at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:809)
    at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:754)
    at org.example.Main.main(Main.java:63)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.jena.graph.Node.isBlank()" because "node" is null
    at org.apache.jena.riot.system.JenaTitanium.resource(JenaTitanium.java:133)
    at org.apache.jena.riot.system.JenaTitanium.lambda$convert$0(JenaTitanium.java:60)
    at org.apache.jena.atlas.iterator.Iter$IterMap.lambda$forEachRemaining$0(Iter.java:432)
    at org.apache.jena.mem2.store.fast.FastArrayBunch$1.forEachRemaining(FastArrayBunch.java:164)
    at org.apache.jena.mem2.iterator.IteratorOfJenaSets.forEachRemaining(IteratorOfJenaSets.java:71)
    at org.apache.jena.atlas.iterator.Iter$IterMap.forEachRemaining(Iter.java:432)
    at org.apache.jena.atlas.iterator.Iter.forEachRemaining(Iter.java:927)
    at org.apache.jena.atlas.iterator.IteratorConcat.forEachRemaining(IteratorConcat.java:99)
    at org.apache.jena.riot.system.JenaTitanium.convert(JenaTitanium.java:49)
    at org.apache.jena.riot.writer.JsonLD11Writer.write$(JsonLD11Writer.java:91)
    ... 8 more

To serialize                       Format                   Status/deserialized
null graph                         TriG/pretty              empty output   
null graph                         JSON-LD-11/pretty        serialization failed
null graph                         N-Quads/utf-8            Quad.defaultGraphIRI
null graph                         RDF-PROTO                Quad.defaultGraphIRI
null graph                         RDF-THRIFT               Quad.defaultGraphIRI
Quad.defaultGraphNodeGenerated     TriG/pretty              Quad.defaultGraphIRI
Quad.defaultGraphNodeGenerated     JSON-LD-11/pretty        Quad.defaultGraphIRI
Quad.defaultGraphNodeGenerated     N-Quads/utf-8            Quad.defaultGraphIRI
Quad.defaultGraphNodeGenerated     RDF-PROTO                Quad.defaultGraphIRI
Quad.defaultGraphNodeGenerated     RDF-THRIFT               Quad.defaultGraphIRI
Quad.defaultGraphIRI               TriG/pretty              Quad.defaultGraphIRI
Quad.defaultGraphIRI               JSON-LD-11/pretty        Quad.defaultGraphIRI
Quad.defaultGraphIRI               N-Quads/utf-8            Quad.defaultGraphIRI
Quad.defaultGraphIRI               RDF-PROTO                Quad.defaultGraphIRI
Quad.defaultGraphIRI               RDF-THRIFT               Quad.defaultGraphIRI

Are you interested in making a pull request?

Yes

rvesse commented 3 months ago

I agree that this looks like a bug. A PR to fix this would be welcome.

afs commented 3 months ago

Hi @Ostrzyciel - thank you for the detailed report. Your example code works fine on my machine.

The JSON-LD failure is a plain old bug: the quad.isTriple() case is not handled.

The three choices have slightly different meanings:

The last two are covered by Quad.isDefaultGraph(node). They are different (external/internal), although only slightly, so maybe it wasn't necessary to split the concepts.
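A minimal sketch of the distinction described above, using plain strings in place of Jena's Node so that it runs without Jena on the classpath. Everything here is a hypothetical stand-in: the class, the enum, and the two URN constants are assumptions made for illustration and should be checked against the Quad javadoc. It only shows the shape of a null-safe check that a writer could make before touching the graph term.

```java
// Stand-in sketch: plain strings model graph terms. These are NOT Jena classes.
final class DefaultGraphSketch {
    // Assumed stand-ins for Quad.defaultGraphIRI and Quad.defaultGraphNodeGenerated.
    static final String DEFAULT_GRAPH_IRI = "urn:x-arq:DefaultGraph";
    static final String DEFAULT_GRAPH_NODE_GENERATED = "urn:x-arq:DefaultGraphNode";

    enum Kind { TRIPLE, DEFAULT_GRAPH, NAMED_GRAPH }

    /** Classifies a graph term; null is checked first so no method is called on it. */
    static Kind classify(String graphTerm) {
        if (graphTerm == null) {
            return Kind.TRIPLE; // the case the thread describes as quad.isTriple()
        }
        if (graphTerm.equals(DEFAULT_GRAPH_IRI) || graphTerm.equals(DEFAULT_GRAPH_NODE_GENERATED)) {
            return Kind.DEFAULT_GRAPH; // the two cases covered by Quad.isDefaultGraph(node)
        }
        return Kind.NAMED_GRAPH;
    }

    /** True when a writer should emit the statement without a graph name. */
    static boolean belongsToDefaultGraph(String graphTerm) {
        return classify(graphTerm) != Kind.NAMED_GRAPH;
    }

    public static void main(String[] args) {
        System.out.println(classify(null));
        System.out.println(classify(DEFAULT_GRAPH_IRI));
        System.out.println(classify(DEFAULT_GRAPH_NODE_GENERATED));
        System.out.println(classify("http://example.org/g"));
    }
}
```

Doing the null check before any other test is the point of the sketch: the NullPointerException in the stack trace comes from calling Node.isBlank() on a null graph term.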

The other cases need tracking down and checking.

The deserialized Quad.defaultGraphIRI is a consequence of DatasetGraph.find, not the serialization/deserialization. The parser output goes to a StreamRDF where triples and quads have different handlers.
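The difference between observing the stream and observing a dataset can be illustrated with a toy model. The Sink interface, both implementations, and the URN constant below are hypothetical stand-ins, not Jena classes, and storing default-graph triples under the default graph IRI is an assumption chosen to mirror the behaviour described above rather than a copy of Jena's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not Jena classes.
final class StreamSketch {
    // Assumed stand-in for Quad.defaultGraphIRI.
    static final String DEFAULT_GRAPH_IRI = "urn:x-arq:DefaultGraph";

    /** Sink with separate handlers for triples and quads. */
    interface Sink {
        void triple(String s, String p, String o);
        void quad(String g, String s, String p, String o);
    }

    /** Observes the stream directly, so the triple/quad distinction stays visible. */
    static final class RecordingSink implements Sink {
        final List<String> events = new ArrayList<>();
        @Override public void triple(String s, String p, String o) { events.add("triple"); }
        @Override public void quad(String g, String s, String p, String o) { events.add("quad " + g); }
    }

    /** Collects into a dataset-like store whose find() reports everything as quads. */
    static final class DatasetSink implements Sink {
        private final List<String[]> quads = new ArrayList<>();
        @Override public void triple(String s, String p, String o) {
            quads.add(new String[] {DEFAULT_GRAPH_IRI, s, p, o});
        }
        @Override public void quad(String g, String s, String p, String o) {
            quads.add(new String[] {g, s, p, o});
        }
        List<String[]> find() { return quads; }
    }

    public static void main(String[] args) {
        RecordingSink direct = new RecordingSink();
        DatasetSink viaDataset = new DatasetSink();
        Sink[] sinks = {direct, viaDataset};
        for (Sink sink : sinks) {
            sink.triple("s", "p", "o");
            sink.quad("http://example.org/g", "s", "p", "o");
        }
        System.out.println(direct.events);
        System.out.println(viaDataset.find().get(0)[0]);
    }
}
```

In this model the recording sink still knows that the first statement arrived as a triple, while the dataset-like sink can only report a quad with a default graph name. That is the same kind of information loss that made every format in the first results table appear to round-trip to Quad.defaultGraphIRI.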

Ostrzyciel commented 3 months ago

@afs Thank you for the detailed answer, that actually explains a lot!

I've updated the code to read directly from StreamRDF instead of constructing a dataset – the results now are different:

Empty output for null graph in TriG/pretty
Failed to write null graph in JSON-LD-11/pretty
org.apache.jena.shared.JenaException: Exception while writing JSON-LD 1.1
    at org.apache.jena.riot.writer.JsonLD11Writer.write$(JsonLD11Writer.java:123)
    at org.apache.jena.riot.writer.JsonLD11Writer.write(JsonLD11Writer.java:73)
    at org.apache.jena.riot.RDFWriter.write$(RDFWriter.java:261)
    at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:219)
    at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:158)
    at org.apache.jena.riot.RDFWriterBuilder.output(RDFWriterBuilder.java:207)
    at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:809)
    at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:754)
    at org.example.Main.main(Main.java:68)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.jena.graph.Node.isBlank()" because "node" is null
    at org.apache.jena.riot.system.JenaTitanium.resource(JenaTitanium.java:133)
    at org.apache.jena.riot.system.JenaTitanium.lambda$convert$0(JenaTitanium.java:60)
    at org.apache.jena.atlas.iterator.Iter$IterMap.lambda$forEachRemaining$0(Iter.java:432)
    at org.apache.jena.mem2.store.fast.FastArrayBunch$1.forEachRemaining(FastArrayBunch.java:164)
    at org.apache.jena.mem2.iterator.IteratorOfJenaSets.forEachRemaining(IteratorOfJenaSets.java:71)
    at org.apache.jena.atlas.iterator.Iter$IterMap.forEachRemaining(Iter.java:432)
    at org.apache.jena.atlas.iterator.Iter.forEachRemaining(Iter.java:927)
    at org.apache.jena.atlas.iterator.IteratorConcat.forEachRemaining(IteratorConcat.java:99)
    at org.apache.jena.riot.system.JenaTitanium.convert(JenaTitanium.java:49)
    at org.apache.jena.riot.writer.JsonLD11Writer.write$(JsonLD11Writer.java:91)
    ... 8 more

To serialize                       Format                   Status/deserialized
null graph                         TriG/pretty              empty output
null graph                         JSON-LD-11/pretty        serialization failed
null graph                         N-Quads/utf-8            Quad.defaultGraphNodeGenerated
null graph                         RDF-PROTO                null graph
null graph                         RDF-THRIFT               triple instead of quad
Quad.defaultGraphNodeGenerated     TriG/pretty              Quad.defaultGraphNodeGenerated
Quad.defaultGraphNodeGenerated     JSON-LD-11/pretty        triple instead of quad
Quad.defaultGraphNodeGenerated     N-Quads/utf-8            Quad.defaultGraphNodeGenerated
Quad.defaultGraphNodeGenerated     RDF-PROTO                triple instead of quad
Quad.defaultGraphNodeGenerated     RDF-THRIFT               triple instead of quad
Quad.defaultGraphIRI               TriG/pretty              Quad.defaultGraphNodeGenerated
Quad.defaultGraphIRI               JSON-LD-11/pretty        triple instead of quad
Quad.defaultGraphIRI               N-Quads/utf-8            Quad.defaultGraphNodeGenerated
Quad.defaultGraphIRI               RDF-PROTO                triple instead of quad
Quad.defaultGraphIRI               RDF-THRIFT               triple instead of quad

So, to summarize (please correct me if I'm wrong here):

I will try to fix the JSON-LD and TriG issues in a PR.