RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Bug in `longturtle` serialization #2767

Closed: mschiedon closed this issue 1 month ago

mschiedon commented 2 months ago

The longturtle serializer fails to emit a whitespace separator between a predicate and a list of objects if one of these objects is a blank node (and the blank node cannot be 'inlined', i.e. is used more than once). The problem can be reproduced using this Python code:

from rdflib import Graph

input = '''\
@prefix ex: <https://example.org/> .

ex:1 a ex:Thing ;
    ex:relatedTo ex:3, _:bnode0 .

ex:2 a ex:Thing ;
    ex:relatedTo _:bnode0 .

_:bnode0 a ex:Thing .
'''

graph = Graph().parse(data=input, format='turtle')
output = graph.serialize(format='longturtle')
print(output)
assert output.find('relatedTo_:') == -1, \
    'Missing whitespace separation between predicate' \
    ' and the first blank node of a list of objects.'

The resulting Turtle, with the bug, is shown below. Note the missing space between the predicate ex:relatedTo and the blank node _:n40fef3a41a034be9a7116df126afd613b1 in the ex:1 case. The ex:2 case is serialized correctly, with a space separator, because it has a single object rather than a list.

PREFIX ex: <https://example.org/>

ex:1
    a ex:Thing ;
    ex:relatedTo_:n40fef3a41a034be9a7116df126afd613b1 ,
        ex:3 ;
.

ex:2
    a ex:Thing ;
    ex:relatedTo _:n40fef3a41a034be9a7116df126afd613b1 ;
.

_:n40fef3a41a034be9a7116df126afd613b1
    a ex:Thing ;
.
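
The assertion in the reproduction script only looks for the literal string 'relatedTo_:'. A slightly more general check can be sketched with a regular expression, as below. This is a heuristic and not part of rdflib: the names FUSED_TOKEN and find_fused_tokens are made up for illustration, and because Turtle local names may legally contain "_:", a match is a hint that a separator is missing rather than proof.

```python
import re

# Heuristic pattern: a prefixed name (e.g. ex:relatedTo) running straight into
# a blank node label (_:...), with no whitespace in between.
FUSED_TOKEN = re.compile(r"\b\w+:\w*_:\w+")


def find_fused_tokens(turtle_text):
    """Return every token that looks like a predicate fused with a blank node."""
    return FUSED_TOKEN.findall(turtle_text)


# Abbreviated copy of the buggy output above; the blank node label is shortened.
buggy_output = """\
PREFIX ex: <https://example.org/>

ex:1
    a ex:Thing ;
    ex:relatedTo_:n40fef ,
        ex:3 ;
.

ex:2
    a ex:Thing ;
    ex:relatedTo _:n40fef ;
.
"""

fused = find_fused_tokens(buggy_output)
print(fused)
```

Only the ex:1 statement is reported; the correctly separated ex:2 statement does not match.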

I believe the issue might be solved by adding an extra level of indentation to the first_nl = True line in the objectList method of longturtle.py, as shown in the code below.

    def objectList(self, objects):
        count = len(objects)
        if count == 0:
            return
        depthmod = (count == 1) and 0 or 1
        self.depth += depthmod
        first_nl = False
        if count > 1:
            if not isinstance(objects[0], BNode):
                self.write("\n" + self.indent(1))
                # Proposed fix: the line below was given an extra indent, so
                # first_nl is only set when the first object is not a BNode.
                first_nl = True
        self.path(objects[0], OBJECT, newline=first_nl)
        for obj in objects[1:]:
            self.write(" ,")
            if not isinstance(obj, BNode):
                self.write("\n" + self.indent(1))
            self.path(obj, OBJECT, newline=True)
        self.depth -= depthmod
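
To see why the indentation matters, here is a toy model of the separator logic. It is a simplified stand-in, not the rdflib code: objects are plain strings, a blank node is anything starting with "_:", and it assumes that path writes a leading space only when newline is false, which is consistent with the output shown earlier but is not confirmed in this thread. The function name write_object_list and the fixed flag are hypothetical.

```python
def write_object_list(predicate, objects, fixed, indent="    "):
    """Toy model of objectList: returns the text written for one predicate."""
    out = [indent + predicate]

    def is_bnode(label):
        return label.startswith("_:")

    def path(label, newline):
        # Assumption: a separating space is written only when no newline
        # has already been emitted before the object.
        if not newline:
            out.append(" ")
        out.append(label)

    first_nl = False
    if len(objects) > 1:
        if not is_bnode(objects[0]):
            out.append("\n" + indent * 2)
            if fixed:
                first_nl = True  # proposed: set only after a newline was written
        if not fixed:
            first_nl = True  # original: set even when no newline was written
    path(objects[0], first_nl)
    for obj in objects[1:]:
        out.append(" ,")
        if not is_bnode(obj):
            out.append("\n" + indent * 2)
        path(obj, True)
    return "".join(out)


before = write_object_list("ex:relatedTo", ["_:b0", "ex:3"], fixed=False)
after = write_object_list("ex:relatedTo", ["_:b0", "ex:3"], fixed=True)
print(before)
print(after)
```

With fixed=False the predicate and the blank node run together; with fixed=True a space separates them, and lists whose first object is not a blank node are unaffected.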

nicholascar commented 1 month ago

I think this issue has been addressed by PR https://github.com/RDFLib/rdflib/pull/2700, but that fix is currently only in the HEAD of this repo, not yet in an RDFLib release. It should appear in 7.0.1 or 7.1.0 in the next few weeks, when we make a release that fixes a bunch of small things.

mschiedon commented 1 month ago

Excellent, thank you! I can confirm this addresses the issue. Looking forward to the next rdflib release then 👍