RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Omitted @prefix rdf on Turtle serialization #1048

Closed daniel-dona closed 4 years ago

daniel-dona commented 4 years ago

When using ttl/turtle serialization, the rdf prefix is not in the prefixes/namespaces of the output, even if it's in the input data.

Also, adding graph.bind("rdf", Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")) doesn't make any difference.

Testing code: https://gist.github.com/daniel-dona/d2fda3669aecdf859454e072f4a36e43

Result:

ORIGINAL: 
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.  <---- !!!!
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix : <http://mapping.example.com/>.

:map_stoptimes_0 a rr:TriplesMap;
    rdfs:label "stoptimes".
:s_0 a rr:SubjectMap.
:map_stoptimes_0 rr:subjectMap :s_0.
:s_0 rr:template "http://transport.linkeddata.es/madrid/metro/stoptimes/{trip_id}-{stop_id}-{arrival_time}".
:pom_0 a rr:PredicateObjectMap.
:map_stoptimes_0 rr:predicateObjectMap :pom_0.
:pm_0 a rr:PredicateMap.
:pom_0 rr:predicateMap :pm_0.
:pm_0 rr:constant rdf:type. <---- !!!!

PROCESSED: 
@prefix : <http://mapping.example.com/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .

:map_stoptimes_0 a rr:TriplesMap ;
    rdfs:label "stoptimes" ;
    rr:predicateObjectMap :pom_0 ;
    rr:subjectMap :s_0 .

:pm_0 a rr:PredicateMap ;
    rr:constant rdf:type . <---- !!!!

:pom_0 a rr:PredicateObjectMap ;
    rr:predicateMap :pm_0 .

:s_0 a rr:SubjectMap ;
    rr:template "http://transport.linkeddata.es/madrid/metro/stoptimes/{trip_id}-{stop_id}-{arrival_time}" .

daniel-dona commented 4 years ago

Possible problem origin: https://github.com/RDFLib/rdflib/blob/6197531775801c61a620c8094327d145516a5ed7/rdflib/plugins/serializers/turtle.py#L258

hsolbrig commented 4 years ago

This is a real edge case. As far as I can tell, it only appears when rdf:type is the only reference to the RDF namespace. If, for example, you add :pm_0 rr:constant rdf:nil. to the above, everything works as advertised.
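The edge case described above can be illustrated with a toy model. This is not rdflib's code: the names BOUND, KEYWORDS and used_prefixes are hypothetical, and the model simply assumes that a serializer records a prefix as "used" only when it meets a term that is not a keyword such as rdf:type.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RR = "http://www.w3.org/ns/r2rml#"
EX = "http://mapping.example.com/"

# Hypothetical stand-ins for the graph's bound prefixes and the
# serializer's keyword table (rdf:type is abbreviated as 'a').
BOUND = {"rdf": RDF, "rr": RR, "": EX}
KEYWORDS = {RDF + "type": "a"}


def used_prefixes(triples):
    """Collect prefixes whose namespace is referenced by a non-keyword term."""
    used = set()
    for triple in triples:
        for node in triple:
            if node in KEYWORDS:
                # Assumed behaviour: keywords are skipped in every
                # position, including the object position.
                continue
            for prefix, namespace in BOUND.items():
                if node.startswith(namespace):
                    used.add(prefix)
    return used


only_type = [(EX + "pm_0", RR + "constant", RDF + "type")]
with_nil = only_type + [(EX + "pm_0", RR + "constant", RDF + "nil")]

print(sorted(used_prefixes(only_type)))  # 'rdf' is never recorded
print(sorted(used_prefixes(with_nil)))   # rdf:nil pulls 'rdf' in
```

Under this assumption, a graph whose only RDF-namespace term is rdf:type never registers the rdf prefix, while adding rdf:nil does.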

There are two possible fixes:

1) In the serializer line referenced above, add a test that says "and node is not rdf:type or node is in the predicate position".
2) Fix the turtle serializer to emit the following:

:pm_0 a rr:PredicateMap ;
    rr:constant a . 

I weakly favor the latter. Others?
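A minimal sketch of the first option, assuming the check is made while iterating over the positions of each triple. The names VERB and nodes_needing_prefix are hypothetical and this is a toy model, not a patch against rdflib.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RR_CONSTANT = "http://www.w3.org/ns/r2rml#constant"
PM_0 = "http://mapping.example.com/pm_0"

VERB = 1  # index of the predicate in an (s, p, o) tuple


def nodes_needing_prefix(triple):
    """Return the terms of a triple that will be written as prefixed names."""
    needed = []
    for position, node in enumerate(triple):
        if node == RDF_TYPE and position == VERB:
            # Only here is rdf:type written as the keyword 'a',
            # so only here can its namespace be ignored.
            continue
        needed.append(node)
    return needed


as_predicate = (PM_0, RDF_TYPE, "http://www.w3.org/ns/r2rml#PredicateMap")
as_object = (PM_0, RR_CONSTANT, RDF_TYPE)

print(RDF_TYPE in nodes_needing_prefix(as_predicate))
print(RDF_TYPE in nodes_needing_prefix(as_object))
```

With the position test in place, rdf:type in the object position counts as a use of the rdf: namespace, so the prefix would be declared.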

tgbugs commented 4 years ago

As @hsolbrig points out, the missing prefix is not unexpected in itself: the file makes no use of the rdf: namespace, since a does not count. The trailing rdf:type in the output, however, is definitely a bug. As a temporary workaround, you can force the rdf: namespace to be included by adding rdf to roundtrip_prefixes: https://github.com/RDFLib/rdflib/blob/6197531775801c61a620c8094327d145516a5ed7/rdflib/plugins/serializers/turtle.py#L46 Here is an example of how I use roundtrip_prefixes to make sure that the empty namespace persists.

hsolbrig commented 4 years ago

Out of curiosity, is :pm_0 a a . valid Turtle? I guess the ultimate case would be a a a . ;-)

tgbugs commented 4 years ago

Apparently not, the parser complains about bad syntax (in both cases). Looks like a is only allowed in the predicate position.

hsolbrig commented 4 years ago

Just checked with Eric P (Mr. Turtle). He confirmed that it is only allowed in the predicate position: https://www.w3.org/TR/turtle/#grammar-production-verb
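The rule confirmed above can be sketched as a deliberately naive check. It assumes a single whitespace-separated "s p o ." statement with no literals containing spaces; the function name misplaced_a is hypothetical and this is in no way a Turtle parser.

```python
def misplaced_a(statement):
    """Return True if the bare keyword 'a' appears outside the verb slot.

    Naive: expects exactly one 'subject verb object .' statement.
    """
    tokens = statement.rstrip(" .").split()
    if len(tokens) != 3:
        raise ValueError("expected a single 's p o .' statement")
    subject, _verb, obj = tokens
    # 'a' is shorthand for rdf:type only in the verb (predicate) position.
    return subject == "a" or obj == "a"


print(misplaced_a(":pm_0 a rr:PredicateMap ."))
print(misplaced_a(":pm_0 a a ."))
print(misplaced_a("a a a ."))
```

This is why the second proposed fix (emitting rr:constant a .) would not produce valid Turtle.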

hsolbrig commented 4 years ago

FWIW -- if you need the rdflib 4.x behavior (I did in one situation) something like:

https://gist.github.com/hsolbrig/b9412e9557b47189bfcfba40089faf2b#file-gistfile1-txt

would work. You could also replace the TurtleSerializer with the TortoiseSerializer if you wanted this behavior globally - instead of:

tortoise.register()

you could write:

import rdflib.plugins.serializers.turtle
rdflib.plugins.serializers.turtle.TurtleSerializer = TurtleWithPrefixes

tgbugs commented 4 years ago

@hsolbrig the way I implemented roundtrip_prefixes has an undocumented feature, which is that if you set it to True it will roundtrip all the prefixes, so you don't need to do the Cornucopia dance, you could just do the following.

class TurtleWithPrefixes(TurtleSerializer):
    """ A turtle serializer that always emits prefixes """
    roundtrip_prefixes = True

hsolbrig commented 4 years ago

Ha! Missed that nuance. Of course -- that was what the second half of the test was. Thanks! I'll update my code accordingly.
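The two modes of roundtrip_prefixes discussed above can be modelled as follows. This is a toy sketch under the assumption that an iterable value selects specific prefixes and a plain True selects every bound prefix; prefixes_to_emit and its arguments are hypothetical names, not rdflib's API.

```python
def prefixes_to_emit(bound, used, roundtrip_prefixes=()):
    """Return the prefix declarations a serializer would write.

    bound: all prefix -> namespace bindings known to the graph
    used: prefixes actually referenced by the serialized terms
    roundtrip_prefixes: iterable of prefixes to force, or True for all
    """
    emitted = {prefix: bound[prefix] for prefix in used}
    if roundtrip_prefixes:
        if hasattr(roundtrip_prefixes, "__iter__"):
            # First half of the test: only the listed prefixes are forced.
            forced = [p for p in bound if p in roundtrip_prefixes]
        else:
            # Second half: a non-iterable truthy value forces everything.
            forced = list(bound)
        for prefix in forced:
            emitted[prefix] = bound[prefix]
    return emitted


bound = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rr": "http://www.w3.org/ns/r2rml#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "": "http://mapping.example.com/",
}
used = {"rr", ""}

print(sorted(prefixes_to_emit(bound, used)))
print(sorted(prefixes_to_emit(bound, used, ("rdf",))))
print(sorted(prefixes_to_emit(bound, used, True)))
```

Note the trade-off in this model: True also emits prefixes such as owl that the output never references, whereas a tuple forces only the named ones.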

nicholascar commented 4 years ago

So although there's another issue identified here (object use of rdf:type), I'm going to close this one, as it isn't really an issue in itself.

Please just yell and scream if this needs to be re-opened.