RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

N-triples parser not in line with N-triples specification #1835

Open ghost opened 2 years ago

ghost commented 2 years ago

Discussed in https://github.com/RDFLib/rdflib/discussions/1557

Originally posted by **csae8092** March 9, 2021

While trying to parse N-Triples with rdflib version 4.2.2

`rdflib.Graph().parse(data=' .', format='nt')`

an error is thrown:

```
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 140, in parse
    self.parseline()
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 195, in parseline
    object = self.object()
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 228, in object
    objt = self.uriref() or self.nodeid() or self.literal()
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 235, in uriref
    uri = self.eat(r_uriref).group(1)
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 210, in eat
    raise ParseError("Failed to eat %s at %s" % (pattern.pattern, self.line))
rdflib.plugins.parsers.ntriples.ParseError: Failed to eat <([^:]+:[^\s"<>]+)> at .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python3/dist-packages/rdflib/graph.py", line 1043, in parse
    parser.parse(source, self, **args)
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/nt.py", line 26, in parse
    parser.parse(f)
  File "/usr/lib/python3/dist-packages/rdflib/plugins/parsers/ntriples.py", line 142, in parse
    raise ParseError("Invalid line: %r" % self.line)
rdflib.plugins.parsers.ntriples.ParseError: Invalid line: ' .'
```

The traceback suggests that the object URI has to match the regex `<([^:]+:[^\s"<>]+)>`. That is not in line with the N-Triples specification (8th statement of the grammar at https://www.w3.org/TR/n-triples/#n-triples-grammar), which does not require the IRIREF to contain a colon and allows it to contain Unicode escape sequences.

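To make the behaviour of that pattern concrete, here is a minimal sketch that applies the regex from the traceback using only Python's `re` module. The helper name `matches_uriref` and the sample values are illustrative assumptions, not part of rdflib; only the pattern itself is taken from the traceback.

```python
import re

# Pattern copied verbatim from the traceback (referred to there as r_uriref).
r_uriref = re.compile(r'<([^:]+:[^\s"<>]+)>')


def matches_uriref(token: str) -> bool:
    """Hypothetical helper: True if the token matches the pattern at its start."""
    return r_uriref.match(token) is not None


# A bracketed IRI with a scheme contains a colon, so the pattern matches.
print(matches_uriref("<http://example.org/#green-goblin>"))  # True

# A bracketed value without any colon cannot match, whatever else it contains.
print(matches_uriref("<make\\u0020me>"))  # False
```

The second call fails purely because the value has no colon; the escape sequence plays no part in the rejection.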
edmondchuc commented 2 years ago

I think it is not a bug.

The spec states that IRIs are absolute IRIs. https://www.w3.org/TR/n-triples/#h3_sec-iri

> IRIs may be written only as absolute IRIs. IRIs are enclosed in '<' and '>' and may contain numeric escape sequences (described below). For example <http://example.org/#green-goblin>.

The value <make\\u0020me> is not an absolute IRI. I think the parser is correct in saying that it is an invalid value.

The W3C N-Triples test suite also provides 4 tests to ensure that relative IRIs are not allowed.

From the manifest:

<#nt-syntax-bad-uri-06> rdf:type rdft:TestNTriplesNegativeSyntax ;
   mf:name    "nt-syntax-bad-uri-06" ;
   rdfs:comment "Bad IRI : relative IRI not allowed in subject (negative test)" ;
   mf:action    <nt-syntax-bad-uri-06.nt> ;
   .

<#nt-syntax-bad-uri-07> rdf:type rdft:TestNTriplesNegativeSyntax ;
   mf:name    "nt-syntax-bad-uri-07" ;
   rdfs:comment "Bad IRI : relative IRI not allowed in predicate (negative test)" ;
   mf:action    <nt-syntax-bad-uri-07.nt> ;
   .

<#nt-syntax-bad-uri-08> rdf:type rdft:TestNTriplesNegativeSyntax ;
   mf:name    "nt-syntax-bad-uri-08" ;
   rdfs:comment "Bad IRI : relative IRI not allowed in object (negative test)" ;
   mf:action    <nt-syntax-bad-uri-08.nt> ;
   .

<#nt-syntax-bad-uri-09> rdf:type rdft:TestNTriplesNegativeSyntax ;
   mf:name    "nt-syntax-bad-uri-09" ;
   rdfs:comment "Bad IRI : relative IRI not allowed in datatype (negative test)" ;
   mf:action    <nt-syntax-bad-uri-09.nt> ;
   .
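The distinction these negative tests rely on can be approximated by checking whether a value starts with a scheme. The sketch below is a rough illustration only: it assumes the RFC 3986 scheme syntax (a letter followed by letters, digits, `+`, `-` or `.`, then a colon) and does not validate the rest of the IRI. The function name `looks_absolute` is hypothetical and is not how rdflib implements the check.

```python
import re

# RFC 3986 scheme syntax followed by ":"; this is an approximation,
# not a full absolute-IRI validator.
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def looks_absolute(iri: str) -> bool:
    """Hypothetical helper: True if the value begins with a scheme and a colon."""
    return SCHEME.match(iri) is not None


print(looks_absolute("http://example.org/#green-goblin"))  # True
print(looks_absolute("make\\u0020me"))  # False
```

Under this approximation, a value with no scheme is treated as relative, which is the case the four tests above expect a parser to reject.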