RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Parsing of variables from Turtle syntax fails #1776

Closed: severin-lemaignan closed this issue 2 years ago

severin-lemaignan commented 2 years ago

(tested with rdflib 6.1.1)

The following code (that parses a Turtle statement containing a variable) fails:

from rdflib import Graph
list(Graph().parse(data="BASE <http://example.com/> ?var <p> <o> .", format="turtle"))

with the following error:

~/.local/lib/python3.8/site-packages/rdflib/plugins/parsers/notation3.py in variable(self, argstr, i, res)
   1273             varURI = self._store.newSymbol(self._baseURI + "#" + argstr[j:i])
   1274             if varURI not in self._variables:
-> 1275                 self._variables[varURI] = self._context.newUniversal(
   1276                     varURI, why=self._reason2
   1277                 )

AttributeError: 'NoneType' object has no attribute 'newUniversal'

Modifying L425 of parsers/notation3.py like this:

if openFormula is None:
   #...

solves the problem:

>>> from rdflib import Graph
>>> list(Graph().parse(data="BASE <http://example.com/> ?var <p> <o> .", format="turtle"))
[(rdflib.term.Variable('var'),
  rdflib.term.URIRef('http://example.com/p'),
  rdflib.term.URIRef('http://example.com/o'))]

Is there a reason why Turtle syntax is excluded by the test on L425 of parsers/notation3.py?

ghost commented 2 years ago

As far as I can ascertain, your statement isn't valid Turtle -- both online checkers [1, 2] reject it and so does my local copy of venerable-but-still-usable RDFConvert [3].

The 2011 W3C Team Submission states:

All RDF written in Turtle should be usable inside the query language part of the SPARQL Protocol And RDF Query Language which uses a Turtle/N3 style syntax for the Triple patterns and for RDF triples in the CONSTRUCT clause. This allows using RDF written in Turtle to allow forming "queries by example", using the data to make an initial query which can then be edited to use variables where bindings are wanted.

However, section 10, "Turtle compared to SPARQL", also states that "SPARQL includes at least the following syntax that is not in Turtle" (my emphasis):

Variables are allowed in any part of the triple of the form ?name or $name
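To illustrate the rule quoted above, here is a rough sketch of a pre-check that flags `?name` / `$name` tokens in a document before it is handed to a Turtle parser. It is a regex heuristic, not a Turtle grammar: the function name `find_sparql_variables` and both patterns are assumptions made for this example and are not part of rdflib.

```python
import re

# Spans to blank out first, so that a "?" inside an IRI or a string literal
# is not mistaken for a variable. Heuristic only, not the full Turtle grammar.
_SKIP = re.compile(
    r'<[^<>\s]*>'            # IRI references such as <http://example.com/?q=1>
    r'|"""[\s\S]*?"""'       # long double-quoted strings
    r"|'''[\s\S]*?'''"       # long single-quoted strings
    r'|"(?:[^"\\\n]|\\.)*"'  # short double-quoted strings
    r"|'(?:[^'\\\n]|\\.)*'"  # short single-quoted strings
    r'|#[^\n]*'              # comments
)

# SPARQL-style variables: ?name or $name
_VARIABLE = re.compile(r'(?<![\w:])[?$]([A-Za-z_]\w*)')


def find_sparql_variables(text):
    """Return variable names found outside IRIs, strings and comments."""
    stripped = _SKIP.sub(" ", text)
    return [match.group(1) for match in _VARIABLE.finditer(stripped)]


print(find_sparql_variables("BASE <http://example.com/> ?var <p> <o> ."))  # ['var']
```

A non-empty result suggests the input is a SPARQL-style or N3 pattern rather than plain Turtle.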

[1] https://www.easyrdf.org/converter
[2] https://issemantic.net/rdf-converter
[3] https://sourceforge.net/projects/rdfconvert/

The RDFLib test suite uses the same Turtle test files as Dave Beckett's Raptor, so it's pretty solid, spec-wise.

HTH

severin-lemaignan commented 2 years ago

Thank you very much for your detailed answer. Indeed, variables are out of scope for Turtle, so rdflib behaves correctly here.

I have also noticed that the n3 syntax does parse the variables (as expected from the spec), so I can simply use that:

>>> from rdflib import Graph
>>> list(Graph().parse(data="@base <http://example.com/>. ?var <p> <o> .", format="n3"))
[(rdflib.term.Variable('var'),
  rdflib.term.URIRef('http://example.com/p'),
  rdflib.term.URIRef('http://example.com/o'))]

Thanks!