RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Issue parsing Turtle with info URI #816

Closed scossu closed 2 years ago

scossu commented 6 years ago

I am trying to read a TTL stream and parse into a graph by replacing a public URI (http://ex.org/res/) with an internal URI (info:ns/). I want to do this using the tools provided by RDFLib rather than writing tedious and error-prone pattern matching code.

The following is a minimal example:

>>> from rdflib import URIRef, Graph
INFO:rdflib:RDFLib Version: 4.2.2
>>> g1 = Graph().parse(data='<http://ex.org/res/1> a <http://auth.edu/ns#Resource> .', format='turtle')
>>> ttl1 = g1.serialize(format='turtle', base='http://ex.org/res')
>>> print(ttl1.decode())
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

</1> a <http://auth.edu/ns#Resource> .

>>> g2 = Graph().parse(data=ttl1, format='turtle', publicID=URIRef('info:ns'))

The last line triggers the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/graph.py", line 1043, in parse
    parser.parse(source, self, **args)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 1870, in parse
    p.loadStream(source.getByteStream())
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 434, in loadStream
    return self.loadBuf(stream.read())    # Not ideal
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 440, in loadBuf
    self.feed(buf)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 466, in feed
    i = self.directiveOrStatement(s, j)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 487, in directiveOrStatement
    j = self.statement(argstr, i)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 721, in statement
    argstr, i, r)   # Allow literal for subject - extends RDF
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 1406, in object
    j = self.subject(argstr, i, res)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 733, in subject
    return self.item(argstr, i, res)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 829, in item
    return self.path(argstr, i, res)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 837, in path
    j = self.nodeOrLiteral(argstr, i, res)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 1431, in nodeOrLiteral
    j = self.node(argstr, i, res)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 1049, in node
    j = self.uri_ref2(argstr, i, res)
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 1218, in uri_ref2
    uref = join(self._baseURI, uref)  # was: uripath.join
  File "/home/scossu/code/lakesuperior/virtualenv/lib/python3.5/site-packages/rdflib/plugins/parsers/notation3.py", line 135, in join
    "colon - with relative '%s'.") % (here, there))
ValueError: Base <info:ns> has no slash after colon - with relative '/1'.

To my knowledge, info:ns/... is a valid URI. Is this something that conflicts with the N3 syntax, or something else?

Thanks.

vikash18086 commented 4 years ago

This issue is not entirely valid. The public ID handling is written so that an absolute public ID URI can be merged with a relative base URI from the Turtle document. Here both are absolute, so the public ID URI is overwritten by the Turtle base URI. If you want the public ID to act as the base, follow the comment about the base in notation3.py at line 623. We were not able to reproduce the exact error reported, but we did find that the URI is not being updated. You can pull the change from here: https://github.com/RDFLib/rdflib/pull/1104
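
As a rough illustration of the merging described here, the sketch below resolves relative references against an absolute base with Python's standard `urllib.parse.urljoin`. This is an assumption-laden stand-in: it is not RDFLib's own `join` from notation3.py, and the variable names are made up for the example. It only shows why a reference such as `</1>` lands at the root of the base's authority rather than under `/res`.

```python
from urllib.parse import urljoin

# Base used when serializing in the original example (note: no trailing slash).
base = "http://ex.org/res"

# "/1" is an absolute-path reference: it replaces the entire path of the base.
resolved_abs_path = urljoin(base, "/1")

# "1" is a relative-path reference: with a trailing slash on the base,
# it is appended under the base path instead.
resolved_rel_path = urljoin(base + "/", "1")

print(resolved_abs_path)  # http://ex.org/1
print(resolved_rel_path)  # http://ex.org/res/1
```

Under this rule, serializing with a base that ends in a slash (for example `http://ex.org/res/`) is what would be needed for a relative subject to resolve back under `/res/`.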

ghost commented 2 years ago

No longer causes an Exception.

from rdflib import Graph, URIRef


def test_issue816_issue_parsing_turtle_with_iri():
    # Parse one triple whose subject lives under the public namespace.
    g1 = Graph().parse(
        data="<http://ex.org/res/1> a <http://auth.edu/ns#Resource> .", format="turtle"
    )
    # Serialize with a base so the subject is written as a relative IRI.
    ttl1 = g1.serialize(format="turtle", base="http://ex.org/res")
    print(ttl1)
    # @base <http://ex.org/res> .
    #
    # </1> a <http://auth.edu/ns#Resource> .

    # Re-parse with a different publicID; this no longer raises.
    g2 = Graph().parse(data=ttl1, format="turtle", publicID=URIRef("info:ns"))
    print(g2.serialize(format="ttl"))
    # <http://ex.org/1> a <http://auth.edu/ns#Resource> .