RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Parsing N3 with a @base IRI which does not include a slash #1216

Open anatoly-scherbakov opened 3 years ago

anatoly-scherbakov commented 3 years ago

The issue is illustrated by this gist:

https://gist.github.com/anatoly-scherbakov/9fafb2863b877991f56ac7766b7c1bf0

But I was trying to get <local:> working, and also, I believe, <local:Category> is a perfectly good IRI. Real world examples of such schemas may be doi and mailto.

This might be related to #816 but I am not certain of that.

rdflib version is 5.0.0. The exception is raised here:

https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/notation3.py#L139-L144

I would be happy to create a PR removing this check, but I would like first to understand why the check is implemented.

nicholascar commented 3 years ago

@anatoly-scherbakov I'm not sure why the check is performed. It might stem from a misunderstanding about what valid IRIs are, and there's probably no way to find out, since whoever added that check is likely no longer actively involved in RDFLib.

Please go ahead and submit a PR!

ghost commented 2 years ago

An IndexError is raised rather than the intended ValueError because of the change introduced in @rchateauneu's 28th Feb Speedup commit. Prior to that, the code had remained unchanged since eikeon committed it 12 years ago:

-    if here[bcolonl + 1 : bcolonl + 2] != "/":
+    if here[bcolonl + 1] != "/":

Reverting the change avoids the error and causes the intended Exception to be raised: ValueError: Base <local:> has no slash after colon - with relative 'class_to_class'. (I must admit I'm mystified why the change causes the error)

The check is preceded by the comment: # join('mid:foo@example', '../foo') bzzt. Both the check and the Exception are explicitly tested in the doctests, which militates against removing it. It is worth noting that the join docstring includes the caveat “haven't checked the details of the IRI spec though”.

rchateauneu commented 2 years ago

Well done ! "Mystified why the change caused the error" I am too...

ghost commented 2 years ago

Python slicing subtlety, TIL that this doesn't raise an Exception:

def test_baz0():
    here = "local:"
    blocal = len(here)
    x = here[blocal + 20000 : blocal + 300000]
    assert x == ""
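
To tie this back to the diff above, here is a small sketch of the two forms applied to the base from the original report. It assumes bcolonl is the index of the first colon in the base, which is my reading of join and is not confirmed in this thread; the variable names sliced and raised are only for illustration.

```python
here = "local:"           # base IRI with nothing after the colon
bcolonl = here.find(":")  # assumed: index of the first colon (5 here)

# Slicing past the end quietly gives "", so the old comparison with "/"
# is True and the intended ValueError branch is reached.
sliced = here[bcolonl + 1 : bcolonl + 2]

# Indexing past the end raises before any comparison can happen.
try:
    here[bcolonl + 1]
    raised = False
except IndexError:
    raised = True

print(repr(sliced), raised)
```

So with the slice the check itself still runs, whereas with plain indexing the lookup fails first.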

aucampia commented 2 years ago

I'm actually not entirely clear whether this issue is fixed or not. The PR only changed the exception; the actual example from @anatoly-scherbakov still fails, and I'm not sure whether it should.

Example file:

@base <example:> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<class_to_class>
  a
    rdfs:Class ,
    <Category> ;
  <color> "blue" ;
  <priority> 4 .

Tested with riot

$ riot --out=nt test/variants/base_without_slash.n3 
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <example:Category> .
<example:class_to_class> <example:color> "blue" .
<example:class_to_class> <example:priority> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .

Tested against 6.1.1

$ pipx run --spec rdflib==6.1.1 rdfpipe -i n3 -o nt test/variants/base_without_slash.n3 
⚠️  rdfpipe is already on your PATH and installed at /home/iwana/.local/bin/rdfpipe. Downloading and running anyway.
Traceback (most recent call last):
  File "/home/iwana/.local/pipx/.cache/18099159648349d/bin/rdfpipe", line 8, in <module>
    sys.exit(main())
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/tools/rdfpipe.py", line 200, in main
    parse_and_serialize(
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/tools/rdfpipe.py", line 54, in parse_and_serialize
    graph.parse(fpath, format=use_format, **kws)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/graph.py", line 1851, in parse
    context.parse(source, publicID=publicID, format=format, **args)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/graph.py", line 1258, in parse
    parser.parse(source, self, **args)  # type: ignore[call-arg]
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1947, in parse
    TurtleParser.parse(self, source, conj_graph, encoding, turtle=False)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1913, in parse
    p.loadStream(stream)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 434, in loadStream
    return self.loadBuf(stream.read())  # Not ideal
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 440, in loadBuf
    self.feed(buf)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 466, in feed
    i = self.directiveOrStatement(s, j)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 486, in directiveOrStatement
    j = self.statement(argstr, i)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 729, in statement
    i = self.object(argstr, i, r)  # Allow literal for subject - extends RDF
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1411, in object
    j = self.subject(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 740, in subject
    return self.item(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 832, in item
    return self.path(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 839, in path
    j = self.nodeOrLiteral(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1439, in nodeOrLiteral
    j = self.node(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1043, in node
    j = self.uri_ref2(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1203, in uri_ref2
    uref = join(self._baseURI, uref)  # was: uripath.join
  File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 136, in join
    if here[bcolonl + 1] != "/":
IndexError: string index out of range

Tested against master

$ pipx run --spec git+https://github.com/RDFLib/rdflib.git@master#egg=rdflib rdfpipe -i n3 -o nt test/variants/base_without_slash.n3
⚠️  rdfpipe is already on your PATH and installed at /home/iwana/.local/bin/rdfpipe. Downloading and running anyway.
Traceback (most recent call last):
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/bin/rdfpipe", line 8, in <module>
    sys.exit(main())
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/tools/rdfpipe.py", line 200, in main
    parse_and_serialize(
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/tools/rdfpipe.py", line 54, in parse_and_serialize
    graph.parse(fpath, format=use_format, **kws)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/graph.py", line 1812, in parse
    context.parse(source, publicID=publicID, format=format, **args)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/graph.py", line 1226, in parse
    parser.parse(source, self, **args)  # type: ignore[call-arg]
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1979, in parse
    TurtleParser.parse(self, source, conj_graph, encoding, turtle=False)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1945, in parse
    p.loadStream(stream)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 456, in loadStream
    return self.loadBuf(stream.read())  # Not ideal
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 462, in loadBuf
    self.feed(buf)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 488, in feed
    i = self.directiveOrStatement(s, j)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 508, in directiveOrStatement
    j = self.statement(argstr, i)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 751, in statement
    i = self.object(argstr, i, r)  # Allow literal for subject - extends RDF
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1433, in object
    j = self.subject(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 762, in subject
    return self.item(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 854, in item
    return self.path(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 861, in path
    j = self.nodeOrLiteral(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1461, in nodeOrLiteral
    j = self.node(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1065, in node
    j = self.uri_ref2(argstr, i, res)
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1225, in uri_ref2
    uref = join(self._baseURI, uref)  # was: uripath.join
  File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 155, in join
    raise ValueError(
ValueError: Base <example:> has no slash after colon - with relative 'class_to_class'.

aucampia commented 2 years ago

rapper is also fine with similar URIs, tested with turtle:

@base <example:> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<class_to_class> a <Category>,
        rdfs:Class ;
    <color> "blue" ;
    <priority> 4 .

$ rapper -o ntriples -i turtle test/variants/base_without_slash.ttl
rapper: Parsing URI file:///home/iwana/sw/d/github.com/iafork/rdflib/test/variants/base_without_slash.ttl with parser turtle
rapper: Serializing with serializer ntriples
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <example:Category> .
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<example:class_to_class> <example:color> "blue" .
<example:class_to_class> <example:priority> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
rapper: Parsing returned 4 triples
$ riot --check --strict --out=nt test/variants/base_without_slash.ttl
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <example:Category> .
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<example:class_to_class> <example:color> "blue" .
<example:class_to_class> <example:priority> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
$ pipx run --spec git+https://github.com/RDFLib/rdflib.git@master#egg=rdflib rdfpipe -i n3 -o nt test/variants/base_without_slash.ttl
...
ValueError: Base <example:> has no slash after colon - with relative 'class_to_class'.

aucampia commented 2 years ago

RDF4J is also more or less fine with it, though it interprets it a bit differently, and maybe more correctly:

$ ./console.sh 
23:50:58.838 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - os.name = linux
23:50:58.841 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
RDF4J Console 3.7.4
Working dir: /home/iwana/.local/opt/eclipse-rdf4j/bin
Type 'help' for help.
> create native
Please specify values for the following variables:
Repository ID [native]: 
Repository title [Native store]: 
Query Iteration Cache size [10000]: 
Triple indexes [spoc,posc]: 
EvaluationStrategyFactory [org.eclipse.rdf4j.query.algebra.evaluation.impl.StrictEvaluationStrategyFactory]: 
WARNING: you are about to overwrite the configuration of an existing repository!
Proceed? (yes|no) [no]: yes
Repository created
> open native
Opened repository 'native'
native> load /home/iwana/sw/d/github.com/iafork/rdflib/test/variants/base_without_slash.ttl
Loading data...
Data has been added to the repository (43 ms)
native> export /var/tmp/exported.nt
Exporting data...
Data has been written to file (17 ms)
native> exit
20220108T235131 iwana@iwana-pc00.coop.no:~/.local/opt/eclipse-rdf4j/bin
$ cat /var/tmp/exported.nt 
<example:/class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <example:/Category> .
<example:/class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<example:/class_to_class> <example:/color> "blue" .
<example:/class_to_class> <example:/priority> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .

aucampia commented 2 years ago

From RDF 1.1 Turtle / 6.3 IRI References and IETF RFC 3986: Uniform Resource Identifier (URI): Generic Syntax / 5.2. Relative Resolution I'm pretty sure it should be valid, at least if it is only a scheme, which basically it is in the example @anatoly-scherbakov gave.

I'm re-opening this, I may of course be wrong, and this should be invalid, if that is the case please do share some details as to why.
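
For reference, here is a minimal sketch of just the "Merge Paths" step from RFC 3986 section 5.2.3, using the component-splitting regex from Appendix B of that RFC. The name merge_paths is made up for illustration and is not RDFLib's join; the sketch also assumes the reference is a plain relative path (no scheme, authority, query, fragment or dot segments). Read literally, that one step turns class_to_class against example: into example:class_to_class, which is what riot and rapper print above. RDF4J's example:/class_to_class differs by the extra slash, and this sketch does not settle which form is preferable.

```python
import re

# Component-splitting regex from RFC 3986, Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def merge_paths(base: str, rel_path: str) -> str:
    """Hypothetical helper: RFC 3986 section 5.2.3 for a plain relative path."""
    m = _URI_RE.match(base)
    scheme, authority, path = m.group(2), m.group(4), m.group(5)
    if authority is not None and path == "":
        # Base has an authority and an empty path: prepend "/".
        merged = "/" + rel_path
    else:
        # Keep the base path up to and including its last "/", if any.
        merged = path[: path.rfind("/") + 1] + rel_path
    prefix = scheme + ":" if scheme is not None else ""
    if authority is not None:
        prefix += "//" + authority
    return prefix + merged


print(merge_paths("example:", "class_to_class"))
print(merge_paths("http://example.org", "class_to_class"))
```

Nothing in this step requires a slash after the colon of the base.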

aucampia commented 2 years ago

The rfc3986 Python package seems to agree that it is invalid (strict=False has the same behaviour):

$ pipx run --spec rfc3986==1.5.0 python -c 'from rfc3986 import uri_reference; print(uri_reference("john.smith").resolve_with("example:", strict=True))'
⚠️  python is already on your PATH and installed at /usr/bin/python. Downloading and running anyway.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/iwana/.local/pipx/.cache/ccafdebe12fa395/lib64/python3.10/site-packages/rfc3986/_mixin.py", line 266, in resolve_with
    raise exc.ResolutionError(base_uri)
rfc3986.exceptions.ResolutionError: example: is not an absolute URI.

aucampia commented 2 years ago

I have now opened an issue against the rfc3986 Python package, and also gave an explanation there of why resolving a relative reference like john.smith against a base like example: should yield example:/john.smith.

This is likely not the highest priority though, as this is probably best avoided and there are few cases I can see where doing it will be needed.

aucampia commented 2 years ago

Okay, the issue in https://pypi.org/project/rfc3986/ is fixed, and it now handles this the same way as RDF4J, which is the correct way IMO.

SvenPVoigt commented 1 year ago

I am using RDFLib 6.2.0 and this is still an issue for me. For clarity, it is my understanding that IRIs define a way of mapping Unicode to ASCII in order to "internationalize" URIs for non-ASCII characters. The URI syntax (RFC 3986) is really straightforward if you only use non-reserved ASCII characters after the scheme, but it also has a more complex hierarchy system in which the reserved ASCII characters (listed on page 12) act as specific delimiters: e.g., // introduces an authority, / ? # separate hierarchical parts, and % is used to encode octets. Several examples in RFC 3986 (on page 6) do not use / at all, for example mailto:, news:, and urn:.

I am trying to use URIs for books as follows: urn:isbn:9791280035356

These are unique and require no /
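
As a quick illustration that such identifiers split cleanly into a scheme and a path without any slash, here is a sketch using Python's standard urllib.parse.urlsplit. The describe helper and the mailto address are made up for the example, and this only shows how the standard library splits the strings; it says nothing about RDFLib's own parsing.

```python
from urllib.parse import urlsplit


def describe(iri: str) -> tuple[str, str, str]:
    """Hypothetical helper: return (scheme, authority, path) of an IRI."""
    parts = urlsplit(iri)
    return parts.scheme, parts.netloc, parts.path


for iri in (
    "urn:isbn:9791280035356",
    "mailto:john.smith@example.org",
    "http://example.org/a",
):
    print(iri, "->", describe(iri))
```

Only the last of the three has an authority and a slash-delimited path; the first two carry everything after the scheme in the path component.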

mwx23 commented 1 year ago

I am also getting this error: ValueError: Base <urn:foo:> has no slash after colon - with relative 'bar' in RDFLib 6.2.0. Wikipedia collates the syntax rules from RFC 3986. <urn:foo:bar> should be a valid identifier.