anatoly-scherbakov opened this issue 3 years ago (status: Open)
@anatoly-scherbakov I'm not sure why the check is performed. It might be a misunderstanding about what valid IRIs are, and there's probably no way to find out, since whoever added that check is likely no longer actively involved in RDFLib.
Please go ahead and submit a PR!
An IndexError is raised instead of the intended ValueError because of the change introduced in @rchateauneu's 28th Feb Speedup commit. Before that, the code had remained unchanged since eikeon committed it 12 years ago:
- if here[bcolonl + 1 : bcolonl + 2] != "/":
+ if here[bcolonl + 1] != "/":
Reverting the change avoids the error and causes the intended Exception to be raised: ValueError: Base <local:> has no slash after colon - with relative 'class_to_class'.
(I must admit I'm mystified why the change causes the error)
The check is preceded by the comment:
# join('mid:foo@example', '../foo') bzzt
The check and the Exception are both explicitly tested in the doctests, which militates against removing it. It is worth noting, however, that the join docstring includes the caveat “haven't checked the details of the IRI spec though”.
Well done! I am also mystified why the change caused the error...
A Python slicing subtlety: TIL that this doesn't raise an Exception:
def test_baz0():
    here = "local:"
    blocal = len(here)
    x = here[blocal + 20000 : blocal + 300000]
    assert x == ""
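That subtlety is the whole difference between the two versions of the check in the diff above: a slice whose bounds run past the end of a string is clamped and returns "", so the comparison with "/" is simply false and the ValueError branch runs, whereas a subscript past the end raises IndexError first. A minimal sketch of just that guard follows; the two helper names are hypothetical, and it assumes bcolonl is the index of the first colon in the base, as the name suggests.

```python
def has_slash_after_colon_sliced(here: str) -> bool:
    """Pre-Speedup form: a slice past the end of the string yields ""."""
    bcolonl = here.find(":")  # assumption: position of the scheme's colon
    return here[bcolonl + 1 : bcolonl + 2] == "/"


def has_slash_after_colon_indexed(here: str) -> bool:
    """Post-Speedup form: indexing past the end raises IndexError."""
    bcolonl = here.find(":")
    return here[bcolonl + 1] == "/"


for base in ("local:/", "local:"):
    print(base, "sliced:", has_slash_after_colon_sliced(base))
    try:
        print(base, "indexed:", has_slash_after_colon_indexed(base))
    except IndexError as exc:
        # For "local:" the colon is the last character, so bcolonl + 1 == len(base).
        print(base, "indexed: IndexError:", exc)
```

For "local:" the sliced form returns False (so the caller can raise its descriptive ValueError), while the indexed form fails with "string index out of range", matching the traceback in this thread.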
I'm actually not entirely clear whether this issue is fixed. The PR only changed the exception; the actual example from @anatoly-scherbakov still fails, and I'm not sure whether it should.
Example file:
@base <example:> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<class_to_class>
a
rdfs:Class ,
<Category> ;
<color> "blue" ;
<priority> 4 .
$ riot --out=nt test/variants/base_without_slash.n3
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <example:Category> .
<example:class_to_class> <example:color> "blue" .
<example:class_to_class> <example:priority> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
$ pipx run --spec rdflib==6.1.1 rdfpipe -i n3 -o nt test/variants/base_without_slash.n3
⚠️ rdfpipe is already on your PATH and installed at /home/iwana/.local/bin/rdfpipe. Downloading and running anyway.
Traceback (most recent call last):
File "/home/iwana/.local/pipx/.cache/18099159648349d/bin/rdfpipe", line 8, in <module>
sys.exit(main())
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/tools/rdfpipe.py", line 200, in main
parse_and_serialize(
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/tools/rdfpipe.py", line 54, in parse_and_serialize
graph.parse(fpath, format=use_format, **kws)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/graph.py", line 1851, in parse
context.parse(source, publicID=publicID, format=format, **args)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/graph.py", line 1258, in parse
parser.parse(source, self, **args) # type: ignore[call-arg]
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1947, in parse
TurtleParser.parse(self, source, conj_graph, encoding, turtle=False)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1913, in parse
p.loadStream(stream)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 434, in loadStream
return self.loadBuf(stream.read()) # Not ideal
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 440, in loadBuf
self.feed(buf)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 466, in feed
i = self.directiveOrStatement(s, j)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 486, in directiveOrStatement
j = self.statement(argstr, i)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 729, in statement
i = self.object(argstr, i, r) # Allow literal for subject - extends RDF
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1411, in object
j = self.subject(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 740, in subject
return self.item(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 832, in item
return self.path(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 839, in path
j = self.nodeOrLiteral(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1439, in nodeOrLiteral
j = self.node(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1043, in node
j = self.uri_ref2(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1203, in uri_ref2
uref = join(self._baseURI, uref) # was: uripath.join
File "/home/iwana/.local/pipx/.cache/18099159648349d/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 136, in join
if here[bcolonl + 1] != "/":
IndexError: string index out of range
$ pipx run --spec git+https://github.com/RDFLib/rdflib.git@master#egg=rdflib rdfpipe -i n3 -o nt test/variants/base_without_slash.n3
⚠️ rdfpipe is already on your PATH and installed at /home/iwana/.local/bin/rdfpipe. Downloading and running anyway.
Traceback (most recent call last):
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/bin/rdfpipe", line 8, in <module>
sys.exit(main())
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/tools/rdfpipe.py", line 200, in main
parse_and_serialize(
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/tools/rdfpipe.py", line 54, in parse_and_serialize
graph.parse(fpath, format=use_format, **kws)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/graph.py", line 1812, in parse
context.parse(source, publicID=publicID, format=format, **args)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/graph.py", line 1226, in parse
parser.parse(source, self, **args) # type: ignore[call-arg]
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1979, in parse
TurtleParser.parse(self, source, conj_graph, encoding, turtle=False)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1945, in parse
p.loadStream(stream)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 456, in loadStream
return self.loadBuf(stream.read()) # Not ideal
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 462, in loadBuf
self.feed(buf)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 488, in feed
i = self.directiveOrStatement(s, j)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 508, in directiveOrStatement
j = self.statement(argstr, i)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 751, in statement
i = self.object(argstr, i, r) # Allow literal for subject - extends RDF
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1433, in object
j = self.subject(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 762, in subject
return self.item(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 854, in item
return self.path(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 861, in path
j = self.nodeOrLiteral(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1461, in nodeOrLiteral
j = self.node(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1065, in node
j = self.uri_ref2(argstr, i, res)
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 1225, in uri_ref2
uref = join(self._baseURI, uref) # was: uripath.join
File "/home/iwana/.local/pipx/.cache/1500787fc0bbcf9/lib64/python3.10/site-packages/rdflib/plugins/parsers/notation3.py", line 155, in join
raise ValueError(
ValueError: Base <example:> has no slash after colon - with relative 'class_to_class'.
rapper is also fine with similar URIs (tested with Turtle):
@base <example:> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<class_to_class> a <Category>,
rdfs:Class ;
<color> "blue" ;
<priority> 4 .
$ rapper -o ntriples -i turtle test/variants/base_without_slash.ttl
rapper: Parsing URI file:///home/iwana/sw/d/github.com/iafork/rdflib/test/variants/base_without_slash.ttl with parser turtle
rapper: Serializing with serializer ntriples
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <example:Category> .
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<example:class_to_class> <example:color> "blue" .
<example:class_to_class> <example:priority> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
rapper: Parsing returned 4 triples
$ riot --check --strict --out=nt test/variants/base_without_slash.ttl
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <example:Category> .
<example:class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<example:class_to_class> <example:color> "blue" .
<example:class_to_class> <example:priority> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
$ pipx run --spec git+https://github.com/RDFLib/rdflib.git@master#egg=rdflib rdfpipe -i n3 -o nt test/variants/base_without_slash.ttl
...
ValueError: Base <example:> has no slash after colon - with relative 'class_to_class'.
RDF4J is also more or less fine with it, though it interprets it a bit differently, and maybe more correctly:
$ ./console.sh
23:50:58.838 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - os.name = linux
23:50:58.841 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
RDF4J Console 3.7.4
Working dir: /home/iwana/.local/opt/eclipse-rdf4j/bin
Type 'help' for help.
> create native
Please specify values for the following variables:
Repository ID [native]:
Repository title [Native store]:
Query Iteration Cache size [10000]:
Triple indexes [spoc,posc]:
EvaluationStrategyFactory [org.eclipse.rdf4j.query.algebra.evaluation.impl.StrictEvaluationStrategyFactory]:
WARNING: you are about to overwrite the configuration of an existing repository!
Proceed? (yes|no) [no]: yes
Repository created
> open native
Opened repository 'native'
native> load /home/iwana/sw/d/github.com/iafork/rdflib/test/variants/base_without_slash.ttl
Loading data...
Data has been added to the repository (43 ms)
native> export /var/tmp/exported.nt
Exporting data...
Data has been written to file (17 ms)
native> exit
20220108T235131 iwana@iwana-pc00.coop.no:~/.local/opt/eclipse-rdf4j/bin
$ cat /var/tmp/exported.nt
<example:/class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <example:/Category> .
<example:/class_to_class> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
<example:/class_to_class> <example:/color> "blue" .
<example:/class_to_class> <example:/priority> "4"^^<http://www.w3.org/2001/XMLSchema#integer> .
Going by RDF 1.1 Turtle, Section 6.3 (IRI References), and IETF RFC 3986: Uniform Resource Identifier (URI): Generic Syntax, Section 5.2 (Relative Resolution), I'm pretty sure it should be valid, at least when the base is only a scheme, which is basically what it is in the example @anatoly-scherbakov gave.
I'm re-opening this. I may of course be wrong and this should be invalid; if that is the case, please do share some details as to why.
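To make the RFC 3986 Section 5.2 rules concrete, here is a minimal standalone sketch of reference resolution written directly from the RFC's pseudocode (Sections 5.2.2, 5.2.3, 5.2.4 and 5.3, strict mode). It is not rdflib's join, and the function names are hypothetical. Followed literally, the merge rule in Section 5.2.3 drops the entire base path when it contains no "/", so this sketch resolves class_to_class against example: to example:class_to_class (the form riot and rapper print above), rather than RDF4J's example:/class_to_class. It also resolves the "bzzt" case from the join comment, '../foo' against 'mid:foo@example', to mid:foo without needing an error.

```python
import re
from typing import List, Optional, Tuple

# Splitting regex from RFC 3986, Appendix B; it matches any string.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

Parts = Tuple[Optional[str], Optional[str], str, Optional[str], Optional[str]]


def split_uri(uri: str) -> Parts:
    """Return (scheme, authority, path, query, fragment); undefined parts are None."""
    m = _URI_RE.match(uri)
    assert m is not None
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path: str) -> str:
    """RFC 3986, Section 5.2.4."""
    out: List[str] = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if out:
                out.pop()
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment, with its leading "/" if any, to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def merge(base_authority: Optional[str], base_path: str, ref_path: str) -> str:
    """RFC 3986, Section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    # Keep the base path up to and including its last "/"; if there is none, keep nothing.
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base: str, ref: str) -> str:
    """Resolve ref against an absolute base (RFC 3986, Sections 5.2.2 and 5.3)."""
    b_scheme, b_auth, b_path, b_query, _ = split_uri(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split_uri(ref)
    if r_scheme is not None:
        t_scheme, t_auth, t_path, t_query = r_scheme, r_auth, remove_dot_segments(r_path), r_query
    else:
        t_scheme = b_scheme
        if r_auth is not None:
            t_auth, t_path, t_query = r_auth, remove_dot_segments(r_path), r_query
        elif r_path == "":
            t_auth, t_path = b_auth, b_path
            t_query = r_query if r_query is not None else b_query
        else:
            if r_path.startswith("/"):
                t_path = remove_dot_segments(r_path)
            else:
                t_path = remove_dot_segments(merge(b_auth, b_path, r_path))
            t_auth, t_query = b_auth, r_query
    # Recompose the components (Section 5.3).
    result = t_scheme + ":" if t_scheme is not None else ""
    if t_auth is not None:
        result += "//" + t_auth
    result += t_path
    if t_query is not None:
        result += "?" + t_query
    if r_frag is not None:
        result += "#" + r_frag
    return result


print(resolve("example:", "class_to_class"))  # example:class_to_class
print(resolve("http://a/b/c/d;p?q", "../g"))  # http://a/b/g
```

The second call is one of the RFC's own examples from Section 5.4.1, included as a sanity check that the hierarchical case still behaves as usual.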
The rfc3986 Python package seems to agree that it is invalid (strict=False has the same behaviour):
$ pipx run --spec rfc3986==1.5.0 python -c 'from rfc3986 import uri_reference; print(uri_reference("john.smith").resolve_with("example:", strict=True))'
⚠️ python is already on your PATH and installed at /usr/bin/python. Downloading and running anyway.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/iwana/.local/pipx/.cache/ccafdebe12fa395/lib64/python3.10/site-packages/rfc3986/_mixin.py", line 266, in resolve_with
raise exc.ResolutionError(base_uri)
rfc3986.exceptions.ResolutionError: example: is not an absolute URI.
I have now opened an issue against the rfc3986 Python package, and explained there why resolving a relative reference like john.smith against a base like example: should yield example:/john.smith.
This is likely not the highest priority, though, as it is probably best avoided and I can see few cases where it would be needed.
Okay, the issue in https://pypi.org/project/rfc3986/ is fixed, and the package now handles this the same way as RDF4J, which is the correct way IMO.
I am using RDFLib 6.2.0 and this is still an issue for me. For clarity, my understanding is that IRIs specify a scheme for converting Unicode to ASCII in order to "internationalize" URIs for non-ASCII characters. The URI syntax (RFC 3986) is really straightforward if you just specify unreserved ASCII characters after the scheme, but it also has a complex hierarchy system in which reserved ASCII characters (listed on page 12) are used as specific delimiters: e.g., // introduces an authority, /, ? and # separate hierarchical parts, and % is used to encode octets. Several examples in RFC 3986 (on page 6) do not use / at all, for example mailto:, news:, and urn:.
I am trying to use URIs for books as follows: urn:isbn:9791280035356
These are unique and require no /.
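As a quick sanity check that such URIs need no "/" to be well-formed, Python's standard urllib.parse.urlsplit splits each of them into a scheme and a path with no authority. This is only a syntactic illustration, assuming urlsplit is an acceptable stand-in for the RFC 3986 grammar here; it is not IRI validation and not what rdflib uses internally. The mailto and news values are the examples given in RFC 3986 itself.

```python
from urllib.parse import urlsplit

# URIs from this thread and from RFC 3986 that contain no "/" after the scheme.
examples = [
    "urn:isbn:9791280035356",
    "mailto:John.Doe@example.com",
    "news:comp.infosystems.www.servers.unix",
    "example:",
]

for uri in examples:
    parts = urlsplit(uri)
    # Each splits into a scheme plus a path; netloc (the authority) stays empty.
    print(f"{uri!r}: scheme={parts.scheme!r} netloc={parts.netloc!r} path={parts.path!r}")
```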
The issue is illustrated by this gist:
https://gist.github.com/anatoly-scherbakov/9fafb2863b877991f56ac7766b7c1bf0
It uses @base <local:> . in an RDF/N3 document, which should convert an RDF term <Category> to <local:Category>. If I use <local:/> instead as a @base, everything works. But I was trying to get <local:> working, and also, I believe, <local:Category> is a perfectly good IRI. Real-world examples of such schemes may be doi and mailto. This might be related to #816, but I am not certain of that.
The rdflib version is 5.0.0. The exception is raised here:
https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/parsers/notation3.py#L139-L144
I would be happy to create a PR removing this check, but I would like first to understand why the check is implemented.