RDFLib / rdflib-jsonld

JSON-LD parser and serializer plugins for RDFLib

Cannot parse JSON-LD document if the scheme of @base IRI is non-standard #97

Open anatoly-scherbakov opened 3 years ago

anatoly-scherbakov commented 3 years ago

Problem

Full code of the example is here: https://gist.github.com/anatoly-scherbakov/9410aba3af518e1a3301b32b693f2579

I am trying to import a JSON-LD document into an RDFLib in-memory graph instance. Versions of the software:

rdflib==5.0.0
rdflib-jsonld==0.5.0
PyLD==2.0.3

The document I am working with contains a @base IRI in its @context.

Expected result

I expect the import to work whenever the @base value is a valid IRI, regardless of its scheme. However, the import seems to work only with these:

        'http://robotics.example.com/robots/',
        'https://robotics.example.com/robots/',
        'ftp://robotics.example.com/robots/',
        'file://robotics.example.com/robots/',

but does not work with these:

        'ipns://robotics.example.com/robots/',
        'tftp://robotics.example.com/robots/',
        'ntp://robotics.example.com/robots/',
        'local://robotics.example.com/robots/',

In the latter case, I just get an empty graph.

I tried to find a hardcoded list of allowed schemes in the rdflib, rdflib-jsonld, and pyld repositories, but did not succeed. Maybe you could point me in the right direction? Thank you!

craig-willis commented 3 years ago

Just ran into this issue trying to parse an example from https://www.researchobject.org/ro-crate/1.1/appendix/relative-uris.html#establishing-a-base-uri-inside-a-zip-file.

It appears to be caused by the use of urllib.parse.urljoin, which only resolves relative references for a specific set of registered schemes. There is a documented workaround (https://bugs.python.org/issue18828#msg196794):

import urllib.parse
urllib.parse.uses_relative.append('<scheme>')
urllib.parse.uses_netloc.append('<scheme>')
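
To make the behaviour concrete, here is a minimal sketch using only the standard library. It takes the ipns base IRI from the report above; the relative reference "r2d2" and the helper register_scheme are made up for illustration and are not part of rdflib or rdflib-jsonld. It assumes a fresh interpreter in which 'ipns' has not been registered yet.

```python
import urllib.parse

# Base IRI from the issue report; the relative reference is a made-up example.
BASE = "ipns://robotics.example.com/robots/"
RELATIVE = "r2d2"


def register_scheme(scheme: str) -> None:
    """Illustrative helper: add `scheme` to the lists that urljoin consults."""
    for registry in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
        if scheme not in registry:
            registry.append(scheme)


# Scheme not registered: urljoin hands back the relative reference unchanged.
before = urllib.parse.urljoin(BASE, RELATIVE)

register_scheme("ipns")

# Scheme registered: the reference is now resolved against the base.
after = urllib.parse.urljoin(BASE, RELATIVE)

print(before)  # r2d2
print(after)   # ipns://robotics.example.com/robots/r2d2
```

Since these lists are module-level state, the registration presumably has to happen before the JSON-LD document is parsed, and it affects every other caller of urljoin in the same process.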