RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

JSON-LD schema.org parsing fails with JSONDecodeError("Expecting value", s, err.value) from None #1781

Closed danbri closed 2 years ago

danbri commented 2 years ago

I have been trying different variations to parse a JSON-LD file into a Graph, but they're all failing.

The file seems OK (I tried several) and it parses ok with the JSON-LD playground. I tried a few variations for invoking the parser.

This was after completely removing and reinstalling Python/Anaconda, in a fresh Conda environment (python=3.8) with only "pip3 install rdflib", i.e. with no ageing copy of the plugin version of the parser hanging around.

parsejsonld_A.py

#!/usr/bin/env python3
from rdflib import Graph
if __name__ == '__main__':
    fn = "example1.jsonld"
    g = Graph()
    g.parse(fn, format="json-ld")

parsejsonld_B.py

#!/usr/bin/env python3

from rdflib import Graph
g = Graph().parse("example1.jsonld", format="json-ld")
g.serialize("test-jsonld.nt", format="nt")

parsejsonld_C.py

#!/usr/bin/env python3
from rdflib import Graph
g = Graph()
g.parse(location="file:feedkgx/example1.jsonld")
print(len(g))

The example file is just taken from Google documentation, see this Gist.

In each case I get this response:

./parsejsonld_A.py

Traceback (most recent call last):
  File "./parsejsonld_A.py", line 8, in <module>
    g.parse(fn, format="json-ld")
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/graph.py", line 1258, in parse
    parser.parse(source, self, **args)  # type: ignore[call-arg]
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 125, in parse
    to_rdf(data, conj_sink, base, context_data, version, generalized_rdf)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 144, in to_rdf
    return parser.parse(data, context, dataset)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 164, in parse
    context.load(local_context, context.base)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 357, in load
    self._prep_sources(base, source, sources, referenced_contexts)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 381, in _prep_sources
    new_ctx = self._fetch_context(
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 413, in _fetch_context
    source = source_to_json(source_url)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/util.py", line 43, in source_to_json
    return json.load(use_stream)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

As far as I can tell, this has something to do with the Schema.org @context URL and our migration from http://schema.org/ + conneg to https://schema.org/ with a JSON-LD 1.1-style HTTP Link header as the discovery mechanism for the context? But the error message is pretty uninformative.
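For what it's worth, the position reported in the final line of the traceback can be reproduced with the standard library alone: `json.loads` reports `line 2 column 1 (char 1)` when the input is a newline followed by something that is not JSON. The sketch below assumes, without having confirmed it, that the context URL returned an HTML page of that shape. The `html_body` string and the `describe_json_failure` helper are made up for illustration.

```python
import json

# Hypothetical response body: a leading newline followed by HTML instead of JSON.
# This is an assumption about what the context URL returned, not a captured response.
html_body = "\n<!DOCTYPE html>\n<html><head><title>Schema.org</title></head></html>"


def describe_json_failure(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


message = describe_json_failure(html_body)
print(message)  # Expecting value: line 2 column 1 (char 1)
```

If that assumption holds, the parser was handed the HTML home page rather than the context document, which would explain why nothing in the message points at the @context.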

If I change the schema.org context in the files to avoid a remote context, it parses.
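A minimal sketch of that workaround, assuming a plain `@vocab` mapping is an acceptable stand-in for the remote context. It is not equivalent to the full schema.org context (no type coercions, for instance), and `inline_schema_context` and the sample document are hypothetical names and data used only for illustration.

```python
import json

# Assumption: mapping every term into the schema.org namespace with @vocab is
# good enough for a quick test. It does not reproduce the full schema.org context.
INLINE_CONTEXT = {"@vocab": "https://schema.org/"}

# Spellings of the remote context that this sketch replaces.
REMOTE_CONTEXTS = {
    "http://schema.org",
    "http://schema.org/",
    "https://schema.org",
    "https://schema.org/",
}


def inline_schema_context(doc):
    """Return a shallow copy of a JSON-LD dict with a remote schema.org @context inlined."""
    patched = dict(doc)
    ctx = patched.get("@context")
    # Only touch string contexts; dict or list contexts are left unchanged.
    if isinstance(ctx, str) and ctx in REMOTE_CONTEXTS:
        patched["@context"] = dict(INLINE_CONTEXT)
    return patched


example = {"@context": "https://schema.org", "@type": "Person", "name": "Jane Doe"}
patched_text = json.dumps(inline_schema_context(example))
print(patched_text)
```

The patched text can then be handed to `Graph().parse(data=patched_text, format="json-ld")`, which needs no network access because the context is inline.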

The context lives here:

curl -s --head https://schema.org/ | grep 'link:'
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
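To illustrate the discovery step, here is a rough sketch of resolving that Link header to a context URL using only the standard library. It is a simplified parser written for this one header shape, not a full Web Linking implementation, and it is not taken from RDFLib; `find_jsonld_alternate` is a hypothetical helper.

```python
import re
from urllib.parse import urljoin


def find_jsonld_alternate(link_header, base_url):
    """Return the absolute URL of the first rel="alternate" JSON-LD link, or None."""
    # Split on the commas that separate link-values (each one starts with "<").
    for link_value in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", link_value, re.DOTALL)
        if not match:
            continue
        target, params_text = match.groups()
        # Collect ;key=value parameters, lower-casing keys and dropping quotes.
        params = {
            key.lower(): value.strip('"')
            for key, value in re.findall(
                r';\s*([\w-]+)\s*=\s*("[^"]*"|[^;,\s]+)', params_text
            )
        }
        rels = params.get("rel", "").split()
        if "alternate" in rels and params.get("type") == "application/ld+json":
            # The target may be relative, so resolve it against the requested URL.
            return urljoin(base_url, target)
    return None


header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
context_url = find_jsonld_alternate(header, "https://schema.org/")
print(context_url)  # https://schema.org/docs/jsonldcontext.jsonld
```

A client that finds such a link would fetch the resolved URL instead of trying to decode the body of the original response as JSON.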

Would a PR be welcomed on this?

e.g.

Related discussion: https://github.com/schemaorg/schemaorg/issues/2578

ghost commented 2 years ago

Looks like RDFLib users have encountered this issue before (see https://github.com/RDFLib/rdflib/issues/1423#issuecomment-939989021). A fix was committed just a couple of weeks ago in https://github.com/RDFLib/rdflib/pull/1436, which should actually fetch the context doc via the link.

I just checked this using the latest master branch and got:

from rdflib import Graph

def test_jsonld_conneg():
    # Fetches the Gist over the network, so its remote @context has to resolve too.
    g = Graph().parse(location="https://gist.githubusercontent.com/danbri/0cc3fc147d6d34945d0f61dcc11bc409/raw/0aa0d1a7574495a8fe7f1297121afe921b048a8f/gistfile1.txt", format="json-ld")
    assert len(g) == 35

So, if that's actually testing your issue (not necessarily the case, given the conneg implications) then please check with the current master branch (all tests passing as of 13 hrs ago at time of response).

RichardWallis commented 2 years ago

I checked the fix locally against the identified problem using the latest master branch, and all seems OK.

I presume this will be in 6.2.x when it is released.

aucampia commented 2 years ago

Closing this as 6.2.0 has been released. Please re-open if the issue persists.