RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Unexpected recursive context inclusion exception #2778

Open jmfernandez opened 2 months ago

jmfernandez commented 2 months ago

Hi, I'm involved in the https://github.com/ResearchObject community, and I'm writing some code to parse the RO-Crate JSON-LD representation for further processing. While running some tests with RDFLib 7.0.0, I think I uncovered a corner-case bug in its embedded JSON-LD processor plugin, and I have been able to narrow down the test code and inputs that trigger it.

The following code:

#!/usr/bin/env python3

import json
import rdflib
import sys

for filename in sys.argv[1:]:
    with open(filename, mode="r", encoding="utf-8") as IJD:
        print(f"Loading {filename}")
        input_jld = json.load(IJD)

        g = rdflib.Graph()
        parsed = g.parse(data=json.dumps(input_jld), format="json-ld")

works as expected with the following attached toy files:

But it fails with the following one:

raising

Loading fails1.jsonld
Traceback (most recent call last):
  File "/home/jmfernandez/projects/rdflib/load_test.py", line 13, in <module>
    parsed = g.parse(data=json.dumps(input_jld), format="json-ld")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jmfernandez/projects/rdflib/.bug/lib/python3.11/site-packages/rdflib/graph.py", line 1492, in parse
    parser.parse(source, self, **args)
  File "/home/jmfernandez/projects/rdflib/.bug/lib/python3.11/site-packages/rdflib/plugins/parsers/jsonld.py", line 119, in parse
    to_rdf(data, conj_sink, base, context_data, version, generalized_rdf)
  File "/home/jmfernandez/projects/rdflib/.bug/lib/python3.11/site-packages/rdflib/plugins/parsers/jsonld.py", line 138, in to_rdf
    return parser.parse(data, context, dataset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jmfernandez/projects/rdflib/.bug/lib/python3.11/site-packages/rdflib/plugins/parsers/jsonld.py", line 160, in parse
    context.load(local_context, context.base)
  File "/home/jmfernandez/projects/rdflib/.bug/lib/python3.11/site-packages/rdflib/plugins/shared/jsonld/context.py", line 401, in load
    self._prep_sources(base, source, sources, referenced_contexts)
  File "/home/jmfernandez/projects/rdflib/.bug/lib/python3.11/site-packages/rdflib/plugins/shared/jsonld/context.py", line 450, in _prep_sources
    self._prep_sources(
  File "/home/jmfernandez/projects/rdflib/.bug/lib/python3.11/site-packages/rdflib/plugins/shared/jsonld/context.py", line 430, in _prep_sources
    new_ctx = self._fetch_context(
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/jmfernandez/projects/rdflib/.bug/lib/python3.11/site-packages/rdflib/plugins/shared/jsonld/context.py", line 463, in _fetch_context
    raise RECURSIVE_CONTEXT_INCLUSION
rdflib.plugins.shared.jsonld.errors.JSONLDException: recursive context inclusion

Surprisingly, the following ones work (at first I thought it was an issue with the current context):

wallberg commented 1 month ago

The problem is that fails1.jsonld.json references both https://w3id.org/ro/crate/1.1/context and https://w3id.org/ro/terms/workflow-run, but https://w3id.org/ro/terms/workflow-run itself also references https://w3id.org/ro/crate/1.1/context.

$ curl -s -D - -L --header "Accept: application/ld+json, application/json;q=0.9, */*;q=0.1" https://w3id.org/ro/terms/workflow-run
HTTP/1.1 303 See Other
Date: Fri, 10 May 2024 13:33:26 GMT
Server: Apache/2.4.29 (Ubuntu)
Access-Control-Allow-Origin: *
Location: https://www.researchobject.org/ro-terms/workflow-run/context.json
Content-Length: 347
Content-Type: text/html; charset=iso-8859-1

HTTP/2 200
server: GitHub.com
content-type: application/json; charset=utf-8
last-modified: Tue, 07 May 2024 10:47:00 GMT
access-control-allow-origin: *
etag: "663a06a4-4fa"
expires: Fri, 10 May 2024 12:53:33 GMT
cache-control: max-age=600
x-proxy-cache: MISS
x-github-request-id: 66BE:12EDA7:4B3666:5B7E8D:663E1674
accept-ranges: bytes
age: 102
date: Fri, 10 May 2024 13:33:26 GMT
via: 1.1 varnish
x-served-by: cache-ewr18145-EWR
x-cache: HIT
x-cache-hits: 0
x-timer: S1715348006.225059,VS0,VE2
vary: Accept-Encoding
x-fastly-request-id: 1182bf956578eccfb9b7d97fcb3c094c8b2465d6
content-length: 1274

{
    "@context": [
        "https://w3id.org/ro/crate/1.1/context",
        {
            "ParameterConnection": "https://w3id.org/ro/terms/workflow-run#ParameterConnection",
            "ContainerImage": "https://w3id.org/ro/terms/workflow-run#ContainerImage",
            "DockerImage": "https://w3id.org/ro/terms/workflow-run#DockerImage",
            "SIFImage": "https://w3id.org/ro/terms/workflow-run#SIFImage",
            "connection": "https://w3id.org/ro/terms/workflow-run#connection",
            "sourceParameter": "https://w3id.org/ro/terms/workflow-run#sourceParameter",
            "targetParameter": "https://w3id.org/ro/terms/workflow-run#targetParameter",
            "md5": "https://w3id.org/ro/terms/workflow-run#md5",
            "sha1": "https://w3id.org/ro/terms/workflow-run#sha1",
            "sha256": "https://w3id.org/ro/terms/workflow-run#sha256",
            "sha512": "https://w3id.org/ro/terms/workflow-run#sha512",
            "environment": "https://w3id.org/ro/terms/workflow-run#environment",
            "registry": "https://w3id.org/ro/terms/workflow-run#registry",
            "tag": "https://w3id.org/ro/terms/workflow-run#tag",
            "containerImage": "https://w3id.org/ro/terms/workflow-run#containerImage"
        }
    ]
}
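To make the shape of the problem concrete, here is a toy model of it. This is a sketch only, not rdflib's actual code: CONTEXTS, load_strict and RecursiveInclusion are hypothetical names, the contexts are reduced to "which other contexts does this one reference", and the order of the two entries in the failing document is an assumption. It shows that a check which rejects any URL seen before fires on this "diamond" even though nothing includes itself.

```python
# Toy model (hypothetical, NOT rdflib's implementation): each context URL
# maps to the list of context URLs it references.
CRATE = "https://w3id.org/ro/crate/1.1/context"
WORKFLOW_RUN = "https://w3id.org/ro/terms/workflow-run"

CONTEXTS = {
    CRATE: [],
    WORKFLOW_RUN: [CRATE],  # workflow-run pulls in the crate context itself
}


class RecursiveInclusion(Exception):
    """Raised when a context URL is encountered a second time."""


def load_strict(urls, seen=None):
    """Resolve contexts depth-first, raising on ANY repeated URL."""
    if seen is None:
        seen = set()
    order = []
    for url in urls:
        if url in seen:
            raise RecursiveInclusion(url)
        seen.add(url)
        # Referenced contexts are resolved before the context that uses them.
        order.extend(load_strict(CONTEXTS[url], seen))
        order.append(url)
    return order


# Referencing workflow-run alone resolves fine.
print(load_strict([WORKFLOW_RUN]))

# Referencing both (assumed order) trips the check: the crate context is
# reached once directly and once through workflow-run.
try:
    load_strict([CRATE, WORKFLOW_RUN])
except RecursiveInclusion as exc:
    print("rejected repeated context:", exc)
```

In this toy model the second call is rejected although neither context ever includes itself; the crate context is simply reachable by two paths.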

wallberg commented 1 month ago

What I wonder is why encountering the same context a second time needs to raise an exception at all. Why can't we simply skip it?

I tried replacing the exception with a skip (return None) at https://github.com/RDFLib/rdflib/blob/main/rdflib/plugins/shared/jsonld/context.py#L474-L475, and all tests pass except the one that expects the exception.
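The idea of skipping can be illustrated with a small self-contained sketch. It is a toy model under stated assumptions, not the rdflib change itself: CONTEXTS and load_skipping are hypothetical names, the two urn:example entries are made up to represent a genuinely circular pair, and the order of the contexts in the document is assumed.

```python
# Toy model (hypothetical, NOT rdflib's implementation): each context URL
# maps to the list of context URLs it references.
CRATE = "https://w3id.org/ro/crate/1.1/context"
WORKFLOW_RUN = "https://w3id.org/ro/terms/workflow-run"

CONTEXTS = {
    CRATE: [],
    WORKFLOW_RUN: [CRATE],
    # Made-up pair that really does include each other.
    "urn:example:a": ["urn:example:b"],
    "urn:example:b": ["urn:example:a"],
}


def load_skipping(urls, seen=None):
    """Resolve contexts depth-first, silently skipping repeated URLs."""
    if seen is None:
        seen = set()
    order = []
    for url in urls:
        if url in seen:
            continue  # already loaded (or being loaded): skip, do not raise
        seen.add(url)
        order.extend(load_skipping(CONTEXTS[url], seen))
        order.append(url)
    return order


# The diamond from this issue now resolves, loading each context once.
print(load_skipping([CRATE, WORKFLOW_RUN]))

# A true cycle also terminates, because the second visit is skipped.
print(load_skipping(["urn:example:a"]))
```

In this sketch the skip is enough to guarantee termination, since each URL is expanded at most once. It says nothing about whether skipping a repeated context preserves term-override order in every JSON-LD document, which would need to be checked separately.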