RDFLib / rdflib-jsonld

JSON-LD parser and serializer plugins for RDFLib

Infinite recursion bug triggered by Context URN namespace #75

Closed: ajnelson-nist closed 4 years ago

ajnelson-nist commented 4 years ago

(Disclaimer: Any mention of a vendor or product is not an endorsement or recommendation.)

The JSON-LD parser has a built-in expectation about the structure of an HTTP prefix. A URN prefix, e.g. urn:example: (RFC 6963), lacks characters that an rdflib JSON-LD function expects. As a result, a string passed to a recursive call is never reduced in length.

The specific function is Context._prep_expand, called by Context._rec_expand. _prep_expand returns its input to pfx without reducing its length, which causes infinite recursion.
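To illustrate the failure mode only, here is a minimal sketch. It is not the rdflib_jsonld source: split_prefix and expand are hypothetical names, and the rule that only a `//` after the first colon marks an absolute IRI is an assumption made for the example. A depth guard stands in for Python's recursion limit so the non-terminating case fails quickly.

```python
# Hypothetical sketch of the failure mode; NOT the rdflib_jsonld implementation.

def split_prefix(expr):
    """Treat 'pfx:rest' as a compact IRI unless rest starts with '//'."""
    if ":" not in expr:
        return None, expr
    pfx, rest = expr.split(":", 1)
    if rest.startswith("//"):
        # Looks like http://..., so the whole string is taken as absolute.
        return None, expr
    return pfx, rest


def expand(expr, prefixes, depth=0, max_depth=25):
    """Recursively expand expr; raise once the depth guard is exceeded."""
    if depth > max_depth:
        raise RecursionError("no progress while expanding %r" % expr)
    pfx, rest = split_prefix(expr)
    if pfx is None:
        return expr  # base case: nothing left to expand
    # An unknown prefix is put back unchanged, so the next call receives
    # exactly the same string and can never reach the base case.
    nxt = prefixes.get(pfx, pfx + ":") + rest
    return expand(nxt, prefixes, depth + 1, max_depth)


if __name__ == "__main__":
    print(expand(":bar", {"": "http://example.org/2/"}))
    try:
        expand(":bar", {"": "urn:example:2:"})
    except RecursionError as err:
        print("recursion did not terminate:", err)
```

With the HTTP namespace, the second call sees `//` and stops. With the URN namespace, "urn" is taken as an undefined prefix, the string is rebuilt unchanged, and every further call sees the same input.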

This was tested against rdflib version 4.2.2, and rdflib_jsonld version 0.4.0, both retrieved with pip install rdflib-jsonld minutes ago.

Steps to reproduce:

#!/bin/bash

# This software was developed at the National Institute of Standards
# and Technology by employees of the Federal Government in the course
# of their official duties. Pursuant to title 17 Section 105 of the
# United States Code this software is not subject to copyright
# protection and is in the public domain. NIST assumes no
# responsibility whatsoever for its use by other parties, and makes
# no guarantees, expressed or implied, about its quality,
# reliability, or any other characteristic.
#
# We would appreciate acknowledgement if the software is used.

set -e

cat >sample_http.json <<EOF
{
    "@context": {
        "@vocab": "http://example.org/1/",
        "": "http://example.org/2/"
    },
    "@id": ":nsvocab",
    "@type": "nsthing",
    "foo": ":bar"
}
EOF

cat >sample_urn.json <<EOF
{
    "@context": {
        "@vocab": "urn:example:1:",
        "": "urn:example:2:"
    },
    "@id": ":nsvocab",
    "@type": "nsthing",
    "foo": ":bar"
}
EOF

cat >sample_urn.ttl <<EOF
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .

<urn:example:2:nsvocab>
    a <urn:example:1:nsthing> ;
    <urn:example:1:foo> ":bar" ;
    .

EOF

cat >sample_urn_without_context.json <<EOF
[
    {
        "@id" : "urn:example:2:nsvocab",
        "@type" : "urn:example:1:nsblank",
        "urn:example:1:foo" : ":bar",
        "@context" : {
            "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        }
    }
]
EOF

cat >load_graph.py <<EOF
import sys
import rdflib
g = rdflib.Graph()
g.parse(sys.argv[1], format=sys.argv[2])
EOF

#PASS
python3 load_graph.py sample_http.json json-ld

#PASS
python3 load_graph.py sample_urn.ttl turtle

#PASS
python3 load_graph.py sample_urn_without_context.json json-ld

#FAIL
python3 load_graph.py sample_urn.json json-ld

The above script creates four data files. The first two, sample_http.json and sample_urn.json, are expected to behave the same way, but sample_urn.json triggers the infinite-recursion bug. The third and fourth, sample_urn.ttl and sample_urn_without_context.json, are the output of converting sample_urn.json with rdf-toolkit. That conversion shows that at least one framework can work with URN-based namespaces in the @context dictionary. The commands to reproduce the conversion are:

java -jar rdf-toolkit.jar \
  --infer-base-iri \
  --inline-blank-nodes \
  --source sample_urn.json \
  --source-format json-ld \
  --target sample_urn.ttl \
  --target-format turtle

java -jar rdf-toolkit.jar \
  --infer-base-iri \
  --inline-blank-nodes \
  --source sample_urn.json \
  --source-format json-ld \
  --target sample_urn_without_context.json \
  --target-format json-ld

The fourth file, sample_urn_without_context.json, demonstrates that the URN problem in rdflib_jsonld arises only when URN namespaces are declared in the @context dictionary. This may be a helpful hint about parsing code that could be borrowed or abstracted.
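For comparison, here is a rough sketch of a non-recursive expansion step, loosely following the IRI expansion rule in the JSON-LD API specification: a value whose prefix is not defined in the context is treated as an already-absolute IRI and returned unchanged. The function name expand_iri and its parameters are hypothetical, term lookup and base-IRI resolution are omitted, and this is not a description of how rdflib_jsonld or rdf-toolkit actually work. Treating the empty string as a prefix simply mirrors the sample files above.

```python
# Simplified, hypothetical sketch; not taken from any library.

def expand_iri(value, prefixes, vocab=None):
    """Expand a compact IRI or vocabulary-relative term in a single pass."""
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        # Blank node identifiers and scheme://... forms are left alone.
        if prefix == "_" or suffix.startswith("//"):
            return value
        # Only prefixes declared in the context are substituted.
        if prefix in prefixes:
            return prefixes[prefix] + suffix
        # Anything else (e.g. urn:example:1:foo) is assumed to be absolute.
        return value
    if vocab is not None:
        return vocab + value
    return value


if __name__ == "__main__":
    # Values taken from the @context of sample_urn.json.
    prefixes = {"": "urn:example:2:"}
    vocab = "urn:example:1:"
    for term in (":nsvocab", "nsthing", "foo", "urn:example:1:foo"):
        print(term, "->", expand_iri(term, prefixes, vocab))
```

Because each value is expanded at most once, the result does not depend on the shape of the namespace IRI, and the expanded @id and @type match the IRIs in sample_urn.ttl.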

(EDIT: Fixed a copy-paste error in the third Python call.)

niklasl commented 4 years ago

Thanks for the report! This should be fixed by 7e1cc371fabcebcbb1778adad77ff1e3d82db29f.

ajnelson-nist commented 4 years ago

Thank you for the fast turnaround!