RDFLib / rdflib-jsonld

JSON-LD parser and serializer plugins for RDFLib

parsing error with "@id" property #77

Closed James-Hudson3010 closed 4 years ago

James-Hudson3010 commented 4 years ago

I have the following:

import rdflib

jsonRepresentation = """
{
    "@id": "http://mne.org/#1.0",

    "@graph": [
        {
            "@id": "mne:aaa",
            "@type": "rdfs:Class",
            "rdfs:comment": "An aaa",
            "rdfs:label": "aaa"
        }
    ]
}
"""

g = rdflib.Graph().parse( data = jsonRepresentation, format = 'json-ld' )

for subject, predicate, obj in g:
    if (subject, predicate, obj) not in g:
        raise Exception("Iterator / Container Protocols are Broken!!")
    else:
        print( f"SUBJECT:   {subject}" )
        print( f"PREDICATE: {predicate}" )
        print( f"OBJECT:    {obj}" )
        print( "*****" )

Nothing is printed unless I remove "@id": "http://mne.org/#1.0"

Entering this json-ld data into https://json-ld.org/playground/ produces no errors and works as expected.

nicholascar commented 4 years ago

What you're doing here is trying to shove a Named Graph (i.e. triples with a context URI) into a single graph in rdflib.

If you remove the graph information it works, i.e.:

jsonRepresentation = """
{
    "@id": "mne:aaa",
    "@type": "rdfs:Class",
    "rdfs:comment": "An aaa",
    "rdfs:label": "aaa"
}
"""

If you really want the graph with a context, you'll need to do this:

from rdflib import URIRef
g = rdflib.Graph(identifier=URIRef("http://mne.org/#1.0")).parse(data=jsonRepresentation, format='json-ld')

So it's not really an error in rdflib or rdflib-jsonld, more a limitation of what you can import into a plain Graph.

You would have the same issue trying to import an N-Quads file into an rdflib Graph().

ashleysommer commented 4 years ago

@nicholascar Note, you can import a json-ld file with one or more named graphs using Dataset.

Like this:

from rdflib import Dataset

myds = Dataset()
myds.parse(data=jsonldcontents, format="json-ld")

But the test above would still fail, because iterating a Dataset by default iterates the "Default Graph", which in this case is empty. You need to iterate each named context in the Dataset, then iterate the triples in each context.

James-Hudson3010 commented 4 years ago

I am not sure I understand. A slightly more complicated jsonRepresentation would look like:

jsonRepresentation = """
{
    "@id": "http://mne.org/#1.0",

    "@graph": [
        {
            "@id": "mne:aaa",
            "@type": "rdfs:Class",
            "rdfs:comment": "An aaa",
            "rdfs:label": "aaa"
        },
        {
            "@id": "mne:bbb",
            "@type": "rdfs:Class",
            "rdfs:comment": "An bbb",
            "rdfs:label": "bbb"
        }
    ]
}
"""

Again, this can be processed by https://json-ld.org/playground/ without error, but not rdflib-jsonld.

Why can the playground handle it? Should the playground produce an error?

I believe mne:aaa and mne:bbb must be inside of a "@graph" for it to be valid json-ld.

But, I can confirm that

g = rdflib.Graph( identifier = URIRef( "http://mne.org/#1.0" ) ).parse( data = jsonRepresentation, format = 'json-ld' )

does resolve the problem, but I would have preferred to have the identifier of the graph stored in the json-ld file itself and not outside of it.

I am not sure why I cannot provide an "@id" for the "@graph" in the json-ld. I am sure I am still missing something obvious.

James-Hudson3010 commented 4 years ago

I can also see

https://w3c.github.io/json-ld-syntax/#example-115-identifying-and-making-statements-about-a-graph

which does the same thing I am doing: it provides an "@id" for a "@graph".

The playground handles this example without error, but rdflib-jsonld does not appear to parse it correctly.

hsolbrig commented 4 years ago

Try:

g.parse(data=jsonRepresentation, format="json-ld")
g.serialize(format="nquads").decode()

It seems to provide what was asked?

James-Hudson3010 commented 4 years ago

Try:

g.parse(data=jsonRepresentation, format="json-ld")
g.serialize(format="nquads").decode()

It seems to provide what was asked?

When I try that I get an error:

Exception: NQuads serialization only makes sense for context-aware stores!

nicholascar commented 4 years ago

Exception: NQuads serialization only makes sense for context-aware stores!

Yes, back to the first point: an rdflib Graph isn't context-aware. It only knows about <Subject, Predicate, Object> (triples), not <Subject, Predicate, Object, Context> (quads), so shoving <S P O C> into <S P O> doesn't work!

If you do want to auto-parse the JSON-LD, as opposed to using custom logic of your own code to loop through individual parts, you are going to have to use an rdflib ConjunctiveGraph or a Dataset.
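One way to picture the triples-versus-quads point, as a toy model in plain Python. This is not how rdflib is implemented; the quads list and the triples_in helper are made up purely for illustration:

```python
# Toy store: every statement is a (subject, predicate, object, context) quad.
quads = [
    ("mne:aaa", "rdf:type", "rdfs:Class", "http://mne.org/#1.0"),
    ("mne:aaa", "rdfs:comment", "An aaa", "http://mne.org/#1.0"),
    ("mne:aaa", "rdfs:label", "aaa", "http://mne.org/#1.0"),
]


def triples_in(context, store):
    """Return the (s, p, o) triples that a graph bound to `context` would see."""
    return [(s, p, o) for (s, p, o, c) in store if c == context]


# A graph with some other identifier sees nothing from the named graph...
anonymous_view = triples_in("_:some-blank-node-id", quads)

# ...while a graph bound to the named graph's identifier sees all of it.
named_view = triples_in("http://mne.org/#1.0", quads)

print(len(anonymous_view), len(named_view))
```

In this picture, a view bound to one context only ever sees that context's triples, which is why the identifier has to match the "@id" of the "@graph", or why you need a container that tracks all contexts.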

Why can the playground handle it?

Because it's just taking the JSON-LD and expecting to handle quads, not the more limited triples.

I believe mne:aaa and mne:bbb must be inside of a "@graph" for it to be valid json-ld.

In your example they are! You've declared two sets of triples in the single graph. In N-Quads (with some hand-made shorter URLs for readability), this is:

<mne:aaa> <rdf:type> <rdfs:Class> <mne:1.0> .
<mne:aaa> <rdfs:comment> "An aaa" <mne:1.0> .
<mne:aaa> <rdfs:label> "aaa" <mne:1.0> .
<mne:bbb> <rdf:type> <rdfs:Class> <mne:1.0> .
<mne:bbb> <rdfs:comment> "An bbb" <mne:1.0> .
<mne:bbb> <rdfs:label> "bbb" <mne:1.0> .

So two subjects, <mne:aaa> & <mne:bbb>, three triples each, single Named Graph (Context) <mne:1.0>.

hsolbrig commented 4 years ago

Fascinating. Here is my code verbatim, run with rdflib 4.2.2:

from rdflib import Dataset
g = Dataset()
jsonRepresentation = """
{
    "@id": "http://mne.org/#1.0",

    "@graph": [
        {
            "@id": "mne:aaa",
            "@type": "rdfs:Class",
            "rdfs:comment": "An aaa",
            "rdfs:label": "aaa"
        },
        {
            "@id": "mne:bbb",
            "@type": "rdfs:Class",
            "rdfs:comment": "An bbb",
            "rdfs:label": "bbb"
        }
    ]
}
"""
g.parse(data=jsonRepresentation, format="json-ld")
print(g.serialize(format="nquads").decode())

output:

<mne:bbb> <rdfs:comment> "An bbb" <http://mne.org/#1.0> .
<mne:aaa> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <rdfs:Class> <http://mne.org/#1.0> .
<mne:bbb> <rdfs:label> "bbb" <http://mne.org/#1.0> .
<mne:aaa> <rdfs:comment> "An aaa" <http://mne.org/#1.0> .
<mne:bbb> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <rdfs:Class> <http://mne.org/#1.0> .
<mne:aaa> <rdfs:label> "aaa" <http://mne.org/#1.0> .

James-Hudson3010 commented 4 years ago

This has been cleared up for me.

James-Hudson3010 commented 4 years ago

As this question and answer is probably also useful in a more public forum, and is not an actual bug, I have created a Stack Overflow question. I will answer it in 24 hours, when that is allowed, or someone here can answer it with the information provided in this issue.

https://stackoverflow.com/questions/61000017/rdflib-jsonld-parsing-error-with-id-property