RDFLib / rdflib-jsonld

JSON-LD parser and serializer plugins for RDFLib

Valid json+ld produces an empty graph #70

Open teledyn opened 4 years ago

teledyn commented 4 years ago
import json
from rdflib import Namespace, Graph, RDF, XSD, URIRef, plugin, Literal
from rdflib.serializer import Serializer

js = """{"@context":"https://schema.org","@graph":[{"@type":"Organization","@id":"https://example.com/#organization","url":"https://example.com/","name":"Move Ahead","sameAs":[]},{"@type":"WebSite","@id":"https://example.com/#website","url":"https://example.com/","name":"Move Ahead","publisher":{"@id":"https://example.com/#organization"},"potentialAction":{"@type":"SearchAction","target":"https://example.com/?s={search_term_string}","query-input":"required name=search_term_string"}},{"@type":"CollectionPage","@id":"https://example.com/category/lease/#collectionpage","url":"https://example.com/category/lease/","inLanguage":"en-US","name":"Leasing","isPartOf":{"@id":"https://example.com/#website"},"description":"Reallybig Leasing Co. A leading global transportation services provider"}]}"""
g = Graph().parse(data=js, format='json-ld')
print(len(g))

and the length is zero.

If I instead parse each item from json.loads(js).get('@graph') separately, I can build a graph, but it doesn't resolve properly: the connected data is missing, and the rdf:type values are not expanded with the context:

>>> for item in json.loads(js).get('@graph'):
...     g += Graph().parse(data=json.dumps(item),format='json-ld')
... 
>>> for row in g.query("SELECT * where {?s ?p ?o}"):
...     print(row)
... 
(rdflib.term.URIRef(u'https://example.com/#website'), rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef(u'file:///home/teledyn/Work/WebSite'))
(rdflib.term.URIRef(u'https://example.com/#organization'), rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef(u'file:///home/teledyn/Work/Organization'))
(rdflib.term.URIRef(u'https://example.com/category/lease/#collectionpage'), rdflib.term.URIRef(u'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef(u'file:///home/teledyn/Work/CollectionPage'))
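The file:// IRIs above are what you get when a bare @type value such as "WebSite" has no @context (and so no @vocab) to expand it, and is instead treated as a relative IRI resolved against the parser's base, which here appears to be the current working directory. A minimal sketch of that resolution rule with the standard library, assuming the base file:///home/teledyn/Work/ taken from the output above; resolve_type is a hypothetical helper that illustrates the rule, not rdflib's internals:

```python
from urllib.parse import urljoin

# Assumed base: the working directory visible in the output above.
BASE = "file:///home/teledyn/Work/"

def resolve_type(term, base=BASE):
    """Resolve a bare @type value as a relative IRI against `base`.

    Illustration only: with no @context in scope, a value like "WebSite"
    cannot be expanded to a schema.org IRI, so it ends up relative to the base.
    """
    return urljoin(base, term)

for t in ["Organization", "WebSite", "CollectionPage"]:
    print(resolve_type(t))
# file:///home/teledyn/Work/Organization
# file:///home/teledyn/Work/WebSite
# file:///home/teledyn/Work/CollectionPage
```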

That clearly isn't going to work. If I inspect the items of the @graph array, the missing information is nested inside them:

{u'url': u'https://example.com/', u'sameAs': [], u'@id': u'https://example.com/#organization', u'@type': u'Organization', u'name': u'Move Ahead'}
{u'publisher': {u'@id': u'https://example.com/#organization'}, u'potentialAction': {u'query-input': u'required name=search_term_string', u'@type': u'SearchAction', u'target': u'https://example.com/?s={search_term_string}'}, u'name': u'Move Ahead', u'url': u'https://example.com/', u'@id': u'https://example.com/#website', u'@type': u'WebSite'}
{u'inLanguage': u'en-US', u'name': u'Leasing', u'url': u'https://example.com/category/lease/', u'isPartOf': {u'@id': u'https://example.com/#website'}, u'@id': u'https://example.com/category/lease/#collectionpage', u'@type': u'CollectionPage', u'description': u'Reallybig Leasing Co. A leading global transportation services provider'}
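Strictly, the nested values are still present in each item; what is lost when an item is parsed on its own is the mapping from short keys such as "url" or "name" to IRIs, which only the schema.org @context supplies. During JSON-LD expansion, a key that is neither a keyword nor expands to something containing a colon is dropped. The hypothetical helper below lists, recursively, the keys that would be dropped when no @context is in scope. It is a heuristic for that no-context case only, not an implementation of the expansion algorithm:

```python
def unmapped_keys(node, found=None):
    """Collect keys that would have no IRI mapping if no @context were in scope.

    Heuristic: keys that are not @-keywords and contain no ':' cannot be
    expanded without a context (or @vocab), so expansion skips them.
    """
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # don't treat term definitions as data keys
            if not key.startswith("@") and ":" not in key:
                found.add(key)
            unmapped_keys(value, found)
    elif isinstance(node, list):
        for item in node:
            unmapped_keys(item, found)
    return found

# Trimmed copy of the WebSite item from the issue.
item = {
    "@id": "https://example.com/#website",
    "@type": "WebSite",
    "name": "Move Ahead",
    "publisher": {"@id": "https://example.com/#organization"},
    "potentialAction": {"@type": "SearchAction",
                        "target": "https://example.com/?s={search_term_string}"},
}
print(sorted(unmapped_keys(item)))
# ['name', 'potentialAction', 'publisher', 'target']
```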

Is there something I am missing here? This application reads JSON-LD found in the wild, so I can't control the input, but is there some reliable way to massage the input so that it works with the rdflib parser?

The same JSON-LD works fine for Google.

teledyn commented 4 years ago

I found a workaround: if I copy the @context into each of the @graph items, parse them one at a time, and then combine the results, everything resolves as expected.
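A minimal sketch of that workaround using only the standard library. The function name split_graph is hypothetical; it assumes the top-level object has both @context and @graph, and it ignores any other top-level keys. It copies the top-level context into every node and returns one JSON string per node. If a node already has its own @context, the two are combined into a JSON-LD context array with the outer context first, which is the order a processor would apply them in:

```python
import json

def split_graph(doc):
    """Copy the top-level @context into each @graph node (hypothetical helper).

    Accepts a JSON string or an already-decoded dict; returns a list of
    JSON strings, one self-contained node per entry.
    """
    if isinstance(doc, str):
        doc = json.loads(doc)
    outer = doc["@context"]
    outer = outer if isinstance(outer, list) else [outer]
    nodes = doc["@graph"]
    if isinstance(nodes, dict):  # a single node object is also allowed
        nodes = [nodes]
    parts = []
    for node in nodes:
        node = dict(node)  # shallow copy so the input is not mutated
        if "@context" in node:
            inner = node["@context"]
            inner = inner if isinstance(inner, list) else [inner]
            node["@context"] = outer + inner  # outer first, then the node's own
        else:
            node["@context"] = outer[0] if len(outer) == 1 else outer
        parts.append(json.dumps(node))
    return parts
```

Each returned string can then be passed to Graph().parse(data=part, format='json-ld') on the same Graph, which accumulates the triples into one graph as the workaround describes. Note that this still relies on the parser being able to load the remote context (here https://schema.org).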

ethieblin commented 4 years ago

I had the same issue when writing tests: JSON-LD generated by the rdflib-jsonld serializer could not be parsed directly by the rdflib-jsonld parser. It seems to happen only when the @graph key is present in the JSON object.