RDFLib / rdflib-jsonld

JSON-LD parser and serializer plugins for RDFLib

Schema.org moving from HTTP Content Negotiation to JSON-LD 1.1 "Link:" header for context file #85

Open danbri opened 4 years ago

danbri commented 4 years ago

This happened faster than planned due to a DoS attack this week; details are in https://github.com/schemaorg/schemaorg/issues/2578#issuecomment-632227864

Schema.org no longer publishes a JSON-LD context file using HTTP content negotiation. Our homepage URL always returns HTML. This affects the parsing of all JSON-LD that expects to get a context definition from URLs "http://schema.org", "https://schema.org", "http://schema.org/", "https://schema.org/".

The URL of our context file is https://schema.org/docs/jsonldcontext.jsonld

We will shortly update the site to declare this URL via a Link header (see above issue for details).
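As a rough illustration of what a client could do with such a header, here is a minimal sketch that picks the JSON-LD alternate out of a `Link` header value. The header format is an assumption based on the alternate-document mechanism in JSON-LD 1.1 (`rel="alternate"` with `type="application/ld+json"`); the function name `find_jsonld_alternate` is hypothetical, and the parser is deliberately simplified (it does not handle commas or semicolons inside quoted parameter values).

```python
import re
from urllib.parse import urljoin


def find_jsonld_alternate(link_header, base_url):
    """Return the absolute URL of the first link with rel="alternate" and
    type="application/ld+json", or None if there is no such link.

    Simplified parser: assumes parameter values contain no ',' or ';'.
    """
    # Each link looks like: <target>; key="value"; key="value"
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)', link_header):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip().strip('"')
        rels = params.get("rel", "").lower().split()
        is_jsonld = params.get("type", "").lower() == "application/ld+json"
        if "alternate" in rels and is_jsonld:
            # The target may be relative, so resolve it against the requested URL.
            return urljoin(base_url, target)
    return None


# Illustrative header value (an assumption, not copied from a live response).
header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
print(find_jsonld_alternate(header, "https://schema.org/"))
```

A client would read the header from the HTTP response for "https://schema.org" and then fetch the returned URL as the context document.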

I am filing this issue

danbri commented 4 years ago

The main Schema.org site should have the headers discussed now, i.e.

danbri commented 4 years ago

@hsolbrig can you suggest a workaround, at least for short-term use? Can we pass in the context when invoking the parser (by URL or by content)? /cc @Gnomus042
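One possible short-term workaround "by URL" is to rewrite the document before parsing, so that any `@context` pointing at the Schema.org homepage points at the context file directly. The sketch below is only that, a sketch: the helper names (`repoint_context`, `parse_with_repointed_context`) are hypothetical, and the final parse step still needs network access to fetch the context file.

```python
import json

# The four homepage URLs named in this issue, and the direct context file URL.
SCHEMA_ORG_ALIASES = {
    "http://schema.org", "https://schema.org",
    "http://schema.org/", "https://schema.org/",
}
SCHEMA_ORG_CONTEXT_URL = "https://schema.org/docs/jsonldcontext.jsonld"


def _repoint_context_value(value):
    """Swap homepage URLs for the context file URL; leave everything else alone."""
    if isinstance(value, str):
        return SCHEMA_ORG_CONTEXT_URL if value in SCHEMA_ORG_ALIASES else value
    if isinstance(value, list):
        return [_repoint_context_value(item) for item in value]
    return value  # embedded context object: unchanged


def repoint_context(node):
    """Return a copy of a parsed JSON-LD document with every @context repointed."""
    if isinstance(node, list):
        return [repoint_context(item) for item in node]
    if isinstance(node, dict):
        return {
            key: _repoint_context_value(value) if key == "@context" else repoint_context(value)
            for key, value in node.items()
        }
    return node


def parse_with_repointed_context(jsonld_text):
    """Parse JSON-LD text with RDFLib after repointing its contexts (needs network)."""
    from rdflib import Graph  # imported lazily so the helpers above stay dependency-free
    fixed = json.dumps(repoint_context(json.loads(jsonld_text)))
    return Graph().parse(data=fixed, format="json-ld")
```

Calling `parse_with_repointed_context(text)` in place of `Graph().parse(data=text, format="json-ld")` leaves the original document untouched and only changes what the parser sees.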

westurner commented 3 years ago

Is there no way to do this without requiring a custom HTTP header? Why is that part of the data specified out-of-band from the rest of the document?

(edit) Static files (with no HTTP server configuration dependency) are most scalable and archivable.

nicholascar commented 3 years ago

@rob-metalinkage, is this going to cause problems for JSON -> JSON-LD expansion due to the separate context?

nicholascar commented 3 years ago

@danbri, @westurner, @hsolbrig RDFLib maintainers are assembling volunteers to complete this tool's JSON-LD 1.1 implementation and then merge it into RDFLib core. That should make it easier for everyone to just "do" JSON-LD with RDFLib.

rob-metalinkage commented 3 years ago

@nicholascar I don't think it causes any extra problems, as using just a model namespace to perform JSON -> JSON-LD expansion is unsafe anyway.

The patterns that appear in the wild seem to be:

Data model = X context URI = .json

i.e. there is no way to discover for a model X the relevant context file.

Or the requirement to perform content negotiation is based on a model:

Datamodel = X Context = X (Accept ld+json)

This is being taken off the table as a bad idea according to this issue, but IMHO it has a deeper problem: if your data model is described in OWL, then ld+json should return the data model serialised as JSON-LD, not necessarily a JSON-LD context for the model.

The options for a canonical mechanism to discover the actual URL of a context seem to be:

a) X returns a Link header for alternates

b) X supports a profile "alt" view, which can be requested by either a header or a URL parameter (<X?_profile=jsoncontext>), where the profile jsoncontext is registered and well known (dx-prof-conneg).

If dx-prof-conneg supports the same Link syntax, and if a resource chooses to embed the Link headers for all the available profiles and serialisations from the "alt" view by default, then the two approaches are consistent, I think.

I'd always choose the latter, as JSON context is not the only resource I'd want to be able to discover about a model. JSON-schema is also valuable, and SHACL and HTML and maybe other forms.

westurner commented 3 years ago

Maybe I'm misunderstanding? From https://www.w3.org/TR/json-ld11/#the-context ::

Contexts can either be directly embedded into the document (an embedded context) or be referenced using a URL. Assuming the context document in the previous example can be retrieved at https://json-ld.org/contexts/person.jsonld, it can be referenced by adding a single line and allows a JSON-LD document to be expressed much more concisely as shown in the example below:

{
 "@context": "https://json-ld.org/contexts/person.jsonld",

rob-metalinkage commented 3 years ago

@westurner you are right, it doesn't necessarily need a custom header, but there are a couple of things that need care here:

1) the agent that is "adding a single line" somehow has to know the URL "https://json-ld.org/contexts/person.jsonld"

We can say it's all up to client code to tell RDFLib exactly what to include, and maybe not think about this, but this issue is about other approaches, such as trying to resolve namespaces like schema.org and getting a context.

2) contexts may include other contexts, so the behaviour needs to be explicit about exactly how to handle potential conflicts (prefix strings bound to different URIs) and default namespaces (@value, @base). Having explored this, I find the JSON-LD documentation extremely hard to follow and lacking basic examples, and RDFLib is silent. IMHO RDFLib should encapsulate and explain basic practices here, without requiring interpretation of the JSON-LD specification to get started.

3) there seem to be quite a lot of ways to bundle a set of object descriptions in JSON-LD, including arrays, @graph constructs, container objects etc. The JSON-LD serialiser probably needs to be able to handle these if we want to deliver a serialisation for use in a specific context, such as to meet an API payload requirement. The JSON-LD framing spec makes this clear; see #95
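To make point 2 a bit more concrete, here is a small sketch of how a list of contexts combines. It assumes the basic JSON-LD rule that contexts are processed in order and a later definition of a term replaces an earlier one. It is not a JSON-LD processor: the function names are hypothetical, and it ignores remote contexts, `@protected`, `null` resets and scoped contexts.

```python
def term_iri(definition):
    """Return the IRI a simple term definition maps to, or None if it has none."""
    if isinstance(definition, str):
        return definition
    if isinstance(definition, dict):
        return definition.get("@id")
    return None


def merge_contexts(contexts):
    """Merge embedded context objects in order and report rebinding conflicts.

    Returns (merged, conflicts); each conflict is a tuple
    (term, previous_iri, new_iri, index_of_overriding_context).
    """
    merged = {}
    conflicts = []
    for index, context in enumerate(contexts):
        for term, definition in context.items():
            if term.startswith("@"):
                merged[term] = definition  # keywords such as @vocab or @base: last one wins
                continue
            if term in merged and term_iri(merged[term]) != term_iri(definition):
                conflicts.append((term, term_iri(merged[term]), term_iri(definition), index))
            merged[term] = definition  # later definitions replace earlier ones
    return merged, conflicts


# Two contexts that bind the prefix "dc" to different namespaces.
merged, conflicts = merge_contexts([
    {"dc": "http://purl.org/dc/elements/1.1/", "name": "http://schema.org/name"},
    {"dc": "http://purl.org/dc/terms/"},
])
print(merged["dc"])
print(conflicts)
```

Reporting the conflicts rather than silently overriding is a design choice of this sketch; a conforming processor simply applies the later definition.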

datamusee commented 2 years ago

I think the following code (failing to load the schema.org context) is linked to the present problem, but I don't understand the workaround:

```python
from rdflib import Graph, plugin
from rdflib.serializer import Serializer

# JSON-LD sample that references the schema.org context by its homepage URL
jsonldSample = """
{
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "La Tour Eiffel",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Paris",
        "addressRegion": "75007",
        "streetAddress": "Champ de Mars, 5 Avenue Anatole France"
    },
    "description": "Monument emblématique de Paris, la tour Eiffel est une tour de fer puddlé de 324 mètres de hauteur construite par Gustave Eiffel à l’occasion de l’Exposition Universelle de 1889 et qui célébrait le premier centenaire de la Révolution française.",
    "url": "https://www.toureiffel.paris",
    "image": "https://www.toureiffel.paris/sites/default/files/2017-10/monument-landing-header-bg_0.jpg",
    "pricerange": "de 2,5 à 25 euros",
    "telephone": "08 92 70 12 39"
}
"""

g = Graph().parse(data=jsonldSample, format='json-ld')
print(g.serialize(format='json-ld', indent=4))
print(g.serialize(format='nt', indent=4))
```
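For a sample like the one above, one way to supply the context "by content" is to replace the remote `@context` with a small embedded one, so that nothing has to be fetched. The sketch below assumes that a bare `@vocab` of `http://schema.org/` is good enough for the document at hand. That is a simplification: the published Schema.org context also carries per-term rules (for example, which values are treated as IRIs), so the resulting triples can differ from what the full context would give. The names `embed_inline_context` and `parse_offline` are hypothetical.

```python
import json

# Assumption: every term in the document belongs to the schema.org vocabulary.
INLINE_CONTEXT = {"@vocab": "http://schema.org/"}


def embed_inline_context(jsonld_text, context=INLINE_CONTEXT):
    """Return the JSON-LD text with its top-level @context replaced by an embedded one.

    Expects the document's top level to be a JSON object.
    """
    doc = json.loads(jsonld_text)
    doc["@context"] = context
    return json.dumps(doc, ensure_ascii=False, indent=2)


def parse_offline(jsonld_text):
    """Parse with RDFLib using the embedded context, so no context is fetched."""
    from rdflib import Graph  # imported lazily so the helper above stays dependency-free
    return Graph().parse(data=embed_inline_context(jsonld_text), format="json-ld")
```

With this in place, `parse_offline(jsonldSample)` would stand in for the `Graph().parse(...)` call in the sample.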