Contexts can either be directly embedded into the document (an embedded context) or be referenced using a URL. -- w3.org/TR/json-ld11/
The JSON-LD processor makes a request like:
curl -v -H "Accept: application/ld+json" "https://schema.org/" > /dev/null
it gets back a response that includes a link:
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
That is followed to the context document located at https://schema.org/docs/jsonldcontext.jsonld
which is the remote context referenced in the example. That context specifies, among other items:
"schema": "http://schema.org/",
Hence, the properties are expanded with the namespace http://schema.org/.
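The link-following step above can be sketched in a few lines of Python. This is only an illustration, not how any particular JSON-LD library does it: `alternate_context_url` is a made-up helper name, the header parsing is deliberately naive, and no network request is made (the header string is the one shown above).

```python
import re
from urllib.parse import urljoin


def alternate_context_url(request_url, link_header):
    """Return the absolute URL of a rel="alternate" JSON-LD link, or None.

    Hypothetical helper for illustration; the parsing is naive and only
    meant to handle simple headers like the one schema.org returns.
    """
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]*)>(.*)', part)
        if not match:
            continue
        target, params = match.groups()
        # Collect parameters such as rel="alternate"; type="application/ld+json"
        attrs = {
            key.lower(): value.strip('"')
            for key, value in re.findall(
                r';\s*([\w-]+)\s*=\s*("[^"]*"|[^;,\s]*)', params
            )
        }
        if attrs.get("rel") == "alternate" and attrs.get("type") == "application/ld+json":
            # The link target is relative, so resolve it against the request URL.
            return urljoin(request_url, target)
    return None


header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
resolved = alternate_context_url("https://schema.org/", header)
print(resolved)
```

The relative target in the header is resolved against the URL that was requested, which is how `https://schema.org/` ends up pointing at the context document.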
This is exactly why we needed clarification on the "https" vs "http" namespace issue in #52.
I agree that sticking with https://schema.org/ as the namespace does require specifying the default context like:
"@context": {"@vocab":"https://schema.org/"}
@datadavev
Thanks for the nice expansion...
Going further you can look at the context file pulled down and look for http
https is sadly missing and curl for either https://schema.org/docs/jsonldcontext.jsonld or http://schema.org/docs/jsonldcontext.jsonld returns the same file.. don't get me started..
looking for http (or https via substring match) we get
~/tmp grep http jsonldcontext.json
"@vocab": "http://schema.org/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"schema": "http://schema.org/",
"owl": "http://www.w3.org/2002/07/owl#",
"dc": "http://purl.org/dc/elements/1.1/",
"dct": "http://purl.org/dc/terms/",
"dctype": "http://purl.org/dc/dcmitype/",
"void": "http://rdfs.org/ns/void#",
"dcat": "http://www.w3.org/ns/dcat#",
"httpMethod": { "@id": "schema:httpMethod"},
yet.. in the example https://tinyurl.com/y99kj7d7 things correctly expand to their https namespace, not http. Any insight into why this is the case?
This seems like it should not occur if the above context is pulled. Seems like application logic coming into play perhaps?
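For anyone who wants to repeat the grep check programmatically, here is a rough Python equivalent. It assumes the context has already been downloaded and parsed; the `context` dict below is just a hand-copied excerpt of the grep output above, and `schema_org_namespaces` is a made-up helper name.

```python
# Excerpt of the downloaded context, copied from the grep output above.
context = {
    "@vocab": "http://schema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema": "http://schema.org/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "httpMethod": {"@id": "schema:httpMethod"},
}


def schema_org_namespaces(ctx):
    """Return the entries whose string value is a schema.org namespace IRI."""
    return {
        key: value
        for key, value in ctx.items()
        if isinstance(value, str) and value.rstrip("/").endswith("//schema.org")
    }


found = schema_org_namespaces(context)
uses_https = any(value.startswith("https://") for value in found.values())
print(found)
print("https namespace present:", uses_https)
```

On this excerpt only the http form shows up, which matches what the grep shows.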
This is the challenge of namespace ambiguity introduced by the "s". Despite progression towards a duality of schema.org concepts under http and https, the official and current context for schema.org resides at https://schema.org/docs/jsonldcontext.jsonld and that context specifies http://schema.org/ as the namespace.
Writing:
"@context": {"@vocab":"https://schema.org/"}
tells the JSON-LD processor that the entire context definition for the document is exactly the map that is the value of the "@context" key. Since that map does not contain a reference to a remote context (i.e. using the @import key), that map is the entirety of the context and so the JSON-LD processor does not retrieve a remote context when processing the document. Instead, the default context IRI specified by the value of @vocab is used to expand the relative IRIs in the document. Dataset is equal to https://schema.org/Dataset.
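A toy version of that @vocab expansion, to make the effect concrete. This is a simplification for illustration only (`expand_with_vocab` is a made-up name, and a real processor does far more, e.g. term definitions and compact IRIs):

```python
def expand_with_vocab(term, context):
    """Toy expansion: prepend @vocab to a plain term.

    Keywords (starting with "@") and anything containing a colon
    (absolute or compact IRIs) are returned unchanged.
    """
    if term.startswith("@") or ":" in term:
        return term
    vocab = context.get("@vocab")
    return vocab + term if vocab else term


https_ctx = {"@vocab": "https://schema.org/"}
http_ctx = {"@vocab": "http://schema.org/"}

print(expand_with_vocab("Dataset", https_ctx))
print(expand_with_vocab("Dataset", http_ctx))
```

The two maps give two different IRIs for the same term, and nothing is fetched in either case.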
It's important to note that remote contexts are retrieved by a JSON-LD processor by following the spec for Remote Document and Context Retrieval. Basically, requests are made, following 303 redirects and using an Accept: application/ld+json header. Steps 4 and 5 therein describe how Link headers in the response are handled, and this step is typically not visible when using curl and other common HTTP clients unless specifically looking for that information.
Anyway, the outcome of all this is that specifying a context of "@context":{"@vocab":"https://schema.org/"} means that is the entire context. Specifying "@context":"https://schema.org/" means the JSON-LD processor will go and fetch a context document from that IRI, and that document provides the context map that uses a namespace of http://schema.org/ for the schema.org terms.
This of course does have much broader implications, since in specifying the context of "@vocab":"https://schema.org/", none of the information in the remote context is being retrieved and utilized in the processing of the document.
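One way to see the difference is to ask which URLs a processor would need to dereference for a given @context value. The sketch below is an illustration under simplifying assumptions, not a spec-complete implementation; `remote_context_urls` is a made-up name.

```python
def remote_context_urls(context):
    """List the URLs that would have to be fetched for this @context value.

    Simplified: a string is a remote reference, a list is checked entry by
    entry, and a map is only remote if it carries a string @import.
    """
    if isinstance(context, str):
        return [context]
    if isinstance(context, list):
        urls = []
        for entry in context:
            urls.extend(remote_context_urls(entry))
        return urls
    if isinstance(context, dict) and isinstance(context.get("@import"), str):
        return [context["@import"]]
    return []


print(remote_context_urls({"@vocab": "https://schema.org/"}))  # nothing to fetch
print(remote_context_urls("https://schema.org/"))              # one remote context
```

With the @vocab map the list is empty, so nothing defined in the schema.org context document takes part in processing.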
[edit: added note on default context]
It is as I figured.... I appreciate the confirmation though. Sigh. From a developer POV, this little "s" really causes a lot of "hit" (sorry. there is a missing "s" in that "hit") ;)
It's a widespread challenge, e.g. https://github.com/RDFLib/rdflib/issues/1120
It's the cost of conflating the location of the resolver to dereference an identifier with the identifier.
Note that this issue will vaporize when schema.org v 12 comes out in March.
See: https://github.com/schemaorg/schemaorg/blob/main/data/releases/12.0/schemaorgcontext.jsonld
@datadavev you made my day!!!!!!!
Big relief for me too - there's a whole bunch of normalization code and gymnastics that can go away. Huzzah!
Hi, could someone confirm if these two @context definitions are different or equivalent now? I'm seeing both forms in the ESIP recommendations examples, and I want to know if there is a "more correct" version:
{
"@context": "https://schema.org/",
"@type": "Dataset",
"author": {
"@type": "Person",
"name": "Jane Goodall"
}
}
vs.
{
"@context": {
"@vocab": "https://schema.org/"
},
"@type": "Dataset",
"author": {
"@type": "Person",
"name": "Jane Goodall"
}
}
@bonnland
The first is valid for JSON-LD 1.0
The second for JSON-LD 1.1
If you are working at this point forward, you should be using the map, the second one.
We probably should update all of our examples to use the recommended form.
@datadavev I get your point.. that is only true in the context (no pun intended) that you view the document as a JSON-LD 1.1 document in both cases, correct?
I need to revisit now why I had processing errors in 1.1 mode with the previous approach when, as you point out, it seems a valid 1.1 pattern for remote context. (though that seems very poorly worded in the docs.. since all the contexts are typically web resolved in principle)
oddly there is
A context definition MUST be a map whose keys MUST be either terms, compact IRIs, IRIs, or one of the keywords @base, @import, @language, @propagate, @protected, @type, @version, or @vocab.
which seems at odds with the remote context reference https://www.w3.org/TR/json-ld11/#example-5-referencing-a-json-ld-context
Have you had the previous (un-mapped version) fail in a forced 1.1 process? I have.
@datadavev Is it just me or the docs say...
"a context MUST be a map, except when it's not a map and then it is a remote context, though you can use @import for a remote context too, to make the context a map.... oh .. and any context you provide that isn't relative, is pulled remotely based on the IRI you provide" (this seems even more fun to read if you do it in an English accent) ;)
that seems less than wonderful :)
it is messy, and further complicated by the opacity of what can go on behind the scenes when retrieving a remote context [^1].
If the value of @context is a relative or absolute URL, the document retrieved from that URL becomes the context.
In this case:
{
"@context": "http://shorturl.at/ciqMW",
"title": "A remote context doc"
}
the contents of the document retrieved by following the rules for JSON-LD retrieval becomes the context. That URL resolves to the JSON-LD:
{
"@context": {
"@vocab":"http://a.b/c/"
}
}
That JSON-LD is processed like:
{
"@context": {
"@vocab":"http://a.b/c/"
},
"title": "A remote context doc"
}
and so expands like:
[
{
"http://a.b/c/title": [
{
"@value": "A remote context doc"
}
]
}
]
On the other hand, if the value of @context is a map, then that map becomes the context. So for example:
{
"@context": {
"@vocab": "http://shorturl.at/ciqMW/"
},
"title": "A local context doc"
}
The context is exactly as written, and the document expands to:
[
{
"http://shorturl.at/ciqMW/title": [
{
"@value": "A local context doc"
}
]
}
]
[^1]: https://www.w3.org/TR/json-ld11-api/#loaddocumentcallback, especially steps 4-5
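The two cases above can be mimicked with a toy expander that never touches the network. Everything here is an assumption for illustration: `REMOTE` is a stand-in for the document loader (keyed by the short URL used above), `expand` only handles flat documents with string values, and a real processor such as pyld does a great deal more.

```python
# Stand-in for the document loader: URL -> retrieved JSON-LD document.
REMOTE = {
    "http://shorturl.at/ciqMW": {"@context": {"@vocab": "http://a.b/c/"}},
}


def resolve_context(ctx, loader):
    """A string context is looked up in the loader; a map is used as written."""
    if isinstance(ctx, str):
        return resolve_context(loader[ctx]["@context"], loader)
    return ctx


def expand(doc, loader):
    """Toy expansion of a flat document whose values are plain strings."""
    ctx = resolve_context(doc.get("@context", {}), loader)
    vocab = ctx.get("@vocab", "")
    node = {
        vocab + key: [{"@value": value}]
        for key, value in doc.items()
        if not key.startswith("@")
    }
    return [node]


remote_doc = {"@context": "http://shorturl.at/ciqMW", "title": "A remote context doc"}
local_doc = {"@context": {"@vocab": "http://shorturl.at/ciqMW/"}, "title": "A local context doc"}

print(expand(remote_doc, REMOTE))
print(expand(local_doc, REMOTE))
```

The same URL appears in both documents, but only the string form causes a lookup; in the map form it is just text that gets prepended to the term.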
@datadavev
Your post above really needs to go into the docs, and it's clearer than the JSON-LD docs IMHO. I do follow what you are saying and based on that I think I have a bug report to make up for a JSON-LD lib I use. :)
Just to clarify all of this, I think our recommendations have shifted but we have not updated our documentation. Now that schema.org has clarified that the true namespace is http://schema.org/, but that https://schema.org/ can be used to retrieve a context file, I think this is what we are recommending:
{
"@context": {
"@vocab": "http://schema.org/"
},
"@type": "Dataset",
"author": {
"@type": "Person",
"name": "Jane Goodall"
}
}
{
"@context": "http://schema.org/",
"@type": "Dataset",
"author": {
"@type": "Person",
"name": "Jane Goodall"
}
}
OR
{
"@context": "https://schema.org/",
"@type": "Dataset",
"author": {
"@type": "Person",
"name": "Jane Goodall"
}
}
{
"@context": {
"@vocab": "https://schema.org/"
},
"@type": "Dataset",
"author": {
"@type": "Person",
"name": "Jane Goodall"
}
}
If this is right, we need to update all docs, guidelines, examples, and shacl rules.
Started branch feature_151_context_namespace for fixing the namespace context consistency issues. More changes needed before we have a consistent set of guides.
(1) has the effect of setting the default vocabulary. (2) has the effect of including the context statements defined in the referenced context document.
Effectively (1) replaces the document https://schema.org/docs/jsonldcontext.jsonld with the document:
"@context": {
"@vocab": "http://schema.org/"
}
Hence, the general recommendation would be (2).
@mbjones in your recent post it says "schema.org has clarified that the true namespace is http://schema.org", but in the examples 'http://schema.org/' is used (with the terminal slash). I'm guessing the true namespace should be http://schema.org/?
For reference, the schema.org context document, and so namespace definition, is located here: https://schema.org/docs/jsonldcontext.jsonld
the @vocab there is http://schema.org/, there's my answer. Thanks!
Thanks for the clarifications, and yes, I should have said http://schema.org/. I'll go fix that.
So, if the preference is for option 2, in our full example, how do we define the additional namespaces we need? Right now, on the branch I have the full.jsonld example as:
"@context": {
"@vocab": "http://schema.org/",
"prov": "http://www.w3.org/ns/prov#",
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
"spdx": "http://spdx.org/rdf/terms#"
}
Should the guidance be that we recommend option 2, except for when people need to define additional namespace prefixes?
"@context": [
"https://schema.org/",
{
"prov": "http://www.w3.org/ns/prov#",
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
"spdx": "http://spdx.org/rdf/terms#"
}
]
[edit: use https for schema.org retrieval]
So to be clear the schema.org FAQ at https://schema.org/docs/faq.html#19 is now wrong? Schema.org is saying to use http? Also the developer section at https://schema.org/docs/developers.html shows there are multiple context files for the various namespace approaches. Yet our recommendation is to stick with the old http pattern?
I think the FAQ is a bit misleading. The namespace is http://schema.org/, associated documents (such as the context) can be retrieved using http or https. The context document for schema.org defines the namespace and that is currently located at https://schema.org/docs/jsonldcontext.jsonld.
However, just to confuse things more, there are http and https variants of the vocabulary!
That's what I mean.. the multiple vocab elements. I understand all of this. and I appreciate that currently the https file call returns http namespaced file (which I don't agree with) :)
this just worries me... it's a kicking the can down the road event IMHO.
agree to disagree I guess
Adding to the confusion, some libraries, e.g. RDFLib, internally define constants for common namespaces, and RDFLib is using https://schema.org/ as the namespace. So I guess be prepared to be flexible.
The libraries are going to be a pain.. major pain..
Also, you can't content negotiate for the schema.org JSON-LD context anyway. Due to DOS issues they don't allow it so then libraries have to implement the resolution as a special case.
you can't curl negotiate at https://schema.org for the context.
Right, there's a different set of rules beyond simple content negotiation^1 for finding the context - need to look at the response link header:
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
This is also something that is poorly implemented in the major libs (pyld and rdflib at least). I use a patched version of pyld to get around this issue and honor the json-ld processing rules in the spec.
right..
curl -v https://schema.org
...
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
...
and I get it.. (literal and figurative) ;)
As you point out though the issues with the python libraries (same as in the Go libraries by the way)..
This is an implementation mess... my point though is that the trend in general will be toward https not away and since both namespace uses are accepted by schema.org (unless that policy is now changed?) we are tossing out future LOD patterns if we go http since the data web will be https, it has to be.
I'm not trying to change any minds. It sounds like it is already a done deal. I just have to resolve how to connect the other groups I work who are https focused now with SOS which will be http focused.
I don't think it's a done deal if @fils and @datadavev aren't on board -- you two have more practical experience with this than anyone I know. I am just trying to clean up our recs and be consistent. And I don't have a strong opinion myself -- I agree the future is https, but thought SO had decided to stick with http in their context doc. If there is a straightforward way for us to recommend https where most libs and the shacl processor, etc would recognize the terms as SO properly, then that has advantages. But given that https://schema.org/ returns a JSON-LD context with the http namespace, it seems like they are still using http. Please, propose what you think we should do, and how providers and consumers should handle it.
You are correct there.. their default is to return the http namespace even though they are rather indecisive elsewhere in their documentation. The result of that unfortunately is they seed confusion and delay (cue Thomas the Tank Engine) in the library developers and elsewhere. :)
Science-on-schema.org is about recommendations for application of schema.org to this domain, and so my impression is this group should not be overriding the specification. Hence, the recommendation here should be to use the namespace as published, which would be http://schema.org/. Options for specifying the context then include:
{
"@context":"https://schema.org/"
}
{
"@context":"https://schema.org/docs/jsonldcontext.jsonld"
}
{
[
"@context":"https://schema.org/",
{
"prov": "http://www.w3.org/ns/prov#",
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
"spdx": "http://spdx.org/rdf/terms#"
}
]
}
{
"@context": {
"@vocab": "http://schema.org/"
}
}
Where:
schema.org and including other namespaces. Note that other remote contexts may also be specified in the list.
schema.org context, but makes http://schema.org/ the default namespace for the document.
Implementors should be aware that this may change in the future (i.e. "http" -> "https") and that existing implementations may internally use "https://schema.org/" as the namespace (e.g. RDFLib). Hence consumers should probably be applying namespace normalization to schema.org content to ensure consistent interpretation in an RDF processing environment.
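As a rough idea of what consumer-side normalization could look like, here is a minimal sketch over plain (subject, predicate, object) tuples of strings. The function names and the tuple representation are assumptions for illustration; a real pipeline would work on its RDF library's term objects and would need to leave literals alone, which this toy does not distinguish.

```python
HTTP_NS = "http://schema.org/"
HTTPS_NS = "https://schema.org/"


def normalize_iri(iri, target=HTTP_NS):
    """Rewrite either schema.org namespace form to the chosen target form."""
    for namespace in (HTTP_NS, HTTPS_NS):
        if iri.startswith(namespace):
            return target + iri[len(namespace):]
    return iri


def normalize_triples(triples, target=HTTP_NS):
    """Apply normalize_iri to every position of every triple.

    Caveat: this toy treats every value as an IRI, so a literal that
    happens to start with a schema.org namespace would be rewritten too.
    """
    return [tuple(normalize_iri(term, target) for term in triple) for triple in triples]


triples = [
    ("https://example.org/ds1", "https://schema.org/name", "Test data"),
    ("https://example.org/ds1", "http://schema.org/author", "https://example.org/jane"),
]
normalized = normalize_triples(triples)
print(normalized)
```

Passing `target=HTTPS_NS` flips the direction, so the same helper works whichever form a consumer settles on.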
+1 on Recommending namespace normalization. Dealing with the two namespaces has been an ongoing challenge with metadata integration in EarthCube GeoCodes, requiring messy SPARQL queries.
OK, summarizing... going with Dave's examples, I'll write up a plan to recommend using the http namespace definition (as SO uses by default) by retrieving the context file from the https location, noting that it's also possible to retrieve it from the http location, and that the @vocab default can be used with http as well. We don't recommend using @vocab with the https URL, but harvesters and processors should in general normalize and treat https versions of the terms as equivalent to the http terms for SO. Finally, if one needs to include multiple namespaces, that can be done by building a context map from the retrieved context file plus additional namespace definitions. In my testing, I think the syntax in Dave's examples was a little turned around, so I think we should be using:
{
"@context": [
"https://schema.org/",
{
"prov": "http://www.w3.org/ns/prov#",
"provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
"spdx": "http://spdx.org/rdf/terms#"
}
],
"@type": "Dataset",
"name": "Test data",
"prov:wasDerivedFrom": {
"@id": "https://doi.org/10.xxxx/Dataset-1"
}
}
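To illustrate how the array form combines the retrieved context with the extra prefixes, here is a toy merge-and-expand in Python. It is a sketch under loud assumptions: `SCHEMA_ORG_STUB` is a two-entry stand-in for the real schema.org context document (no network access), and `merge_contexts` and `expand_iri` are made-up names that ignore most of the real algorithm.

```python
# Stand-in for the context retrieved from https://schema.org/ (assumed excerpt).
SCHEMA_ORG_STUB = {"@vocab": "http://schema.org/", "schema": "http://schema.org/"}
REMOTE = {"https://schema.org/": SCHEMA_ORG_STUB}


def merge_contexts(entries, remote):
    """Merge an array @context left to right; later entries override earlier ones."""
    merged = {}
    for entry in entries:
        merged.update(remote[entry] if isinstance(entry, str) else entry)
    return merged


def expand_iri(name, ctx):
    """Toy expansion of keywords, compact IRIs, absolute IRIs and plain terms."""
    if name.startswith("@"):
        return name
    prefix, sep, suffix = name.partition(":")
    if sep and not suffix.startswith("//") and isinstance(ctx.get(prefix), str):
        return ctx[prefix] + suffix      # compact IRI such as prov:wasDerivedFrom
    if sep:
        return name                      # already an absolute IRI
    return ctx.get("@vocab", "") + name  # plain term, falls back to @vocab


ctx = merge_contexts(
    ["https://schema.org/", {"prov": "http://www.w3.org/ns/prov#"}], REMOTE
)
print(expand_iri("name", ctx))
print(expand_iri("prov:wasDerivedFrom", ctx))
```

Plain terms still resolve against the http namespace from the retrieved context, while the `prov:` prefix comes from the local map.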
Work on branch feature_151_context_namespace:
examples
Checked that shapes all validate with the namespace changes on our example files, and merged PR #199. This issue will remain open for commentary for a bit longer, but the planned changes are now merged into develop.
Reviewed at meeting on 7 Feb 2022 -- agreed it was complete, but reopen this issue if discrepancies are found.
So I have been running into this with @smrgeoinfo and I saw it in the example by @datadavev
Using Dave's example of
If you place this in the JSON-LD playground link you will see it expands to http, not https
modify the context to a map as
It will expand correctly with https as at https://tinyurl.com/y99kj7d7
reference https://www.w3.org/TR/json-ld/#context-definitions
specifically:
It would appear that we need to make sure the contexts in our examples and recommendations are maps (at least if we want JSON-LD 1.1, which I suspect this is part of).
I've been running into this issue in some of my development work.... Comments and observations welcome..