Azure / opendigitaltwins-dtdl

Digital Twins Definition Language
Creative Commons Attribution 4.0 International

I cannot validate a DTDL document using the JSON-LD playground #161

Closed: gsshiva closed this issue 1 year ago

gsshiva commented 1 year ago

Pasting the following DTDL model

{
  "@context": "https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/context/DTDL.v2.context.json",
  "@id": "dtmi:com:example:Thermostat;1",
  "@type": "Interface",
  "displayName": "Thermostat",
  "description": "Reports current temperature.",
  "contents": [
    {
      "@type": [
        "Telemetry",
        "Temperature"
      ],
      "name": "temperature",
      "displayName" : "Temperature",
      "description" : "Temperature in degrees Celsius.",
      "schema": "double",
      "unit": "degreeCelsius"
    }
  ]
}

into https://json-ld.org/playground/ results in the following error:

jsonld.InvalidUrl: Dereferencing a URL did not result in a valid JSON-LD object. Possible causes are an inaccessible URL perhaps due to a same-origin policy (ensure the server uses CORS if you are using client-side JavaScript), too many redirects, a non-JSON response, or more than one HTTP Link Header was provided for a remote context.

I changed the @context from "dtmi:dtdl:context;2" to a resolvable URI, hoping that the file it points to contains the context payload. I also found this file:

https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/metamodel/DTDL.v2.ModelRDF-SHACL.json, which defines some of the modelling elements, but it also has the @context of "dtmi:dtdl:context;2". I am pretty new to JSON-LD, and it is not clear to me how I can validate DTDL documents using JSON-LD tools and APIs from other languages. Thanks

jrdouceur commented 1 year ago

There are two issues here. One is the failure of https://json-ld.org/playground/ to process the DTDL context file. I'm not really familiar with this site, but I am quite certain the file at https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/context/DTDL.v2.context.json is a valid JSON-LD context. The DTDL metaparser, which employs dotNetRDF to process the DTDL metamodel, uses this context file for the DTDL term definitions. You might try defining your own simple remote context, putting it in a public file, and seeing whether you can get it to work in https://json-ld.org/playground/ with a simple JSON-LD document.

The second issue is that you won't be able to fully validate a DTDL document using only JSON-LD tools. You will be able to validate the JSON-LD syntax and the vocabulary defined by the JSON-LD terms in the DTDL context file. You can also use an RDF Schema validator and a SHACL validator to validate the rdf:, rdfs:, and sh: constraints specified in the DTDL metamodel file https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/metamodel/DTDL.v2.ModelRDF-SHACL.json. However, you will observe that this file also contains many constraints specified with the dtmm: prefix. These are Digital Twin MetaModeling constraints that neither RDF Schema nor SHACL is powerful enough to express. These constraints are expressed in human-readable text in the DTDL Language Specification, but they are not at all simple.
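To make the split above concrete, here is a rough sketch that walks a metamodel-like JSON tree and buckets its keys by prefix, separating what a generic RDFS/SHACL validator could check from the dtmm: constraints it could not. The SAMPLE node, including the key "dtmm:someConstraint", is made up for illustration and is not taken from the real metamodel file.

```python
from collections import Counter

# Hypothetical node shaped loosely like a metamodel entry; NOT real DTDL data.
SAMPLE = {
    "@id": "dtmi:dtdl:class:Telemetry;2",
    "rdfs:subClassOf": "dtmi:dtdl:class:Content;2",
    "sh:property": [{"sh:path": "dtmi:dtdl:property:schema;2", "sh:maxCount": 1}],
    "dtmm:someConstraint": True,  # hypothetical dtmm: key
}

# Prefixes for which standard validators exist (RDF Schema, SHACL).
CHECKABLE = {"rdf", "rdfs", "sh"}


def count_prefixes(node, counts=None):
    """Recursively count prefixes of compact-IRI keys in a JSON tree.

    JSON-LD keywords such as "@id" are skipped. A full IRI used as a key
    would be counted under its scheme (e.g. "http").
    """
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        for key, value in node.items():
            if ":" in key and not key.startswith("@"):
                counts[key.split(":", 1)[0]] += 1
            count_prefixes(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_prefixes(item, counts)
    return counts


def split_by_tooling(counts):
    """Separate prefixes a standard validator covers from everything else."""
    standard = {p: n for p, n in counts.items() if p in CHECKABLE}
    other = {p: n for p, n in counts.items() if p not in CHECKABLE}
    return standard, other


if __name__ == "__main__":
    standard, other = split_by_tooling(count_prefixes(SAMPLE))
    print("RDFS/SHACL-checkable:", standard)
    print("needs custom code:", other)
```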

gsshiva commented 1 year ago

Thanks so much for your quick response. Both points make sense. Let me try some Java-based JSON-LD processing APIs on the DTDL context and get back to you. When you say

These are Digital Twin MetaModeling constraints that neither RDF Schema nor SHACL is powerful enough to express. These constraints are expressed in human-readable text in the DTDL Language Specification, but they are not at all simple.

Do you mean that these dtmm constraints are validated in the DTDL metaparser code and not through any open standard specification?

gsshiva commented 1 year ago

Some of the terms within the context file are not resolvable. For example,

curl --request GET \
  --url 'http://www.w3.org/2000/01/rdf-schema#' \
  --header 'Accept: application/ld+json'

returns a valid JSON-LD document, but the same request for the dtmm prefix IRI returns Azure's home page rather than a JSON-LD document:

curl --request GET \
  --url http://azure.com/DigitalTwins/MetaModel/ \
  --header 'Accept: application/ld+json'

So even if I host the context file on a public site, the JSON-LD playground may not be able to resolve the dtmm and @vocab terms. It is not clear how a generic JSON-LD processor would be able to process the context file.

jrdouceur commented 1 year ago

Do you mean that these dtmm constraints are validated in the DTDL metaparser code and not through any open standard specification?

Loosely speaking, yes, although the actual process has a few more steps: The metaparser reads the DTDL metamodel file using the dotNetRDF library, and it executes a batch of SPARQL queries to retrieve definitions, directives, and constraints, which it outputs in a simpler but more verbose form known as the metamodel digest.

This digest, in turn, is read by the ParserGenerator, which generates most of the code for the DTDLParser. Specifically, all of the DTDL classes, their properties, and their validation criteria are code-generated into C# files. All constraints are checked by generated code, whether they are specified using RDF Schema, SHACL, or DTMM. There is, for example, no general SHACL validator used when parsing a DTDL document with the standard DTDL parser. Instead, the SHACL constraints have been transformed into C# code, which executes many times faster than interpreted SPARQL.
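The general idea of turning a declarative constraint into compiled code can be sketched in a few lines. This is only an illustration in Python, not how the ParserGenerator works (it emits C#), and the shape dict with "sh:path", "sh:minCount", and "sh:maxCount" keys is a hypothetical simplification of a SHACL property shape: the shape is read once, and the returned function does only cheap comparisons per node.

```python
from typing import Any, Callable, Dict, List

# A "validator" takes one node (a plain dict) and returns error messages.
Validator = Callable[[Dict[str, Any]], List[str]]


def compile_property_shape(shape: Dict[str, Any]) -> Validator:
    """Turn a SHACL-like property shape into a reusable check function.

    The shape is interpreted once, here; the returned closure is what
    runs for every node being validated.
    """
    path = shape["sh:path"]
    min_count = shape.get("sh:minCount", 0)
    max_count = shape.get("sh:maxCount")  # None means no upper bound

    def validate(node: Dict[str, Any]) -> List[str]:
        values = node.get(path, [])
        if not isinstance(values, list):
            values = [values]  # a single value counts as one
        errors: List[str] = []
        if len(values) < min_count:
            errors.append(f"{path}: expected at least {min_count} value(s)")
        if max_count is not None and len(values) > max_count:
            errors.append(f"{path}: expected at most {max_count} value(s)")
        return errors

    return validate


# Hypothetical shape: "schema" must appear exactly once.
check_schema = compile_property_shape({"sh:path": "schema", "sh:minCount": 1, "sh:maxCount": 1})
```

For example, `check_schema({"schema": "double"})` returns an empty list, while a node with no "schema" entry produces one "at least" error.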

One thing we have considered (but not yet attempted) is to produce formal SPARQL definitions for each of the DTMM terms, as has been done for SHACL. This is an ambitious project that has not yet risen to a high enough priority to justify the effort, but it seems at least plausible that we may get to this at some point.

jrdouceur commented 1 year ago

Some of the terms within the context file are not resolvable. For example

So even if I host the context file on a public site, the JSON-LD playground may not be able to resolve the dtmm and @vocab terms. It is not clear how a generic JSON-LD processor would be able to process the context file.

IRIs in JSON-LD documents do not need to be resolvable. They are merely unique identifiers for abstract resources.

The RDF Schema prefix "rdfs:" is defined as "https://www.w3.org/2000/01/rdf-schema#" and this happens to resolve to a JSON document. However, the SHACL prefix "sh:" is defined as "https://www.w3.org/ns/shacl#" and this does not resolve to a JSON document. This is not a problem because the IRIs are not resolved during normal JSON-LD processing if they are just used as subjects, predicates, or objects in encoded RDF statements.
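As a small illustration that expanding a prefixed name is pure string work and never requires fetching anything, here is a sketch of compact IRI expansion. The rdfs and sh mappings are the ones quoted above; the dtmm mapping is assumed from the URL curled earlier in this thread. Following JSON-LD, a suffix starting with "//" is treated as an absolute IRI rather than a compact one.

```python
from typing import Dict

# rdfs/sh as quoted in this thread; dtmm assumed from the URL curled above.
PREFIXES: Dict[str, str] = {
    "rdfs": "https://www.w3.org/2000/01/rdf-schema#",
    "sh": "https://www.w3.org/ns/shacl#",
    "dtmm": "http://azure.com/DigitalTwins/MetaModel/",
}


def expand_compact_iri(value: str, prefixes: Dict[str, str]) -> str:
    """Expand 'prefix:suffix' using a prefix map; no HTTP request is made.

    Strings with an unknown prefix, or whose suffix starts with '//'
    (they already look like 'scheme://...'), are returned unchanged.
    """
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in prefixes and not suffix.startswith("//"):
        return prefixes[prefix] + suffix
    return value
```

Whether "https://www.w3.org/ns/shacl#maxCount" happens to serve JSON, HTML, or nothing at all makes no difference to the result.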

gsshiva commented 1 year ago

Thanks so much for explaining how all the JSON files are used as input to generate the validation code.

Thanks for educating me that JSON-LD terms need not be resolvable.

Referencing either https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/context/DTDL.v2.context.json or https://raw.githubusercontent.com/Azure/opendigitaltwins-dtdl/master/DTDL/v2/metamodel/DTDL.v2.ModelRDF-SHACL.json in the @context does not seem to return ld+json content: the former is served as text/html and the latter as text/plain. That would explain why the JSON-LD playground fails to load a @context pointing to either URL. When I embedded the contents of the context file directly as the @context value, the DTDL sample was processed correctly at the JSON-LD level.
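As far as I understand the JSON-LD 1.1 API's document loading rules, a remote document is only parsed if its Content-Type is application/json or another media type with a "+json" suffix (such as application/ld+json), unless an HTTP Link header points to an alternate JSON-LD representation. A simplified check along those lines, which deliberately ignores Link headers, might look like this; the sample header values are the ones reported above.

```python
def is_json_media_type(content_type: str) -> bool:
    """Rough check: would a JSON-LD loader accept this Content-Type as JSON?

    Simplification: parameters such as charset are dropped, and Link
    headers (which the real algorithm also consults) are ignored.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")
```

With this check, "text/html; charset=utf-8" (the github.com blob page) and "text/plain; charset=utf-8" (the raw file) are both rejected, while "application/ld+json" is accepted.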

In https://github.com/Azure/opendigitaltwins-dtdl/blob/master/DTDL/v2/context/DTDL.v2.context.json, I see @vocab used as the @type of some term definitions, for example:

"symbol": { "@id": "dtmi:dtdl:property:symbol;2" },
"target": {
  "@id": "dtmi:dtdl:property:target;2",
  "@type": "@vocab"
},
"topUnit": {
  "@id": "dtmi:dtdl:property:topUnit;2",
  "@type": "@vocab"
},
"unit": {
  "@id": "dtmi:dtdl:property:unit;2",
  "@type": "@vocab"
},
"writable": { "@id": "dtmi:dtdl:property:writable;2" }

  1. Why do some properties have an @type and others don't?
  2. What does it mean for several properties to have the same @type value, "@vocab"?

I will close this issue after your response to the above.

jrdouceur commented 1 year ago

The DTDL context file contains a few different things. At the top are prefixes and prefixed terms that are used in the DTDL metamodel but not in DTDL models, so you can ignore these. Anything that starts with "rdf:", "rdfs:", "sh:", or "dtmm:" is part of the DTDL definition and not part of the DTDL language.

The remainder of the context file defines terms that are used in the DTDL language, either as RDF objects or RDF predicates. Most of these terms define RDF objects, so they have no "@type" or anything other than an "@id", which indicates the IRI to which the term is mapped. Some of these RDF object terms will be used as the value of an "@type" property in a DTDL model, such as:

"Array": { "@id": "dtmi:dtdl:class:Array;2" },
"Acceleration": { "@id": "dtmi:standard:class:Acceleration;2" },

Others will be used as the value of a "unit" property, such as:

"metrePerSecondSquared": { "@id": "dtmi:standard:unit:metrePerSecondSquared;2" },

Although most of the context terms define RDF objects, a few dozen of the terms define RDF predicates. If a term maps to a property that can take an IRI value, it has a "@type" of "@vocab" to ensure that a string value will be coerced into an IRI:

"schema": {
  "@id": "dtmi:dtdl:property:schema;2",
  "@type": "@vocab"
},

If a term maps to a property that takes a literal value, there is nothing other than an "@id":

"comment": { "@id": "dtmi:dtdl:property:comment;2" },                 -- string literal
"writable": { "@id": "dtmi:dtdl:property:writable;2" },               -- boolean literal
"maxMultiplicity": { "@id": "dtmi:dtdl:property:maxMultiplicity;2" }, -- numeric literal
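To illustrate the difference these term definitions make, here is a heavily simplified sketch of value expansion: with "@type": "@vocab" a string is looked up as a term and becomes an {"@id": ...} node reference, while without it the value stays a literal {"@value": ...}. The TERMS map is a small excerpt of entries quoted in this thread, and the function is a hypothetical helper, not the full JSON-LD 1.1 Value Expansion algorithm.

```python
from typing import Any, Dict

# Excerpt of term definitions quoted in this thread.
TERMS: Dict[str, Dict[str, Any]] = {
    "schema": {"@id": "dtmi:dtdl:property:schema;2", "@type": "@vocab"},
    "unit": {"@id": "dtmi:dtdl:property:unit;2", "@type": "@vocab"},
    "comment": {"@id": "dtmi:dtdl:property:comment;2"},
    "writable": {"@id": "dtmi:dtdl:property:writable;2"},
    "metrePerSecondSquared": {"@id": "dtmi:standard:unit:metrePerSecondSquared;2"},
}


def expand_value(term: str, value: Any, terms: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Expand one property value according to its term definition (simplified)."""
    definition = terms[term]
    if definition.get("@type") == "@vocab" and isinstance(value, str):
        # Coerce to an IRI: use the term's @id if the string is a known term.
        # (A real processor would otherwise apply @vocab or the base IRI;
        # this sketch just keeps the string as-is.)
        return {"@id": terms.get(value, {}).get("@id", value)}
    # No coercion: strings, booleans and numbers stay literals.
    return {"@value": value}
```

So `"unit": "metrePerSecondSquared"` expands to a reference to dtmi:standard:unit:metrePerSecondSquared;2, whereas `"writable": true` stays the boolean literal true.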

If a term maps to a property that takes language-tagged string values or a language map, it has a special designation:

"displayName": {
  "@id": "dtmi:dtdl:property:displayName;2",
  "@container": "@language",
  "@language": "en"
},
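A simplified sketch of how a processor treats a displayName-style term: with "@container": "@language", a map value is read as language tag to string, and a plain string picks up the term's default "@language" ("en" here). The helper name is hypothetical and it skips cases the spec handles (arrays inside the map, null values, @none).

```python
from typing import Any, Dict, List

# The displayName term definition quoted above.
DISPLAY_NAME: Dict[str, Any] = {
    "@id": "dtmi:dtdl:property:displayName;2",
    "@container": "@language",
    "@language": "en",
}


def expand_language_value(value: Any, definition: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand a displayName-style value into language-tagged strings (simplified)."""
    if definition.get("@container") == "@language" and isinstance(value, dict):
        # Language map: keys are language tags; sorted for deterministic output.
        return [{"@value": text, "@language": lang} for lang, text in sorted(value.items())]
    result: Dict[str, Any] = {"@value": value}
    if "@language" in definition:
        # Plain string: the term's default language applies.
        result["@language"] = definition["@language"]
    return [result]
```

Both `"displayName": "Thermostat"` and `"displayName": {"en": "Thermostat"}` therefore end up as the same English-tagged string.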

The context section in the JSON-LD 1.1 spec describes all this in more detail.

gsshiva commented 1 year ago

Thanks so much for the detailed explanation. Truly appreciate it.

gsshiva commented 1 year ago

Closing the issue, since embedding the context JSON works.

hadjian commented 10 months ago

@gsshiva @jrdouceur

Happy new year.

I stumbled upon this issue, as I had to solve it myself. Since this issue was filed quite recently, I think updating it makes sense.

Contexts can either be directly embedded into the document (an embedded context) or be referenced using a URL.
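The two forms differ only in the value of "@context": a string is a reference the processor must dereference, while a map is an embedded context definition. The sketch below builds both from the Thermostat model earlier in this thread; `set_context` and `is_embedded` are hypothetical helpers, and the embedded map is only an excerpt (a real one would need every term the model uses, such as "Interface").

```python
import copy
from typing import Any, Dict, Union

# A context is either a reference (string IRI/URL) or an embedded definition (dict).
Context = Union[str, Dict[str, Any]]


def set_context(document: Dict[str, Any], context: Context) -> Dict[str, Any]:
    """Return a copy of `document` with `context` as its leading @context entry."""
    body = {k: copy.deepcopy(v) for k, v in document.items() if k != "@context"}
    return {"@context": copy.deepcopy(context), **body}


def is_embedded(document: Dict[str, Any]) -> bool:
    """True if the @context is an inline definition rather than a reference."""
    return isinstance(document.get("@context"), dict)


model = {"@id": "dtmi:com:example:Thermostat;1", "@type": "Interface", "displayName": "Thermostat"}

# Referenced: the processor has to resolve this string to obtain the definitions.
referenced = set_context(model, "dtmi:dtdl:context;2")

# Embedded: the definitions travel with the document (excerpt only).
embedded = set_context(model, {
    "displayName": {"@id": "dtmi:dtdl:property:displayName;2", "@container": "@language", "@language": "en"},
})
```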

I added the missing @context key to the DTDLv3 context in a fork (and in a branch there), so you can try it out in the JSON-LD playground:

  1. Copy and paste e.g. the asset DTDL into the playground
  2. Scroll down to the context and replace the IRI with a URL to my fork of the context: https://raw.githubusercontent.com/hadjian/opendigitaltwins-dtdl/fix/at-context/DTDL/v3/context/DTDL.v3.context.json

jrdouceur commented 10 months ago

It is actually not standard compliant that the DTDL uses an IRI for referencing the context.

Actually, it is. The JSON-LD 1.1 specification, section 6.2 Node Objects, states:

If the node object contains the @context key, its value MUST be null, an absolute IRI, a relative IRI, a context definition, or an array composed of any of these.

The linked definition of absolute IRI states:

absolute IRI

An absolute IRI is defined in [RFC3987] containing a scheme along with a path and optional query and fragment segments.

There is no requirement that the IRI be a URL.
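The distinction can be checked mechanically, at least roughly: an absolute IRI starts with a scheme (a letter followed by letters, digits, "+", "-", or "."), then ":". "dtmi:dtdl:context;2" qualifies even though a generic HTTP-based loader has no way to fetch it. The helpers below are a hypothetical sketch, not a full RFC 3987 validator.

```python
import re

# Scheme syntax from RFC 3986/3987: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def has_scheme(iri: str) -> bool:
    """Rough test for an absolute IRI: does the string start with a scheme?"""
    return bool(_SCHEME.match(iri))


def is_fetchable_url(iri: str) -> bool:
    """Whether a plain HTTP(S) document loader could even attempt to fetch it."""
    return has_scheme(iri) and iri.split(":", 1)[0].lower() in {"http", "https"}
```

So "dtmi:dtdl:context;2" passes `has_scheme` but not `is_fetchable_url`, which is why a processor needs built-in knowledge of that IRI (as the DTDL parser has) rather than a network fetch.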

hadjian commented 10 months ago

Actually, it is. The JSON-LD 1.1 specification, section 6.2 Node Objects, states:

If the node object contains the @context key, its value MUST be null, an absolute IRI, a relative IRI, a context definition, or an array composed of any of these.

The linked definition of absolute IRI states:

You are right! The formal definition allows this. The JSON-LD 1.1 Processing Algorithms and API specification then defines how to process the @context when it is just a string, and also states that this string must be a valid IRI. Thanks for the pointer!

What about the @context key in the DTDLv3 context? In the processing algorithms specification, paragraph 4.1.4 defines how to arrive at an active context. In the case of a referenced context, step 5.2.5.2 then states:

If the document has no top-level map with an @context entry, an invalid remote context has been detected and processing is aborted.

jrdouceur commented 10 months ago

Indeed, I think all of the context files in the DTDL repo should be updated/fixed. Each is currently a term map, but this map should properly be embedded as the value of an "@context" entry in an outer map.
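The fix described here amounts to a one-level wrap. A minimal sketch, assuming the context file has already been loaded as a dict: `is_valid_remote_context` mirrors the quoted check (a top-level map with an @context entry), and `wrap_term_map` is a hypothetical helper that produces that shape. The sample entries are ones quoted earlier in this thread.

```python
import json
from typing import Any, Dict


def is_valid_remote_context(document: Any) -> bool:
    """Mirror of the quoted step: a top-level map with an @context entry."""
    return isinstance(document, dict) and "@context" in document


def wrap_term_map(term_map: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a bare term map so the file can be served as a remote context."""
    if is_valid_remote_context(term_map):
        return term_map  # already in the expected shape
    return {"@context": term_map}


# Shape of the current files: a bare term map (entries quoted in this thread).
bare = {
    "schema": {"@id": "dtmi:dtdl:property:schema;2", "@type": "@vocab"},
    "writable": {"@id": "dtmi:dtdl:property:writable;2"},
}

fixed = wrap_term_map(bare)
print(json.dumps(fixed, indent=2))
```

Running `is_valid_remote_context` on `bare` fails the check, and the wrapped `fixed` passes it; wrapping an already wrapped file leaves it unchanged.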