FIWARE / tutorials.Understanding-At-Context

:blue_book: FIWARE-LD 101: Understanding NGSI-LD `@context`
MIT License

Incorrect generation of @Context-File if prefixes are used in variable names #11

Closed IngMiad closed 5 months ago

IngMiad commented 5 months ago

Hello,

Summary

I followed this tutorial and used the context-file-generator to generate our custom @Context-File. Unfortunately the context-file-generator seems to generate wrong @Context-files that produce errors in production.

Steps to reproduce

I have generated the data model according to the instructions from https://github.com/FIWARE/tutorials.Understanding-At-Context:

  1. First I generated a model.yaml file. I don't want to share our whole model file on GitHub (please ping me if you need it for testing purposes), but here is an example of a problematic entity:
    ###############################
    ## CompressedAirDistribution ##
    ###############################
    CompressedAirDistribution:
      description: "Process management system function 'Compressed air distribution' with KKS function code: 'QFB' (de: 'Druckluftverteilung')"
      required:
        - "id"
        - "type"
        - "dateObserved"
        - "isSubsystemOf"
      properties:
        id:
          anyOf:
            - description: "Property. Identifier format of any NGSI entity"
              maxLength: 256
              minLength: 1
              pattern: '^[\w\-\.\{\}\$\+\*\[\]`|~^@!,:\\]+$'
              type: "string"
            - description: "Property. Identifier format of any NGSI entity"
              format: "uri"
              type: "string"
          description: "Unique identifier of the entity"
          x-ngsi:
            type: "Property"
        type:
          description: "NGSI Entity type. It has to be 'CompressedAirDistribution' (a specific PCS part)"
          enum:
            - "CompressedAirDistribution"
          type: "string"
          x-ngsi:
            type: "Property"
        dateObserved:
          description: "Date of the observed entity defined by the user."
          type: "string"
          x-ngsi:
            type: "Property"
        isSubsystemOf:
          description: "This component is a sub component of another system"
          type: "string"
          x-ngsi:
            model: "https://schema.org/URL"
            type: "Relationship"
        kksFunction:
          description: "KKS function of property (level 1)"
          type: "string"
          x-ngsi:
            type: "Property"
        0QFB15CP001:Dr_DrLuAnl:U_AH:
          description: "Compressed air distribution Nr. 15 Measuring circuit Pressure Nr. 001  (HL alarm)"
          type: "boolean"
          x-ngsi:
            model: "https://schema.org/Boolean"
            type: "Property"
        0QFB15CP001:Dr_DrLuAnl:U_WL:
          description: "Compressed air distribution Nr. 15 Measuring circuit Pressure Nr. 001  (LL warning)"
          type: "boolean"
          x-ngsi:
            model: "https://schema.org/Boolean"
            type: "Property"
        0QFB15CP001:Dr_DrLuAnl:U_AL:
          description: "Compressed air distribution Nr. 15 Measuring circuit Pressure Nr. 001  (LL alarm)"
          type: "boolean"
          x-ngsi:
            model: "https://schema.org/Boolean"
            type: "Property"
        0QFB15CP001:Dr_DrLuAnl:U_WH:
          description: "Compressed air distribution Nr. 15 Measuring circuit Pressure Nr. 001  (HL warning)"
          type: "boolean"
          x-ngsi:
            model: "https://schema.org/Boolean"
            type: "Property"
        0QFB15CP001:Dr_DrLuAnl:U:
          description: "Compressed air distribution Nr. 15 Measuring circuit Pressure Nr. 001  (Analog value measurement)"
          type: "number"
          x-ngsi:
            model: "https://schema.org/Number"
            type: "Property"
  2. then I checked its validity in Swagger-UI which has no issues

  3. afterwards I used the Smart Data Model Tool to generate a @Context file (see datamodels.context-ngsild.jsonld and datamodels.md)

Issue

Swagger and the tool both report that the provided API is valid, and the @context file is generated successfully. I can also work with the generated file using the IoT Agent for OPC UA, Orion-LD and, partly, Mintaka: Mintaka successfully writes entities based on the provided @context file, but it does not allow the temporal interface to be used with that file. I discussed this in a separate issue with the Mintaka developers: https://github.com/FIWARE/mintaka/issues/167

According to @wistefan, Mintaka rejects the provided @context file in read requests because the file has been generated incorrectly. If ":" is used in attribute names, the resulting prefixes also need to be declared in the @context file. As an example

Expectations

From my understanding, the variable names I defined conform to the NGSI-LD standard.

So I would expect the context-file-generator to either throw an error if the variable names are not allowed (instead of claiming that API is valid) or generate the @context-file correctly.

It is also not expected that only Mintaka recognizes the wrong context-file in read-requests, but not in write-requests. Orion-LD and IoT-Agent for OPC UA don't seem to recognize wrong @context-file at all.

jason-fox commented 5 months ago

I have altered the code to throw an error if the simple @context generator finds a colon : in the attributes. I have also upgraded the following NOTE to a CAUTION in the readme:

[!CAUTION] The simple NGSI-LD @context generator in the tutorial defaults to using uri.fiware.org namespaces and updates with corrected URIs based on the x-ngsi.uri and x-ngsi.uri-prefix attributes. The code and defaults found within this tutorial can be altered if necessary.

For more complex scenarios, additional @context generation tools can be found on the Smart Data Models website.
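
The check described above can be sketched roughly as follows. This is a Python illustration only, not the tutorial's actual generator code; the function names `find_colon_attributes` and `check_model` and the `model` excerpt are hypothetical, and the model is assumed to be already parsed into nested dictionaries.

```python
def find_colon_attributes(schemas):
    """Return (entity, attribute) pairs whose attribute name contains a colon."""
    offenders = []
    for entity_name, schema in schemas.items():
        for attr_name in schema.get("properties", {}):
            if ":" in attr_name:
                offenders.append((entity_name, attr_name))
    return offenders


def check_model(schemas):
    """Raise ValueError rather than emit an @context with colon-bearing names."""
    offenders = find_colon_attributes(schemas)
    if offenders:
        listing = ", ".join(f"{entity}.{attr}" for entity, attr in offenders)
        raise ValueError(f"attribute names must not contain ':': {listing}")


# Hypothetical excerpt of a parsed model.yaml (only the keys needed here).
model = {
    "CompressedAirDistribution": {
        "properties": {
            "dateObserved": {"type": "string"},
            "0QFB15CP001:Dr_DrLuAnl:U": {"type": "number"},
        }
    }
}
```

Calling `check_model(model)` on this excerpt raises a `ValueError` naming the offending attribute, so the problem surfaces at generation time instead of in a later read request.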

jason-fox commented 5 months ago

  2. then I checked its validity in Swagger-UI which has no issues

There is no restriction on a property name with multiple colons in it in OpenAPI so this works fine.

  3. afterwards I used the Smart Data Model Tool to generate a @context file (see datamodels.context-ngsild.jsonld and datamodels.md)

As now clarified in the cautionary note above - you can generate an @context file, but not necessarily one that you want. This tutorial is not attempting to be an authority for linked data URIs.

Elements like: "fiware": "https://uri.fiware.org/ns/data-models", are just an example authority to demonstrate the use of a JSON-LD Compact IRI. It would be much better to use real world authoritative IRIs such as from schema.org or smart-data-models.org or SAREF or any existing ontology within your domain rather than using this dummy placeholder.

From my understanding the variable names I defined are according to NGSI-LD standard:

Not quite. The ABNF defining how a prefix can be added to an attribute name doesn't allow for multiple colon separators : within an attribute name. Depending upon the JSON-LD parser used, this may or may not be caught as a 400 Bad Request.

Since all NGSI-LD payloads are valid JSON-LD, the incoming request is parsed to Long Names (IRIs) via a JSON-LD expansion operation before it hits the broker and any response is reduced back to the short names via compaction using the given @context to the user.

For example this payload:

{
  "id": "urn:ngsi-ld:myType:001",
  "type": "myType",
  "x:special": {"type": "Property", "value": "..."},
  "y:special": {"type": "Property", "value": "..."},
  "@context": [
    {
      "example": "http:example.com/",
      "x": "http:/foo/",
      "y": "http:/bar/",
      "myType": "example:entity"    
    },
    "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.7.jsonld"
  ]
}

Is in reality held like this internally:

{
 "@id": "urn:ngsi-ld:myType:001",
 "@type": "http:example.com/entity",
 "http:/foo/special": "...",
 "http:/bar/special": "..."
}

This follows the rules of JSON-LD. You can see that the two attributes called "special" have been differentiated using a prefix. Similarly, the short name "myType" has undergone multiple expansions to be transformed to http:example.com/entity
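
To make the expansion step concrete, here is a deliberately simplified Python sketch. It is not a JSON-LD processor (a real one handles keywords, `@vocab`, scoped contexts and much more); the function name `expand_term` is hypothetical and only mimics how a short name or a prefix:suffix pair is resolved against a flat context.

```python
def expand_term(term, context, _depth=0):
    """Toy expansion of a short name or compact IRI against a flat context."""
    if _depth > 10:
        raise ValueError(f"cyclic definition for {term!r}")
    # 1. An exact match in the context wins; its value may need expanding too.
    if term in context:
        return expand_term(context[term], context, _depth + 1)
    # 2. Otherwise split at the FIRST colon and look the prefix up.
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return expand_term(context[prefix], context, _depth + 1) + suffix
    # 3. Unknown prefix or no colon: this toy leaves the term untouched.
    return term


# The inline context from the payload above.
context = {
    "example": "http:example.com/",
    "x": "http:/foo/",
    "y": "http:/bar/",
    "myType": "example:entity",
}

expanded = {name: expand_term(name, context)
            for name in ("x:special", "y:special", "myType")}
```

In this toy, a name such as `0QFB15CP001:Dr_DrLuAnl:U` is split at its first colon, and because no prefix `0QFB15CP001` is declared it comes back unchanged. How a real parser treats such a name varies, which matches the observation that some components reject it and others do not.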

So I would expect the context-file-generator to either throw an error if the variable names are not allowed (instead of claiming that API is valid) or generate the @context-file correctly.

Given the complexity of the swagger document containing attributes with colons it is probably safest just to get the generator to throw an error at this point.

IngMiad commented 5 months ago

Hello Jason,

thank you for the quick response and the addition to the tutorial. I think it will be helpful for users to know that this is a simple generator that doesn't necessarily produce correct results, as in our case. To make getting started a bit easier, you could maybe also add which Smart Data Models tool to use instead, and which other ways users have to generate an @context file. We thought that we had to create this model.yaml using Swagger Editor and could then generate the @context file from it. As far as I understand, the Smart Data Models website takes a different approach (e.g. generating a context file from a payload example).

So the Smart Data Models website only offers the following options:

  1. Generate a data model based on a payload (actual usage data)
     I tested that: unfortunately this only works for small data models, because the website puts the whole payload example into the URL instead of using an HTTP body, and URL length is limited. When I try this tool I get an error.
  2. Generate a data model based on a CSV payload
     For that I need to fill in the form. With a big data model like ours, I would have to enter 57 entities with ~3000 tags manually (?) I think that would take ages and is not very handy. Moreover, I can't be sure whether the URL is used here as well, in which case the request would become too long for big data models.
  3. Use a Master Sheet for simple data models
     This has the same issue as the CSV payload: I would have to configure many fields manually in an Excel file, which takes a long time.
  4. Generate an example based on a schema
     This is not what we need: I am looking for a way to generate the @context file, not payload examples.
  5. Some schema or JSON validators like https://www.jsonschemavalidator.net/
     This is also one step beyond what we need and doesn't seem to work correctly either: our wrong context file is shown as valid.

So how should a user be able to generate an @context file for big data models for their FIWARE platform? The optimum would be if we could use the model.yaml for that use case, because we have put a lot of effort into model.yaml generation for big data models and would prefer to generate a @context-file from that. If that is not possible please provide a way on how to generate context-files for big data models. Thank you in advance!

Greetings Inga

jason-fox commented 5 months ago

Since colons : are such a pain in JSON-LD what can you do about it? Well you could replace them with an alternate character, for example instead of 0QFB15CP001:Dr_DrLuAnl:U_AH use 0QFB15CP001_Dr_DrLuAnl_U_AH

isSubsystemOf : This component is a sub component of another system
kksFunction: KKS function of property (level 1)
0QFB15CP001_Dr_DrLuAnl_U_AH: Compressed air distribution Nr. 15 
                             Measuring circuit Pressure Nr. 001  (HL alarm)
0QFB15CP001_Dr_DrLuAnl_U_WL: Compressed air distribution Nr. 15 
                             Measuring circuit Pressure Nr. 001  (LL warning)
...etc

But this still leaves the definition of terms within the ontology unclear: what really is an attribute you call isSubsystemOf, as far as external users connecting to your data space are concerned? Is what you internally call isSubsystemOf a https://schema.org/isPartOf, or is it better to reference it as http://www.w3.org/2000/01/rdf-schema#subPropertyOf, so that someone accessing your data can then safely map back to their preferred terms?

The idea of a Linked Data model is to know what the structure is and then be able to use your preferred terminology throughout whilst allowing others to access using their own terminology.

jason-fox commented 5 months ago

The simplest JSON-LD @context file looks something like this, with every short name on the left hand side and a proper common agreed ontological IRI on the righthand side.

{
  "@context":  {
      "x": "http:/foo/x",
      "y": "http:/bar/y",
      "myType": "http:example.com/type"   
    }
}

You don't need to use IRI Compaction. It is helpful as it shortens the overall file length.
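
As a rough illustration of the two equivalent styles, the following Python sketch builds a flat @context from a list of short names, either with full IRIs or with a declared prefix. The helper `build_context` is hypothetical, and the base IRI is the placeholder authority used elsewhere in this thread, not a real ontology.

```python
import json


def build_context(names, base_iri, prefix=None):
    """Map each short name to an IRI, optionally via a compact-IRI prefix."""
    if prefix is None:
        # Full IRIs on the right-hand side, no compaction.
        terms = {name: base_iri + name for name in names}
    else:
        # Declare the prefix once, then reference it as prefix:name.
        terms = {prefix: base_iri}
        terms.update({name: f"{prefix}:{name}" for name in names})
    return {"@context": terms}


names = ["dateObserved", "isSubsystemOf", "kksFunction"]
base = "https://uri.example.org/ns/data-models#"  # placeholder authority

full = build_context(names, base)
compact = build_context(names, base, prefix="example")
print(json.dumps(compact, indent=2))
```

Both results describe the same mapping; the compact form only saves repeating the base IRI on every line.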

Obviously things are much easier if you can reuse an existing data model from an existing ontology.

jason-fox commented 5 months ago

Some schema or json validators like https://www.jsonschemavalidator.net/

[!CAUTION] JSON Schema and JSON-LD are two completely separate concepts see: Stack Overflow

Use the JSON-LD playground to validate JSON-LD payloads.

jason-fox commented 5 months ago

I would have to enter 57 entities with ~3000 tags manually (?)

An Entity is usually a class of object with a common set of attributes. I would expect widget001 and widget057 to both have common type of Widget and have common set of attributes - I wouldn't expect there to be 3000 unique attribute names within the system. All Widget entities have a color attribute - it would be the data within that entity which would differ.

If you look at the NGSI-LD core @context file you will see the line:

"@vocab": "https://uri.etsi.org/ngsi-ld/default-context/"

This means that any attributes you fail to define are still assigned an IRI - e.g.:

unknownAttribute is still accessible as https://uri.etsi.org/ngsi-ld/default-context/unknownAttribute
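
The fallback can be pictured with a small Python sketch. It assumes a flat context and ignores prefixes and every other JSON-LD rule; the function name `resolve` and the `color` mapping are made up for illustration.

```python
def resolve(name, context):
    """Return the IRI for a short name, falling back to @vocab if undefined."""
    if name in context:
        return context[name]
    vocab = context.get("@vocab")
    # Without an @vocab an undefined term has no IRI at all.
    return vocab + name if vocab else None


context = {
    "@vocab": "https://uri.etsi.org/ngsi-ld/default-context/",
    "color": "https://uri.example.org/ns/data-models#color",  # hypothetical
}

defined = resolve("color", context)
fallback = resolve("unknownAttribute", context)
```

Here `fallback` is the default-context IRI shown above, while `defined` keeps the IRI declared explicitly in the context.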

jason-fox commented 5 months ago

The simplest way to fudge a working context would be to take the imperfect generated @context

{
    "@context": {
        "type": "@type",
        "id": "@id",
        "ngsi-ld": "https://uri.etsi.org/ngsi-ld/",
        "fiware": "https://uri.fiware.org/ns/data-models#",
        "schema": "https://schema.org/",
        "0QFB15CP001:Dr_DrLuAnl:U": "fiware:0QFB15CP001:Dr_DrLuAnl:U",
        "0QFB15CP001:Dr_DrLuAnl:U_AH": "fiware:0QFB15CP001:Dr_DrLuAnl:U_AH",
        "0QFB15CP001:Dr_DrLuAnl:U_AL": "fiware:0QFB15CP001:Dr_DrLuAnl:U_AL",
        "0QFB15CP001:Dr_DrLuAnl:U_WH": "fiware:0QFB15CP001:Dr_DrLuAnl:U_WH",
        "0QFB15CP001:Dr_DrLuAnl:U_WL": "fiware:0QFB15CP001:Dr_DrLuAnl:U_WL",
        "CompressedAirDistribution": "fiware:CompressedAirDistribution",
        "dateObserved": "fiware:dateObserved",
        "isSubsystemOf": "fiware:isSubsystemOf",
        "kksFunction": "fiware:kksFunction"
    }
}

And manually replace those colons : with underscores and use your own authority like example.com

{
    "@context": {
        "example": "https://uri.example.org/ns/data-models#",
        "0QFB15CP001_Dr_DrLuAnl_U": "example:0QFB15CP001_Dr_DrLuAnl_U",
        "0QFB15CP001_Dr_DrLuAnl_U_AH": "example:0QFB15CP001_Dr_DrLuAnl_U_AH",
        "0QFB15CP001_Dr_DrLuAnl_U_AL": "example:0QFB15CP001_Dr_DrLuAnl_U_AL",
        "0QFB15CP001_Dr_DrLuAnl_U_WH": "example:0QFB15CP001_Dr_DrLuAnl_U_WH",
        "0QFB15CP001_Dr_DrLuAnl_U_WL": "example:0QFB15CP001_Dr_DrLuAnl_U_WL",
        "CompressedAirDistribution": "example:CompressedAirDistribution",
        "dateObserved": "example:dateObserved",
        "isSubsystemOf": "example:isSubsystemOf",
        "kksFunction": "example:kksFunction"
    }
}

According to the ABNF in the Spec - this is not a valid attribute name: 0QFB15CP001:Dr:DrLuAnl:U, but this one with underscores would be 0QFB15CP001_Dr_DrLuAnl_U - so you'd have to use that instead.
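
For a large model the manual replacement could be scripted. The Python sketch below assumes the generated file has the flat shape shown above; `rewrite_context` is a hypothetical helper, and the `example` prefix and base IRI are placeholders for your own authority.

```python
def rewrite_context(generated, old_prefix, new_prefix, new_base):
    """Swap colons for underscores and re-home terms under a new prefix."""
    rewritten = {new_prefix: new_base}
    for name, value in generated["@context"].items():
        if not value.startswith(old_prefix + ":"):
            continue  # skip keyword aliases and prefix declarations
        safe = name.replace(":", "_")
        if safe in rewritten:
            # Two different originals would collapse onto the same name.
            raise ValueError(f"name collision after renaming: {safe}")
        rewritten[safe] = f"{new_prefix}:{safe}"
    return {"@context": rewritten}


generated = {
    "@context": {
        "type": "@type",
        "id": "@id",
        "fiware": "https://uri.fiware.org/ns/data-models#",
        "0QFB15CP001:Dr_DrLuAnl:U": "fiware:0QFB15CP001:Dr_DrLuAnl:U",
        "kksFunction": "fiware:kksFunction",
    }
}

fixed = rewrite_context(
    generated, "fiware", "example", "https://uri.example.org/ns/data-models#"
)
```

The collision check matters for names that already contain underscores: two distinct originals could otherwise silently map to the same attribute.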

jason-fox commented 5 months ago

Here is an example of a valid @context being validated in the playground - replace an underscore with a colon and you get an error on screen.

Here is the same JSON-LD Payload with attributes

IngMiad commented 5 months ago

Hello Jason,

thank you very much for your explanations! This is very helpful for our use case. I will especially investigate your suggestions on 1) the isSubsystemOf property (replace it with a common one from schema.org or similar), 2) the JSON-LD playground for future @context-file validation, and 3) using our own authority like example.com

My additional comments are:

"Obviously things are much easier if you can reuse an existing data model from an existing ontology."

In our case we are creating a schema for a so called process control system (PCS) for the use case of a special kind of renewable power station. Each PCS data model from different types of power stations is a bit different and those variable names like 0QFB15CP001:Dr_DrLuAnl_U:AL are also custom ones defined by the site based on a standard called 'Identification System for Power Stations' (KKS) - This is why we are creating a custom model for our site. I don't see any additional value for the community if we would publish that as public data model, because the structure is custom in at least each type of power station but also between the same type of power stations depending on their age and custom realizations. This way there are also no public ontologies that would fit our data model from process control system and we ended up creating our own one.

"I wouldn't expect there to be 3000 unique attribute names within the system"

There are more than 9.000 tags or variables from this data source Process Control System (PCS) only => We have further data sources and much more entities and tags in total. So PCS is only one data source among others which provides the 9.000 tags alone.

We already sorted out the attributes from the PCS data model that we don't necessarily need, so that we end up with ~3.000 tags. Grouping these tags by function, we end up with more than 50 entities with individual attributes. In a power station like the one we operate, it is usual to have a really large number of components and variables that you can read / set for power generation. I would also say that this is not unusual in the smart energy field in general.

It is true that for a single location of a power station each component or entity from the PCS might only have one instance. But as the structure of each entity is completely different and we want to query single entities in the end, we do need to define each component as a separate entity in the schema. We can't create one entity and use it for all of our ~50 individual PCS entities, as they have different numbers and types of attributes. It would also not make sense to create one big entity with ~3.000 tags from my point of view, because you typically query related attributes by function (= our entities), and such a big entity with 3.000 tags is quite confusing and not handy for the user who reads the data.

"According to the ABNF in the Spec - this is not a valid attribute name: 0QFB15CP001:Dr:DrLuAnl:U, but this one with underscores would be 0QFB15CP001_Dr_DrLuAnl_U - so you'd have to use that instead."

It is important for us to use attribute names in FIWARE similar to those defined in the PCS, as FIWARE data relates to real-life attributes from the PCS. Because of the number of PCS attributes, the KKS naming system mentioned above is used to structure the variable names and their meaning. The KKS name scheme has three parts which are separated by "/" and "." (KKS identifier, custom variable name, and a suffix that indicates the meaning of the value, e.g. measured analogue value, setpoint, ...). I replaced the invalid characters "/" and "." with ":", so that it is clear where each name part starts / stops.

To separate those name parts I can use something other than ":", but I'd also like to use something other than "_", because "_" is already used in the readable custom name part of the real-life variables and also in some suffixes, e.g. 0QFB15CP001_Dr_DrLuAnl_U_WL (readable part is 'Dr_DrLuAnl' and suffix is 'U_WL'). If I replaced ":" with "_", it would be really hard for the user to understand the separation of those name parts. As a consequence it would also be hard to find the real-life PCS value related to the FIWARE data or vice versa, because of the unclear separation of the KKS name parts and the replacement of some characters from the original name. That is why I'd like to have two types of characters like ":" and "_".

As a possible workaround I could use "_" as suggested and include the original name in the attribute description, so that users can look up the real-life relation between FIWARE and PCS names in the Swagger documentation of our data model. I am thinking about alternatives, but I guess that this is the best option we have at the moment, because no characters other than "_" are allowed. As we generate the model.yaml file for ~3.000 entities anyway and can re-generate it at any time, this would just be a small code change for us.

"For more complex scenarios, additional @context generation tools can be found on the Smart Data Models website."

As I tried to express in my last comment, I can't find appropriate tools on the Smart Data Models page that would support generating an @context-file from model.yaml. From my understanding there is the Excel sheet for simple data models and some tools that don't use schema.yaml as a base, but take another approach, e.g. based on a CSV file and a web form. These approaches are not suitable for big data models, as you would need ages to enter all ~50 entities and ~3.000 variables manually into Excel or a web form.

To summarize: I think I will use the workaround with "_" and include a reference to the original tag name in the attribute description to support identification of the relation to real-life PCS tags. With that I can stick to generating my @context-file with the context-file-generator based on our generated model.yaml file.
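
A minimal Python sketch of that workaround, applied to the `properties` section of one entity, might look like this. The helper name `sanitize_properties` and the "(PCS name: ...)" wording are hypothetical, and the model is assumed to be already parsed into dictionaries.

```python
def sanitize_properties(properties):
    """Rename colon-bearing attributes and keep the original in the description."""
    sanitized = {}
    for name, spec in properties.items():
        safe = name.replace(":", "_")
        new_spec = dict(spec)  # shallow copy, the input stays untouched
        if safe != name:
            description = spec.get("description", "")
            new_spec["description"] = f"{description} (PCS name: {name})".strip()
        if safe in sanitized:
            raise ValueError(f"name collision after renaming: {safe}")
        sanitized[safe] = new_spec
    return sanitized


properties = {
    "kksFunction": {"description": "KKS function of property (level 1)"},
    "0QFB15CP001:Dr_DrLuAnl:U_AH": {"description": "HL alarm", "type": "boolean"},
}

clean = sanitize_properties(properties)
```

The renamed attribute keeps its type and gains a description ending in the original PCS name, so the mapping back to the real-life tag stays visible in the generated documentation.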

Greetings Inga

jason-fox commented 5 months ago

The KKS name scheme has three parts which are separated by "/" and "."

A more consistent mechanism would be to URL Encode if possible:

0QFB15CP001:Dr_DrLuAnl:U -> 0QFB15CP001%3ADr_DrLuAnl%3AU and see if that works.
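
In Python the encoding could be done with the standard library, as sketched below. The helper names are hypothetical, and whether a given broker or JSON-LD parser accepts percent-encoded attribute names is exactly the open question here, so treat this as something to test rather than a confirmed fix.

```python
from urllib.parse import quote, unquote


def encode_name(name):
    """Percent-encode every reserved character, including ':' and '/'."""
    return quote(name, safe="")


def decode_name(encoded):
    """Recover the original PCS-style name."""
    return unquote(encoded)


encoded = encode_name("0QFB15CP001:Dr_DrLuAnl:U")
print(encoded)  # 0QFB15CP001%3ADr_DrLuAnl%3AU
```

Unlike a plain character swap, this mapping is reversible, so the original name can always be recovered with `decode_name`.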

jason-fox commented 5 months ago

On further investigation, all of the following could be problematic for JSON-LD: colon `:`, space ` ` and slash `/`

IngMiad commented 5 months ago

Hello Jason,

my solution will be to use only "_" and include the related name from the real-world system in the property description. My first tests have been successful, meaning that I can generate the (now simpler) @context-file using the context-file-generator and Mintaka no longer complains about an invalid @context file (as it is valid now).

Thank you very much for your help and suggestions! This was really valuable for us :)