Ordered keys of references in RDF

kenwenzel commented 3 years ago

The RDF representation currently does not support a fixed order for the keys of a reference. The property https://admin-shell.io/aas/3/0/RC01/Reference/key of the class Reference has the following range:

https://github.com/admin-shell-io/aas-specs/blob/d1a1cec327de43449f04d6948c9cea3836effc31/schemas/rdf/rdf-ontology.ttl#L1779

This is similar to the issue #42

Possible solutions are:

use rdf:Lists
use an additional property like aas:precedes on the referenced keys and compute a topological ordering
use a wrapper object like KeyWithIndex with an additional aas:index property that can be used for ordering
use an additional property like aas:index directly with the class Key instead of introducing a wrapper class
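
To make the second option more concrete, here is a minimal Python sketch of recovering a key order from pairwise aas:precedes statements. aas:precedes is only the hypothetical property proposed in the list above, and the key labels are made up for illustration. The sketch uses nothing but the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

def order_from_precedes(precedes):
    """Return keys in an order consistent with (earlier, later) pairs.

    `precedes` is an iterable of (a, b) tuples meaning "a aas:precedes b".
    aas:precedes is a proposal from this thread, not part of the ontology.
    """
    sorter = TopologicalSorter()
    for earlier, later in precedes:
        # TopologicalSorter.add(node, *predecessors): `later` comes after `earlier`.
        sorter.add(later, earlier)
    return list(sorter.static_order())

# The pairs arrive in arbitrary order, as triples from a set-based store would.
pairs = [("list-key", "property-key"), ("submodel-key", "list-key")]
ordered = order_from_precedes(pairs)
```

The result is only a unique total order if every adjacent pair of keys is stated. A reference with a single key has no aas:precedes statement at all and would need separate handling.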

kenwenzel commented 3 years ago

OK, I found something in the following specification document:

https://www.plattform-i40.de/PI40/Redaktion/EN/Downloads/Publikation/Details_of_the_Asset_Administration_Shell_Part1_V3.pdf

Page 142 describes the following mapping rule:

(6) Keys must have an index attribute.
Keys of a Reference have a defined order, however RDF is explicitly set-based. The index attribute encodes
the position in the original sequence. Consequently, Keys belonging to one Reference must have unique
numbers in the range [0..keyCount], ascending from 0. If only one Key is supplied, the index attribute can also
be skipped, implying a default value of ‘0’.

The mentioned index attribute needs to be included in the AAS ontology. I suggest defining a global property aas:index that can also be used for ordered SubModelElements.
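
As an illustration of the quoted mapping rule, the following Python sketch sorts the keys of one reference by their index and rejects duplicate or missing numbers. The dict layout of a key is an assumption made for this example, and the range is read as 0 to keyCount - 1, which is one interpretation of "ascending from 0".

```python
def order_keys(keys):
    """Sort the keys of one Reference by their index attribute.

    Each key is a dict such as {"type": ..., "value": ..., "index": ...}
    (a layout assumed for this sketch). A lone key may omit "index",
    which then defaults to 0.
    """
    keys = [dict(key) for key in keys]  # work on copies, leave the input untouched
    if len(keys) == 1:
        keys[0].setdefault("index", 0)
    if any("index" not in key for key in keys):
        raise ValueError("every key needs an index when there are several keys")
    indices = sorted(key["index"] for key in keys)
    # Assumption: unique indices forming the contiguous range 0 .. len(keys) - 1.
    if indices != list(range(len(keys))):
        raise ValueError(f"expected indices 0..{len(keys) - 1}, got {indices}")
    return sorted(keys, key=lambda key: key["index"])
```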

kenwenzel commented 2 years ago

This is still an issue for draft-V3RC02. Should I prepare a PR?

Edit by @mristin: fixed a typo

mristin commented 2 years ago

@sebbader Please have another look at this issue. Do we need to fix anything for V3RC02 or should we postpone this for future versions?

sebbader commented 2 years ago

@mristin this has a few additional implications, let's look at this later. We are also currently working on a way more RDF-native interpretation/profile in addition to the official RDF schema, which will then also help us with this issue.

mristin commented 2 years ago

I'll close the issue for now. Please remember to reference it in the future if it is ever resolved.

arnoweiss commented 11 months ago

This is still open and will cause the RDF serialization to lose data when parsing and serializing.

mhrimaz commented 11 months ago

The easiest approach to solve this is to use an additional property like aas:index, as @kenwenzel suggested. Furthermore, if we want round trips to work, it is better to have an index for all arrays; otherwise, for example, the order of submodels might change when we serialize JSON to RDF and then back to JSON.

arnoweiss commented 11 months ago

Yes, that would be required, however the generator would have to make sure that the aas:index property does not get serialized to json & xml as it's not part of the meta-model. Using the rdf:first and rdf:rest mechanism could also work but is obvs quite ugly.
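
A rough Python sketch of that round trip, using plain tuples in a set as a stand-in for an RDF graph: the index is written on the way into RDF and used only for sorting on the way out, so it never shows up in the JSON or XML side. The blank-node labels and the urn:example: IRIs are placeholders invented for this example, and aas:index is the proposed property, not something the ontology defines today.

```python
AAS_INDEX = "https://admin-shell.io/aas/3/0/index"  # proposed aas:index, not yet in the ontology
EX_VALUE = "urn:example:value"                      # placeholder predicate for this sketch

def list_to_triples(subject, predicate, values):
    """Encode an ordered list as an unordered set of triples plus an index."""
    triples = set()
    for position, value in enumerate(values):
        node = f"_:{subject}-{position}"  # made-up blank-node label
        triples.add((subject, predicate, node))
        triples.add((node, EX_VALUE, value))
        triples.add((node, AAS_INDEX, position))
    return triples

def triples_to_list(triples, subject, predicate):
    """Rebuild the list in index order; the index itself is not returned."""
    members = [o for s, p, o in triples if s == subject and p == predicate]
    index_of = {s: o for s, p, o in triples if p == AAS_INDEX}
    value_of = {s: o for s, p, o in triples if p == EX_VALUE}
    members.sort(key=lambda node: index_of[node])
    return [value_of[node] for node in members]

submodels = ["urn:example:sm-b", "urn:example:sm-a", "urn:example:sm-c"]
triples = list_to_triples("env", "urn:example:submodels", submodels)
restored = triples_to_list(triples, "env", "urn:example:submodels")
```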

mhrimaz commented 11 months ago

@arnoweiss I implemented a full parser (metamodel v3 compliant) that uses aas:index to generate the RDF representation. You can find it here (endpoint: /submodel:jsontordf, ...) or test it live. Reading RDF and leaving the index out of the serialization isn't technically hard. The only thing you need to take care of is interaction with the RDF in a triplestore: if, for example, you change the order of elements, you have to update all corresponding indexes.
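
The renumbering step could look roughly like the following Python sketch. It works on a plain mapping from list member to index rather than on a real triplestore, and the function name is made up for this example.

```python
def move_element(index_by_node, node, new_position):
    """Move `node` to `new_position` and renumber every sibling.

    `index_by_node` maps each list member to its current index value.
    Returns a new mapping whose indexes are unique and contiguous from 0.
    """
    ordered = sorted(index_by_node, key=index_by_node.get)
    ordered.remove(node)
    ordered.insert(new_position, node)
    return {member: position for position, member in enumerate(ordered)}

before = {"a": 0, "b": 1, "c": 2}
after = move_element(before, "c", 0)
# Only members whose index changed would need to be rewritten in the store.
changed = {member for member in after if after[member] != before[member]}
```

In a triplestore the changed entries would then be written back, for example with a SPARQL update per affected member.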

mristin commented 11 months ago

@sebbader-sap can you please instruct me once the discussion settles what we need to change in aas-core-codegen?

I'm re-opening the issue as it seems relevant.

kenwenzel commented 11 months ago

Maybe this is a bit off-topic (as it is discussed within the IDTA ontologies workstream) but we should consider the serialization of the individual keys in RDF:

Taking this example https://github.com/admin-shell-io/aas-specs/blob/master/schemas/json/examples/generated/Reference/for_a_model_reference_valid_key_value_after_submodel_element_list.json we have the following reference:

{
    "idShort": "someElement",
    "modelType": "ReferenceElement",
    "value": {
      "keys": [
        {
          "type": "Submodel",
          "value": "https://example.com/something"
        },
        {
          "type": "SubmodelElementList",
          "value": "something_more"
        },
        {
          "type": "Property",
          "value": "123"
        }
      ],
      "type": "ModelReference"
    }
}

If we directly translate the keys to RDF IRIs we could use a serialization as the following:

[] a aas:ModelReference ; aas:keys [
    rdf:_1 <urn:aas:Submodel:https://example.com/something> ;
    rdf:_2 <urn:aas:SubmodelElementList:https://example.com/something#something_more> ;
    rdf:_3 <urn:aas:Property:https://example.com/something#something_more.123> ] .

Note: In reality the elements after the last colon would be BASE64-encoded.
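
For illustration, a small Python sketch that derives such IRIs from a key chain. The urn:aas:<type>:<path> layout and the '#' and '.' separators simply mirror the example above and are not defined by any specification. Using the URL-safe Base64 variant without padding is likewise an assumption.

```python
import base64

def b64url(text):
    """Base64url-encode text without padding (variant assumed for this sketch)."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).rstrip(b"=").decode("ascii")

def key_iris(keys, encode=False):
    """Build one IRI per key, each naming the path from the first key to that key."""
    iris = []
    path = ""
    for position, key in enumerate(keys):
        if position == 0:
            path = key["value"]                 # the identifiable, e.g. the submodel id
        elif position == 1:
            path = f"{path}#{key['value']}"     # first element below the identifiable
        else:
            path = f"{path}.{key['value']}"     # deeper elements
        tail = b64url(path) if encode else path
        iris.append(f"urn:aas:{key['type']}:{tail}")
    return iris

keys = [
    {"type": "Submodel", "value": "https://example.com/something"},
    {"type": "SubmodelElementList", "value": "something_more"},
    {"type": "Property", "value": "123"},
]
plain = key_iris(keys)
encoded = key_iris(keys, encode=True)
```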

This would somehow interfere with adding an index property directly to the keys.

Also related to issue #166

mhrimaz commented 11 months ago

@kenwenzel Having a direct link is also really helpful and makes the queries more efficient and natural. My decision was to change the current representation as little as possible.

My serializer returns the following for your input (it nicely shows why the index is necessary: RDFlib can freely reorder elements during serialization):

@prefix aas: <https://admin-shell.io/aas/3/0/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<TXlTdWJtb2RlbA> a aas:Submodel ;
    <https://admin-shell.io/aas/3/0/Identifiable/id> "MySubmodel" ;
    <https://admin-shell.io/aas/3/0/Submodel/submodelElements> <TXlTdWJtb2RlbA/submodel-elements/someElement> .

<TXlTdWJtb2RlbA/submodel-elements/someElement> a aas:ReferenceElement ;
    <https://admin-shell.io/aas/3/0/Referable/idShort> "someElement" ;
    <https://admin-shell.io/aas/3/0/ReferenceElement/value> [ a aas:Reference ;
            <https://admin-shell.io/aas/3/0/Reference/keys> [ a aas:Key ;
                    <https://admin-shell.io/aas/3/0/Key/type> <https://admin-shell.io/aas/3/0/KeyTypes/SubmodelElementList> ;
                    <https://admin-shell.io/aas/3/0/Key/value> "something_more" ;
                    aas:index 1 ],
                [ a aas:Key ;
                    <https://admin-shell.io/aas/3/0/Key/type> <https://admin-shell.io/aas/3/0/KeyTypes/Property> ;
                    <https://admin-shell.io/aas/3/0/Key/value> "123" ;
                    aas:index 2 ],
                [ a aas:Key ;
                    <https://admin-shell.io/aas/3/0/Key/type> <https://admin-shell.io/aas/3/0/KeyTypes/Submodel> ;
                    <https://admin-shell.io/aas/3/0/Key/value> "https://example.com/something" ;
                    aas:index 0 ] ;
            <https://admin-shell.io/aas/3/0/Reference/type> <https://admin-shell.io/aas/3/0/ReferenceTypes/ModelReference> ] ;
    aas:index 0 .

Regarding your serialization, I guess it would be redundant to have all the intermediate links. But with a post-processing tool (either part of the serializer or a SPARQL query) you can construct those nice-to-have links.

I think it would be necessary to agree on a way to construct the IRIs of elements. For identifiables we can use the base64url-encoded id, as in the example submodel here: <TXlTdWJtb2RlbA> a aas:Submodel ; for other elements, which are identified locally by idShort, we can construct a path similar to the REST API, for example: <{SUBMODEL_BASE64URL}/submodel-elements/{ID_SHORT}/ / / / >....

So here your reference is referring to submodel https://example.com/something which is aHR0cHM6Ly9leGFtcGxlLmNvbS9zb21ldGhpbmc. Your post processing tool can then combine the keys and construct the following: [] aas:refersTo <aHR0cHM6Ly9leGFtcGxlLmNvbS9zb21ldGhpbmc/submodel-elements/something_more#112>.... This would only be possible if we agree on the structure of IRIs.
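
A minimal Python sketch of this IRI scheme, assuming base64url without padding and a single idShort below the submodel. The function names are made up for this example, and how deeper paths would be joined is left open here, as in the text above.

```python
import base64

def b64url(identifier):
    """Base64url-encode an identifier and strip the '=' padding."""
    raw = base64.urlsafe_b64encode(identifier.encode("utf-8"))
    return raw.rstrip(b"=").decode("ascii")

def element_iri(submodel_id, id_short):
    """Relative IRI of a direct child element, modelled on the REST API path."""
    return f"{b64url(submodel_id)}/submodel-elements/{id_short}"

submodel_iri = b64url("MySubmodel")
referenced_iri = b64url("https://example.com/something")
some_element_iri = element_iri("MySubmodel", "someElement")
```

With these inputs the sketch reproduces the identifiers used in the Turtle example: TXlTdWJtb2RlbA for the submodel and TXlTdWJtb2RlbA/submodel-elements/someElement for the reference element.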

Post-processing with SPARQL: The following query matches all AASs and submodels and creates links between them, using the additional properties aas:hasSubmodel and aas:hasShell. This makes navigation easier and more efficient. If the submodel exists in the local repository, the natural link is constructed; otherwise the reference remains as it is.

PREFIX aas: <https://admin-shell.io/aas/3/0/>
INSERT {
    ?aas aas:hasSubmodel ?submodel .
    ?submodel aas:hasShell ?aas
}
Where { 
    ?aas a aas:AssetAdministrationShell .
    ?aas <https://admin-shell.io/aas/3/0/AssetAdministrationShell/submodels> ?a_submodel_ref .
    ?a_submodel_ref aas:Reference\/keys ?keys .
    ?keys aas:Key\/value ?ref_value .
    ?submodel a aas:Submodel .
    ?submodel aas:Identifiable\/id ?submodel_id .

    Filter(?ref_value = ?submodel_id)
}

PS: I guess the last element should be a fragment reference that refers to an index in the submodel element list.

mjacoby commented 9 months ago

So this issue has been around for about 2 1/2 years now and there is still no solution in sight. As it is right now, RDF serialization is not possible because of this. I find this unacceptable!

Let's start with making RDF at least usable, even if it may not be the best of all solutions, and then go from there and, if required, update the RDF serialization in the future. The obvious step forward is the one proposed by @kenwenzel at the very beginning of this discussion, as it is in line with the specification. In fact, it seems like things were intended to be that way and someone simply forgot to define aas:index in the ontology. So let's add this as soon as possible so that we have a working version, and then continue the discussion about a "better" solution for as long as it takes.

@kenwenzel @mhrimaz @arnoweiss Is this fine with you? @mristin @sebbader Any arguments why we should not do this? If not, can you make it happen that this will be included in the very next release (3.0.2?)?

mristin commented 9 months ago

@mjacoby I am not versed in RDF, so I have no opinion.

However, whatever the solution is, we need to implement it for all lists, not only for keys in a reference. This is the limitation of aas-core-codegen which treats Reference like any other class.

For example, if I understand the suggestion by @kenwenzel correctly, we would add aas:index to all lists.

mjacoby commented 9 months ago

This is only required for lists where the ordering is important. Therefore, I am not 100% sure which classes are affected besides Reference and SubmodelElementList.

I assumed that for SubmodelElementList this had been resolved, since https://github.com/admin-shell-io/aas-specs/issues/42 has been closed. After some investigation, however, I get the impression that that issue was closed because its title explicitly mentioned only SubmodelElementCollection, which is no longer relevant (the orderRelevant property has been removed), although the underlying issue is the same for SubmodelElementList.

So it seems you are correct and this issue is relevant for at least Reference and SubmodelElementList - if anyone is aware of another place where this problem arises please leave a comment here.

I am also not an expert on RDF but still quite familiar with using it so it seems like a reasonable approach to use aas:index for all lists that need to be ordered.

Can we agree that this is a viable solution? It might not prove to be the "perfect" one, but at least we would have a working one for now. It would be nice to get feedback from @kenwenzel and @mhrimaz, as you probably have the most detailed knowledge about this issue.

mristin commented 9 months ago

> This is only required for lists where the ordering is important.

Sorry, I probably wasn't clear. The aas-core-codegen sees only lists, and the order is relevant for every list. Hence, we either add aas:index to all lists or to none.

There are many lists in the meta-model, see: https://github.com/aas-core-works/aas-core-meta/blob/main/aas_core_meta/v3.py

... and search for List[.

kenwenzel commented 9 months ago

Either we go with aas:index or we use something like rdf:first and rdf:rest to create ordered linked lists. The second approach may be easier for querying, while the first one is closer to the standard.
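
For comparison, a short Python sketch of the linked-list encoding with the standard RDF collection vocabulary (rdf:first, rdf:rest, rdf:nil). Triples are plain tuples and the blank-node labels are invented for the example, so this shows the structure rather than any particular library's API.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

def to_rdf_list(items):
    """Return (head, triples) encoding `items` as an RDF collection."""
    if not items:
        return NIL, set()
    cells = [f"_:cell{i}" for i in range(len(items))]  # made-up blank-node labels
    triples = set()
    for i, item in enumerate(items):
        triples.add((cells[i], FIRST, item))
        following = cells[i + 1] if i + 1 < len(items) else NIL
        triples.add((cells[i], REST, following))
    return cells[0], triples

def from_rdf_list(head, triples):
    """Walk the rdf:rest links from `head` and collect the rdf:first values."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    items = []
    while head != NIL:
        items.append(first[head])
        head = rest[head]
    return items

head, cells = to_rdf_list(["Submodel", "SubmodelElementList", "Property"])
roundtrip = from_rdf_list(head, cells)
```

The order lives entirely in the chain of rdf:rest links, so no index has to be kept in sync, at the cost of two triples per list member.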

mjacoby commented 9 months ago

Are you implying that we add unnecessary or even "wrong" information like indices for elements in a set that have no order according to the specification to an RDF serialization because some tool is treating lists differently than defined in the specification (assuming an order where there is none)? @mristin

mristin commented 9 months ago

> Are you implying that we add unnecessary or even "wrong" information like indices for elements in a set that have no order according to the specification to an RDF serialization because some tool is treating lists differently than defined in the specification (assuming an order where there is none)? @mristin

Indeed, that is the limitation of the tool.

Also, you probably want the round-trip XML <-> RDF and JSON <-> RDF to hold. If the lists are unordered, the round trips do not hold anymore.

mjacoby commented 9 months ago

The correct way would be to adjust the tool to respect the specification, not to write the specification to match what some tool is doing. To my understanding, and in accordance with two implementations of the JSON serialization, there is no general ordering of lists in JSON.

For example, assume a Submodel containing the Property elements with idShort x and y. Both of the following would be equally valid serializations representing the exact same submodel, as there is no ordering defined for the submodelElements property of class Submodel.

{
    "modelType":
    {
        "name": "Submodel"
    },
    "idShort": "ExampleSubmodel",
    "submodelElements": [
        {
            "modelType":
            {
                "name": "Property"
            },
            "value": "value of x",
            "valueType": "xs:string",
            "idShort": "x"
        },
        {
            "modelType":
            {
                "name": "Property"
            },
            "value": "value of x",
            "valueType": "xs:string",
            "idShort": "y"
        }
    ]
}

{
    "modelType":
    {
        "name": "Submodel"
    },
    "idShort": "ExampleSubmodel",
    "submodelElements": [
        {
            "modelType":
            {
                "name": "Property"
            },
            "value": "value of x",
            "valueType": "xs:string",
            "idShort": "y"
        },
        {
            "modelType":
            {
                "name": "Property"
            },
            "value": "value of x",
            "valueType": "xs:string",
            "idShort": "x"
        }
    ]
}

If I am not mistaken, this is the same for XML. Therefore, I see no problem with round-trips and no reason why we should enforce such an ordering in RDF.
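
Under that reading, equality of two serializations has to ignore the element order. A small Python sketch of such a comparison, assuming idShort is unique within one submodel and using abbreviated element dicts:

```python
def same_submodel(a, b):
    """Compare two submodel dicts, ignoring the order of submodelElements.

    Elements are matched by idShort, assumed unique within one submodel.
    """
    def normalized(submodel):
        rest = {k: v for k, v in submodel.items() if k != "submodelElements"}
        elements = {e["idShort"]: e for e in submodel.get("submodelElements", [])}
        return rest, elements
    return normalized(a) == normalized(b)

x = {"idShort": "x", "value": "value of x", "valueType": "xs:string"}
y = {"idShort": "y", "value": "value of x", "valueType": "xs:string"}
first = {"idShort": "ExampleSubmodel", "submodelElements": [x, y]}
second = {"idShort": "ExampleSubmodel", "submodelElements": [y, x]}

order_sensitive = first == second             # plain comparison respects list order
order_insensitive = same_submodel(first, second)
```

A plain comparison of the parsed JSON treats the two documents as different, while the order-insensitive comparison treats them as the same submodel.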

Am I missing something? If there is an ordering defined, please point me to the corresponding section in the specification.

mhrimaz commented 9 months ago

As @kenwenzel already mentioned, any option is possible, and in my opinion there is no best option. Anyone who is completely against aas:index can look at the FHIR RDF model, which uses exactly such an element, fhir:index. From the de-/serializer implementation perspective, at least on my side, where I work directly with RDFlib or Apache Jena, there is no issue. Maybe on the code generator side some options would be easier to work with.

As an example, the current implementation in AAS4j uses aas:index for all elements and not just Key. Doing so ensures round trips between Java objects and RDF. However, if the logic of equals does not care about the ordering of other lists, then I can simply remove it. For example, I didn't add indexes for Environment.

Here is an example of an environment that doesn't hold any index information. The consequence is that on every round trip (Java object <-> RDF) I get a different ordering. As a result, if you just compare the textual representation, you will see textual changes, which is the situation in your JSON example. So we consider those JSONs equal, but a database, a version control system, etc. will not see them as the same. That's also the case for the equals logic in AAS4j: there the ordering actually is important, so in this case I can't use equals and just compare the number of elements.

[ a       <https://admin-shell.io/aas/3/0/Environment>;
  <https://admin-shell.io/aas/3/0/Environment/assetAdministrationShells>
          <http://customer.com/aas/9175_7013_7091_9168>;
  <https://admin-shell.io/aas/3/0/Environment/conceptDescriptions>
          <http://www.vdi2770.com/blatt1/Entwurf/Okt18/cd/StoredDocumentRepresentation/DigitalFile> , <http://www.vdi2770.com/blatt1/Entwurf/Okt18/cd/Document> , <http://customer.com/cd/1/1/18EBD56F6B43D895> , <0173-1#02-BAA120#008> , <http://www.vdi2770.com/blatt1/Entwurf/Okt18/cd/Description/Title>;
  <https://admin-shell.io/aas/3/0/Environment/submodels>
          <http://i40.customer.com/type/1/1/1A7B62B529F19152> , <http://i40.customer.com/instance/1/1/AC69B1CB44F07935> , <http://i40.customer.com/type/1/1/7A7104BDAB57E184>
] .

// Serialize the environment to an RDF model and print it as Turtle.
Environment object = environment;
RDFSerializationResult rdfSerializationResult = new DefaultEnvironmentRDFHandler().toModel(object);
Model model = rdfSerializationResult.getModel();
model.write(System.out, Lang.TTL.getName());
Resource createdResource = rdfSerializationResult.getResource();
assert rdfSerializationResult.getModel().contains(createdResource, RDF.type, AASNamespace.Types.Environment);

// Read the environment back from the model.
Environment recreatedObject = new DefaultEnvironmentRDFHandler().fromModel(model, createdResource);
// Without index information the element order is not stable across round trips,
// so only the list sizes are compared instead of calling equals.
assertEquals(object.getConceptDescriptions().size(),
        recreatedObject.getConceptDescriptions().size());
assertEquals(object.getSubmodels().size(),
        recreatedObject.getSubmodels().size());
assertEquals(object.getAssetAdministrationShells().size(),
        recreatedObject.getAssetAdministrationShells().size());

mristin commented 9 months ago

@mjacoby wrote:

The correct way would be to adjust the tool to respect the specification not write the specification to match what some tool is doing.

I see basically four ways forward if the indices are only to be added to some of the lists: 1) Manually patch the RDF schema after it has been generated with aas-core-codegen, 2) Write a new script for RDF&SHACL generation which uses aas-core-codegen as dependency for the intermediate representation, 3) Write a script that reads the generated RDF and then adds indices according to some pre-defined rules, or 4) Write everything by hand.

So far, we have always had trouble finding maintainers willing to do this work. Maybe there are now more people happy to get involved.

mjacoby commented 9 months ago

I get what you're saying and understand the problem @mristin, but I hope you agree that being sloppy, or rather deliberately designing a specification in a nonsensical way (like adding an order to elements that are not considered ordered) because it is easier to do with the limited resources we have, is never a good idea for standardization. In fact, if we lack the resources to do things correctly and no one is willing to contribute, then I would argue the standard is not relevant and already dead. I therefore strongly urge us not to cut corners here (or anywhere else), at least not to the point where it has such technical/logical implications.

I do not know any details about aas-core-codegen, but wouldn't it be possible to change the tool itself and introduce two types of collections - ordered and unordered? If you have this you could define how these are mapped onto different formats, e.g. in JSON and XML you treat them exactly as you do now (i.e. no changes required) and only for RDF you treat them differently.
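
As a rough illustration of that idea, and not of how aas-core-codegen actually works, a Python sketch in which each list-valued property carries an `ordered` flag and only ordered lists get an index in RDF. The class and function names are invented for this example, and which lists count as ordered is exactly the open question in this thread.

```python
from dataclasses import dataclass

AAS_INDEX = "https://admin-shell.io/aas/3/0/index"  # proposed aas:index, not in the ontology

@dataclass(frozen=True)
class ListProperty:
    """Describes one list-valued property of a meta-model class."""
    iri: str
    ordered: bool

def emit(subject, prop, members):
    """Yield triples for `members`; add an index only if the list is ordered."""
    for position, member in enumerate(members):
        yield (subject, prop.iri, member)
        if prop.ordered:
            yield (member, AAS_INDEX, position)

# Illustrative classification only.
KEYS = ListProperty("https://admin-shell.io/aas/3/0/Reference/keys", ordered=True)
SUBMODELS = ListProperty("https://admin-shell.io/aas/3/0/Environment/submodels", ordered=False)

key_triples = list(emit("_:ref", KEYS, ["_:k0", "_:k1"]))
submodel_triples = list(emit("_:env", SUBMODELS, ["urn:example:s0", "urn:example:s1"]))
```

JSON and XML serializers could ignore the flag entirely and keep writing arrays as they do today.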

In the end, the decision on how to model things and the creation of the models/schemas are two separate issues: the former is relevant for all users of the specification, the latter only for those who are creating the specification. I suggest agreeing on what to do first; then we will see how to do it and how long it might take.

mristin commented 9 months ago

> I do not know any details about aas-core-codegen, but wouldn't it be possible to change the tool itself and introduce two types of collections - ordered and unordered? If you have this you could define how these are mapped onto different formats, e.g. in JSON and XML you treat them exactly as you do now (i.e. no changes required) and only for RDF you treat them differently.

That is of course possible, but would probably take an effort on the order of months. I would gladly volunteer to mentor the developer for that change, but I don't have time to make the change myself.

mjacoby commented 9 months ago

I understand. But let's start by agreeing on how to solve this issue content-wise and then think about the timeline. Do we agree to introduce aas:index only for the child elements of Reference and SubmodelElementList?

VladimirAlexiev commented 8 months ago

I also like the aas:index approach, especially because there's already a node (aas:Key) where it can be added. In addition to fhir:index, I can give schema:ListItem and schema:position as examples.

However, it would be best if that index is also present in JSON. Otherwise you cannot treat this JSON as JSON-LD to make RDF, but you need to read the JSON with special software and generate aas:index while reading.

Then for symmetry, it may be best if it's also in XML.

It seems that the AAS specifiers forgot a basic property of RDF: that it doesn't support ordered elements natively :-(

kenwenzel commented 8 months ago

@VladimirAlexiev Good point to make the JSON format compatible with JSON-LD. If this is a requirement then maybe we should use lists instead of the index property? https://www.w3.org/TR/json-ld11/#representing-lists-as-arrays
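
For reference, a tiny Python sketch of what such a term definition could look like in a JSON-LD context. Only the "@container": "@list" mechanism comes from JSON-LD. Mapping the JSON member "keys" to the IRI below is an assumption, and whether the existing AAS JSON could be processed as JSON-LD with a context alone is not settled here.

```python
import json

# Hypothetical context fragment: treat the JSON array under "keys" as an ordered list.
context = {
    "@context": {
        "keys": {
            "@id": "https://admin-shell.io/aas/3/0/Reference/keys",
            "@container": "@list",
        }
    }
}

print(json.dumps(context, indent=2))
```

A JSON-LD processor would then turn the array into an rdf:first/rdf:rest chain instead of an unordered set of values.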

admin-shell-io / aas-specs

Ordered keys of references in RDF #45