Sveino / Inst4CIM-KG

Instance of CIM Knowledge Graph
Apache License 2.0

shorten prop names in JSONLD? #100

Open VladimirAlexiev opened 1 month ago

VladimirAlexiev commented 1 month ago

In JSONLD, should we shorten prop names like this:

[
  {
    "id": "foo",
    "type": "ACDCConverterController",
    "ACDCConverter": "bar"
  },
  {
    "id": "baz",
    "type": "ACDCConverterAction",
    "ACDCConverter": "bar"
  },
  {
    "id": "bar",
    "type": "ACDCConverter",
    "ACDCConverterController": "foo",
    "ACDCConverterAction": "baz"
  }
]

Or keep them long, like this:

[
  {
    "id": "foo",
    "type": "ACDCConverterController",
    "ACDCConverterController.ACDCConverter": "bar"
  },
  {
    "id": "baz",
    "type": "ACDCConverterAction",
    "ACDCConverterAction.ACDCConverter": "bar"
  },
  {
    "id": "bar",
    "type": "ACDCConverter",
    "ACDCConverter.ACDCConverterController": "foo",
    "ACDCConverter.ACDCConverterAction": "baz"
  }
]

At a Friday meeting it was confirmed that props with the same last part always have the same meaning (and kind). But for the same last part, the CIM ontologies define multiple props, one per hosting class.

This query shows 762 prop groups that have the same last part, but different first (class) part:

PREFIX afn: <http://jena.apache.org/ARQ/function#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
select ?lastPart (group_concat(?firstPart) as ?classes) (count(*) as ?c) {
  values ?kind {owl:DatatypeProperty owl:ObjectProperty}
  ?p a ?kind
  bind(afn:localname(?p) as ?localname)
  bind(replace(?localname,"(.*)\\.(.*)","$1.*") as ?firstPart)
  bind(replace(?localname,"(.*)\\.(.*)","$2") as ?lastPart)
} group by ?lastPart having (?c>1) order by ?lastPart
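For readers who don't have the ontologies loaded in a triplestore, here is a minimal Python sketch of the same grouping logic: split each prop local name at the last dot and keep the last parts that occur under more than one class. The function name `group_by_last_part` is made up for illustration, and the input list only contains the four prop names from the example at the top of this issue, not the real ontology content.

```python
from collections import defaultdict


def group_by_last_part(local_names):
    """Group 'Class.prop' local names by the part after the last dot.

    Returns only the last parts that appear under more than one class,
    mirroring the `having (?c>1)` clause of the SPARQL query.
    """
    groups = defaultdict(list)
    for name in local_names:
        first, sep, last = name.rpartition(".")
        if sep:  # ignore local names that contain no dot at all
            groups[last].append(first)
    return {last: sorted(classes) for last, classes in groups.items() if len(classes) > 1}


# Prop local names taken from the JSON example in this issue.
names = [
    "ACDCConverterController.ACDCConverter",
    "ACDCConverterAction.ACDCConverter",
    "ACDCConverter.ACDCConverterController",
    "ACDCConverter.ACDCConverterAction",
]
shared = group_by_last_part(names)
print(shared)
```

With this input, only `ACDCConverter` is reported, because it is the only last part that occurs under two different classes.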

[image]

The worst "offenders" are: [image]

We cannot fix this in CIM RDF (ontologies and SPARQL) because the change will be too disruptive.

VladimirAlexiev commented 1 month ago

Hi @tviegut, what do you think? It would be good if prop naming matches your draft 62361-104 CIM Profiles to JSON schema Mapping Rev 01v20 – 2021-07-06. I think it uses shortened prop names, because Annex D gives examples of when straight shortening cannot be used (I'm not sure I quite understand that part).

admin-cimug commented 1 month ago

@VladimirAlexiev, first, I'm sorry for the delay. I was in Thailand last week and have been playing catch-up this week, so I'm sounding off on this topic this morning. I agree and am of the same opinion: use shortened names, unless I'm missing an important reason to do otherwise. It would be good to hear if @Sveino or @griddigit-ci have some thoughts on this and agree as well.

Todd

griddigit-ci commented 3 weeks ago

I would like to go in a more W3C-ish way and see if we can get rid of the "." notation. But we also need to see what artifacts are needed to secure the transition. In the JSONLD serialization, ideally we will have the type and just the property name, without the class name. We know that we cannot ensure unique property names at this stage in CIM, but in the vocabulary and the serialisation of the instance data we can do this if it is clear that the property belongs to some class.

Vlado, please see how to cover the case where an instance of a class has a native attribute and an inherited attribute with the same name. I am not sure if we have that situation in the profiles, but in JSONLD we need to ensure that we will not end up in a mess. I guess this is the only reason why people use the "." notation.
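To make the concern concrete, here is a rough Python sketch of a check for that situation: a class declaring an attribute whose short name is already declared by one of its ancestors. Everything in it is hypothetical (the function name, the `Parent`/`Child` classes and their attributes); it only illustrates the shape of the check and says nothing about whether the CIM profiles actually contain such a case.

```python
def inherited_name_clashes(parents, own_attrs):
    """Find (class, ancestor, attribute) triples where a class redeclares
    a short attribute name that one of its ancestors already declares.

    parents:   class name -> parent class name (or None for a root)
    own_attrs: class name -> set of short attribute names declared on it
    Assumes the hierarchy is acyclic (single inheritance, as in this toy data).
    """
    clashes = []
    for cls, attrs in own_attrs.items():
        ancestor = parents.get(cls)
        while ancestor is not None:
            for name in sorted(attrs & own_attrs.get(ancestor, set())):
                clashes.append((cls, ancestor, name))
            ancestor = parents.get(ancestor)
    return clashes


# Hypothetical toy hierarchy, not taken from CIM.
parents = {"Child": "Parent", "Parent": None}
own_attrs = {"Parent": {"name", "p"}, "Child": {"p", "q"}}

clashes = inherited_name_clashes(parents, own_attrs)
print(clashes)
```

If such a check returned anything on the real model, those would be the cases where a short JSON key is ambiguous within a single instance.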

admin-cimug commented 3 weeks ago

@griddigit-ci : Chavdar, we only have the following Rule 118 in the CIM Modeling Guide regarding duplicate names:

https://cim-mg.ucaiug.io/latest/section5-cim-uml-modeling-rules-and-recommendations/#inheritance-rules

So it is safe to assume you're not referring to duplication in this context, correct? We obviously have plenty of duplicate attribute names, like p & q in various classes, but that shouldn't be an issue.

Penny for your thoughts.

Todd

Sveino commented 3 weeks ago

I think that the first step is to validate that all the attributes/properties with the same name have the same notation/rdfs:comments.

VladimirAlexiev commented 1 week ago

This query finds differences between same-named props:

PREFIX afn: <http://jena.apache.org/ARQ/function#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?q ?p1 ?r1 ?p2 ?r2 {
  values ?kind1 {owl:DatatypeProperty owl:ObjectProperty} ?p1 a ?kind1. 
  values ?kind2 {owl:DatatypeProperty owl:ObjectProperty} ?p2 a ?kind2.
  bind(replace(str(?p1),".*\\.(.*)","$1") as ?lastPart1)
  bind(replace(str(?p2),".*\\.(.*)","$1") as ?lastPart2)
  filter(str(?p1)<str(?p2) && ?lastPart1=?lastPart2)
  ?p1 ?q ?r1.
  ?p2 ?q ?r2.
  filter(
    ?q != rdfs:domain &&
    !(?r1 in (owl:FunctionalProperty, owl:InverseFunctionalProperty)) &&
    !(?r2 in (owl:FunctionalProperty, owl:InverseFunctionalProperty)) &&
    ?r1 != ?r2)
}

The first 1k rows take 50s (saved as rdf-improved/props-same-name-different-characteristics.csv). I tried to download all rows but had to abort after 9 min.

There are various kinds of differences:

@Sveino But I'm not talking about merging same-named props. That change would be too disruptive for CIM RDF. I'm talking about using short names in JSON-LD. That requires using class-dependent contexts (to map the same JSON key to different prop URLs). The same mechanism could be used to declare single-valued vs multi-valued props, though that's not necessary in JSONLD. Comments and stereotypes are not present in JSON-LD instance data.
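As a minimal sketch of what a class-dependent context could look like (in JSON-LD 1.1 terms, a context nested inside the term definition of a type), here it is written as a Python dict together with a hand-rolled lookup. Assumptions: the namespace `https://example.org/cim#` is a placeholder and not the real CIM namespace, `expand_key` is a made-up helper, and the lookup only imitates the key-to-URL mapping; it is not a JSON-LD processor.

```python
CIM = "https://example.org/cim#"  # hypothetical namespace, for illustration only

# Each type term carries its own nested "@context" that maps the short key
# to the long, class-qualified prop URL.
context = {
    "id": "@id",
    "type": "@type",
    "ACDCConverterController": {
        "@id": CIM + "ACDCConverterController",
        "@context": {
            "ACDCConverter": {"@id": CIM + "ACDCConverterController.ACDCConverter"},
        },
    },
    "ACDCConverterAction": {
        "@id": CIM + "ACDCConverterAction",
        "@context": {
            "ACDCConverter": {"@id": CIM + "ACDCConverterAction.ACDCConverter"},
        },
    },
    "ACDCConverter": {
        "@id": CIM + "ACDCConverter",
        "@context": {
            "ACDCConverterController": {"@id": CIM + "ACDCConverter.ACDCConverterController"},
            "ACDCConverterAction": {"@id": CIM + "ACDCConverter.ACDCConverterAction"},
        },
    },
}


def expand_key(node, key, ctx=context):
    """Map a short JSON key to a prop URL, based on the node's type."""
    scoped = ctx.get(node["type"], {}).get("@context", {})
    if key not in scoped:
        raise KeyError(f"{key!r} is not defined for type {node['type']!r}")
    return scoped[key]["@id"]


foo = {"id": "foo", "type": "ACDCConverterController", "ACDCConverter": "bar"}
baz = {"id": "baz", "type": "ACDCConverterAction", "ACDCConverter": "bar"}

print(expand_key(foo, "ACDCConverter"))
print(expand_key(baz, "ACDCConverter"))
```

The same short key `ACDCConverter` resolves to two different prop URLs, depending only on the `type` of the node it appears in.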

This specialized query looks for the most significant kind: different ranges:

PREFIX afn: <http://jena.apache.org/ARQ/function#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?p1 ?r1 ?p2 ?r2 {
  values ?kind1 {owl:DatatypeProperty owl:ObjectProperty} ?p1 a ?kind1. 
  values ?kind2 {owl:DatatypeProperty owl:ObjectProperty} ?p2 a ?kind2.
  bind(replace(str(?p1),".*\\.(.*)","$1") as ?lastPart1)
  bind(replace(str(?p2),".*\\.(.*)","$1") as ?lastPart2)
  filter(str(?p1)<str(?p2) && ?lastPart1=?lastPart2)
  ?p1 rdfs:range ?r1.
  ?p2 rdfs:range ?r2.
  filter(?r1 != ?r2)
}

It takes 2 minutes and finds 583 differences: props-same-name-different-range.csv. I'll post another task to examine it for errors.
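For cross-checking an exported prop-to-range listing outside the triplestore, the same comparison can be sketched in Python by grouping on the last part instead of comparing all pairs. This is only a sketch under stated assumptions: `range_conflicts` is a made-up name, and the `ClassA`/`ClassB` props and their ranges are invented sample data, not rows from the CSV.

```python
from collections import defaultdict


def range_conflicts(prop_ranges):
    """Return last parts whose same-named props do not all share one range.

    prop_ranges: 'Class.prop' local name -> range name
    Result:      last part -> {prop local name: range} for conflicting groups
    """
    by_last = defaultdict(dict)
    for prop, rng in prop_ranges.items():
        last = prop.rpartition(".")[2]  # part after the last dot
        by_last[last][prop] = rng
    return {
        last: props
        for last, props in by_last.items()
        if len(set(props.values())) > 1
    }


# Invented sample data, for illustration only.
prop_ranges = {
    "ClassA.p": "ActivePower",
    "ClassB.p": "Float",
    "ClassA.q": "ReactivePower",
    "ClassB.q": "ReactivePower",
}
conflicts = range_conflicts(prop_ranges)
print(conflicts)
```

Here `p` is reported because its two props have different ranges, while `q` is not.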

VladimirAlexiev commented 1 week ago

Previously I did some analysis, see CIM Shorten Prop Names

admin-cimug commented 1 week ago

@VladimirAlexiev: This is a very useful writeup. After reading it through, I find myself wanting to sit down to discuss your findings and review them in a bit more depth. Perhaps we could set up a call in the near future to do so.

Svein, do you think this is a useful plan? I know timing is important. I find myself with questions I'd love to discuss (and commenting here isn't the perfect forum).

Todd

cc: @Sveino , @griddigit-ci

VladimirAlexiev commented 1 week ago

@admin-cimug We'll have a whole-day online meeting on Nov 21, and can allocate 1h for this topic. @Sveino I assume I should prepare a presentation of say 1.5h about my work on different topics?

using class-dependent contexts (to map the same JSON key to different prop URLs),

I mean type-scoped contexts, see https://w3c.github.io/json-ld-syntax/#example-defining-an-context-within-a-term-definition-used-on-type .

So I think we should cancel this at the JSONLD level :-( @Sveino do you think it's feasible to do it at the RDF level, i.e. in CIM ontologies?