Open VladimirAlexiev opened 2 months ago
Eliminating duplication would require proper modularization, i.e. the creation of more ontology files: CIM datatypes, CIM core, CGMES core, etc.
Packages already have this separation
CimSyntaxGen: can't be done in SPARQL
longterm and assign to @Sveino for consideration
"What's the problem:" in the description: maybe this is not a problem at all?
Here we need to see how the RDFS files use a subset of the datatypes vocabulary. It is duplicated now because we lack linking mechanisms and vendors preferred things to be self-contained.
Linking is done by owl:imports.
But if the definitions are identical and the ontologies won't be loaded in named graphs, then the duplication does no harm.
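A minimal sketch of why identical duplicates are harmless in a single graph: an RDF graph is a set of triples, so merging two ontologies that repeat the same triple keeps it once, while loading each ontology into its own named graph keeps the copies apart. Triples are modelled here as plain Python tuples, and the two ontology names are hypothetical stand-ins.

```python
# Triples modelled as plain tuples; a graph is a set of triples.
CIM = "http://iec.ch/TC57/CIM100#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"

# Two hypothetical profile ontologies that both define ACDCTerminal.
topology = {
    (CIM + "ACDCTerminal", RDF_TYPE, RDFS_CLASS),
    (CIM + "ConnectivityNode", RDF_TYPE, RDFS_CLASS),
}
equipment = {
    (CIM + "ACDCTerminal", RDF_TYPE, RDFS_CLASS),
}

# Loading both into the default graph is a set union:
# the shared definition collapses into a single triple.
merged = topology | equipment

# Loading each ontology into its own named graph yields quads,
# so the same definition is stored once per graph.
named = {
    (graph_name, *triple)
    for graph_name, graph in (("topology", topology), ("equipment", equipment))
    for triple in graph
}

print(len(merged), len(named))
```

The merged set holds two triples, whereas the named-graph variant holds three quads because the ACDCTerminal definition is kept once per graph.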
From @Sveino's presentation DX-PROF Balance vs Unbalance.pptx:
We discussed the idea that instead of 20 AP ontologies that define terms multiple times, we can have 40 ontologies that define each term once (Core, Wires, etc.), reused (imported) by EQ, EQBD, etc.
This should happen in CIM18 using CimContextor (for vocabulary profiling).
BTW I notice a little redundancy here (eg from 61970-600-2_Topology-AP-Voc-RDFS2020_v3-0-0.rdf):
<rdfs:subClassOf>
<rdfs:Class rdf:about="http://iec.ch/TC57/CIM100#ACDCTerminal"/>
</rdfs:subClassOf>
This not only refers to ACDCTerminal, but also specifies its RDF type (which is repeated in the full description of that class).
Because every referenced class is redundantly defined in each ontology, and formatted Turtle eliminates duplicate triples, this smaller redundancy is not visible in Turtle.
Other references don't have such redundancy, eg:
<rdfs:range rdf:resource="http://iec.ch/TC57/CIM100#ConnectivityNode"/>
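To find where this nested form occurs, something like the following sketch could be used. It relies only on Python's standard `xml.etree.ElementTree` and assumes the simple layout seen in the snippets above (top-level node elements, property elements one level down). The `SAMPLE` document and the function name `nested_typed_references` are made up for illustration and are not taken from the actual RDFS files.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative input only: one nested reference and one rdf:resource reference.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://iec.ch/TC57/CIM100#Terminal">
    <rdfs:subClassOf>
      <rdfs:Class rdf:about="http://iec.ch/TC57/CIM100#ACDCTerminal"/>
    </rdfs:subClassOf>
  </rdfs:Class>
  <rdf:Property rdf:about="http://iec.ch/TC57/CIM100#Terminal.ConnectivityNode">
    <rdfs:range rdf:resource="http://iec.ch/TC57/CIM100#ConnectivityNode"/>
  </rdf:Property>
</rdf:RDF>"""


def nested_typed_references(xml_text):
    """Return IRIs referenced through a nested typed node element
    (which restates the rdf:type) instead of through rdf:resource."""
    root = ET.fromstring(xml_text)
    found = []
    for node in root:            # top-level node elements
        for prop in node:        # property elements such as rdfs:subClassOf
            for child in prop:   # a child element here is a nested node
                about = child.get(f"{{{RDF}}}about")
                if about is not None:
                    found.append(about)
    return found


print(nested_typed_references(SAMPLE))
```

On the sample, only the ACDCTerminal reference is reported. The `rdfs:range` reference uses `rdf:resource`, has no child element, and is therefore skipped.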
We agreed 2 weeks ago that "ontology modularization" is needed:
@Sveino @griddigit-ci right?
Yes, this is the way I would like to have it for 61970-501:ED2. But I am not sure if we need to have this done before we can finalize the work.
Common terms are duplicated many times between ontologies. See detailed analysis in rdfs-improvement/README:
And the files mentioned below. So these are just the counts:
The problem is pervasive: 12% of terms are duplicated (875 out of 7268). The most "popular" terms are duplicated 28 times:
It's not only about primitives and other meta-terms. Electrical terms are also duplicated, eg:
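For reference, duplication counts of the kind quoted above could be computed along these lines. This is a hedged sketch, not the method used in rdfs-improvement/README: it assumes you already have, per ontology, the set of terms it defines, and the `toy` input and the function name `duplication_stats` are hypothetical.

```python
from collections import Counter


def duplication_stats(defined_terms):
    """defined_terms maps an ontology name to the set of term IRIs it defines.

    Returns (definitions per term, terms defined more than once,
    share of distinct terms that are duplicated).
    """
    counts = Counter(term for terms in defined_terms.values() for term in terms)
    duplicated = {term: n for term, n in counts.items() if n > 1}
    share = len(duplicated) / len(counts) if counts else 0.0
    return counts, duplicated, share


# Hypothetical toy input, not the real ontologies.
toy = {
    "EQ": {"cim:ACDCTerminal", "cim:Terminal", "cim:Float"},
    "TP": {"cim:ACDCTerminal", "cim:ConnectivityNode", "cim:Float"},
    "SSH": {"cim:Float"},
}

counts, duplicated, share = duplication_stats(toy)
print(sorted(duplicated), share, counts.most_common(1))
```

With the figures quoted above, 875 / 7268 is about 12%. Whether 7268 counts distinct terms or all definitions is not stated here, so the denominator in the sketch (distinct terms) is an assumption.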
What's the problem:
RDFSEd2Beta style but NC using RDFS2020 style and will be fixed by https://github.com/Sveino/Inst4CIM-KG/issues/41