Sveino / Inst4CIM-KG

Instance of CIM Knowledge Graph
Apache License 2.0

mapping from `md, dm` to `dct, dcat, dcat-cim, prov` #122

Open VladimirAlexiev opened 3 weeks ago

VladimirAlexiev commented 3 weeks ago

The following sections describe how conversion from CIMXML to CIMJSON-LD (JSON-LD and Trig) should be handled.

The conversion CIMXML -> CIMJSON-LD -> CIMXML is not purely syntactic; it mixes syntactic and semantic translation. The goal is that the RDF structure under CIM JSON-LD will be a superset of the CIMXML structure and equivalent to the structure that the next edition of CIMXML will support.

sequenceDiagram
    participant A as CIM XML (552:ED2)
    participant B as CIM RDF Structure
    participant C as CIM JSON-LD (553:ED1)
    participant D as CIM XML (552:ED3)

    A ->> B: CIMXML-2-CIMJSONLD  (Yellow Arrow)
    B ->> A: CIMJSONLD-2-CIMXML (Yellow Arrows)
    C ->> B: CIMJSONLD  (Green Arrow)
    B ->> C: CIMJSONLD (Green Arrows)
    D ->> B: CIMXML  (Green Arrow)
    B ->> D: CIMXML (Green Arrows)

Yellow arrows are semantic transformations. Green arrows are syntax transformations.


The current header information md:Model is not sufficient for the future. Rather than develop our own, we would like to reuse DCAT.

CIM XML namespaces (e.g. md, dm) will be converted to Trig/JSON-LD dct, dcat, prov according to the following table:

| CIMXML | Operation | CIMJSON-LD | Note |
| --- | --- | --- | --- |
| xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" | Syntax | "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#" | |
| xmlns:cim="http://iec.ch/TC57/CIM100#" | owl:sameAs | "cim": "https://cim.ucaiug.io/ns#" | CIM17/CIM100 support both the new and the old namespace. The new namespace is persistent and points to the current version. |
| xmlns:eu="http://iec.ch/TC57/CIM100-European#" | owl:sameAs | "eu": "https://cim.ucaiug.io/ns/eu#" | CIM17/CIM100 support both the new and the old namespace. The new namespace is persistent and points to the current version. |
| xmlns:md="http://iec.ch/TC57/61970-552/ModelDescription/1#" xmlns:dm="http://iec.ch/TC57/61970-552/DifferenceModel/1#" | Semantic | "dcterms": "http://purl.org/dc/terms/", "dcat": "http://www.w3.org/ns/dcat#", "dcat-cim": "https://cim4.eu/ns/dcat-cim#", "prov": "http://www.w3.org/ns/prov#", "adm": "http://www.w3.org/ns/adms#", "xsd": "http://www.w3.org/2001/XMLSchema#" | The header information shall no longer be linked to the serialization. It will follow the DCAT-3 practice so that the header data matches what would be in the catalog. |

CGMES Network Code (NC) instance data will use the same namespaces as CIMJSON-LD. However, we need to define a machine-understandable structure and syntax to explain the mapping between the namespaces. In addition, we would like the same Network Code instance file to work with both CGMES 2.4 (CIM16) and CGMES 3.0 (CIM17). See #123
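As a rough illustration of what a machine-readable namespace mapping could look like, here is a minimal Python sketch. It assumes that a plain prefix replacement on the IRI is enough, which the issue does not confirm; the names `NAMESPACE_MAP` and `rewrite_iri` are hypothetical and only the two namespace pairs come from the table above.

```python
# Old-to-new namespace pairs taken from the namespace table in this issue.
# Assumption: a simple prefix swap is sufficient to map one to the other.
NAMESPACE_MAP = {
    "http://iec.ch/TC57/CIM100#": "https://cim.ucaiug.io/ns#",
    "http://iec.ch/TC57/CIM100-European#": "https://cim.ucaiug.io/ns/eu#",
}


def rewrite_iri(iri: str) -> str:
    """Replace a known old namespace prefix with its new counterpart."""
    for old, new in NAMESPACE_MAP.items():
        if iri.startswith(old):
            return new + iri[len(old):]
    return iri  # IRIs in unknown namespaces are left untouched


print(rewrite_iri("http://iec.ch/TC57/CIM100#ACLineSegment"))
```

The same dictionary could be serialized next to the data, but which vocabulary should express it is exactly the open question raised here.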

Header and dataset

Conversion rule CIMXML -> CIMJSON-LD

This should be a lossless transformation where some data are duplicated:

| CIMXML | Operation | CIMJSON-LD | Note |
| --- | --- | --- | --- |
| rdf:about | Syntax | @id | |
| md:Model.created | Semantic | prov:generatedAtTime | prov:generatedAtTime really refers to the distribution. The plan is to allocate it to the provenance. Needs to be declared as DateTime. |
| md:Model.created | Semantic | dcterms:issued | Represents when the dataset (graph) was created. Needs to be declared as Date. |
| md:Model.scenarioTime | Semantic | dcterms:temporal | Needs to be declared as dcterms:PeriodOfTime and xsd:dateTime. For EQ, DY, GL, this would be an open interval. |
| md:Model.scenarioTime | Semantic | dcat:temporalResolution | Needs to be declared as Duration. Applies to SSH, TP and SV to state that the dataset includes hourly values, "PT1H". |
| md:Model.description | Semantic | dcterms:description | Default language would be en-gb, but should handle multiple languages. |
| md:Model.modelingAuthoritySet | Semantic | dcat:isVersionOf | Represents the abstract dataset. |
| md:Model.profile | Semantic | dcterms:conformsTo | Needs to be declared as multiple. |
| md:Model.version | Syntax | dcat:version | dcat:version and md:Model.version have different meanings. However, it is only possible to do a syntax transformation. md:Model.version is an integer that would be the major version of dcat:version. |
| md:Model.DependentOn | Semantic | dcterms:references | Needs to be declared as multiple. Should include the title for human readability. |
| md:Model.Supersedes | Semantic | prov:wasRevisionOf | |
| dm:reverseDifferences | Semantic | dcat-cim:reverseDifferenceSet | |
| dm:forwardDifferences | Semantic | dcat-cim:forwardDifferenceSet | |
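To make the duplication in the forward direction concrete, the rows of the table can be held as data. This is only a sketch of the property-name mapping, not of the value conversion (datatypes, language tags and periods would still need their own handling); `FORWARD_ROWS` and `build_forward_index` are hypothetical names.

```python
from collections import defaultdict

# (CIMXML property, CIMJSON-LD property) pairs copied from the table above.
# md:Model.created and md:Model.scenarioTime appear twice on purpose,
# because the forward conversion duplicates them.
FORWARD_ROWS = [
    ("rdf:about", "@id"),
    ("md:Model.created", "prov:generatedAtTime"),
    ("md:Model.created", "dcterms:issued"),
    ("md:Model.scenarioTime", "dcterms:temporal"),
    ("md:Model.scenarioTime", "dcat:temporalResolution"),
    ("md:Model.description", "dcterms:description"),
    ("md:Model.modelingAuthoritySet", "dcat:isVersionOf"),
    ("md:Model.profile", "dcterms:conformsTo"),
    ("md:Model.version", "dcat:version"),
    ("md:Model.DependentOn", "dcterms:references"),
    ("md:Model.Supersedes", "prov:wasRevisionOf"),
    ("dm:reverseDifferences", "dcat-cim:reverseDifferenceSet"),
    ("dm:forwardDifferences", "dcat-cim:forwardDifferenceSet"),
]


def build_forward_index(rows):
    """Group target properties by their CIMXML source property."""
    index = defaultdict(list)
    for source, target in rows:
        index[source].append(target)
    return dict(index)


FORWARD_INDEX = build_forward_index(FORWARD_ROWS)
print(FORWARD_INDEX["md:Model.created"])
```

Keeping the rows in one list is one possible way to avoid the two directions drifting apart, since a reverse index could be derived from the same data.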

Conversion rule CIMJSON-LD -> CIMXML

This is not a lossless transformation and it includes duplication.

| CIMJSON-LD | Operation | CIMXML | Note |
| --- | --- | --- | --- |
| @id | Syntax | rdf:about | |
| prov:generatedAtTime | Syntax | | |
| dcterms:issued | Semantic | md:Model.created | Represents when the dataset (graph) was created. |
| dcterms:temporal - dcat:startDate | Semantic | md:Model.scenarioTime | |
| dcterms:description | Semantic | md:Model.description | Only "@language": "en" or dcterms:description without language tag. |
| dcat:isVersionOf | Semantic | md:Model.modelingAuthoritySet | |
| dcterms:conformsTo | Semantic | md:Model.profile | Note that this can be multiple. |
| dcat:version | Semantic | md:Model.version | Use zero-padded strings: concatenate the dcat:version components, each zero-padded to a fixed width, f"{major:03}{minor:03}{patch:05}". |
| dcterms:references | Semantic | md:Model.DependentOn | Note that this can be multiple. |
| dcterms:title | Omitted | | |
| dcterms:publisher | Omitted | | |
| dcterms:rights | Omitted | | |
| dcterms:rightsHolder | Omitted | | |
| dcterms:license | Omitted | | |
| dcterms:accessRights | Omitted | | |
| dcat:keyword | Omitted | | |
| dcterms:spatial | Omitted | | |
| dcat:temporalResolution | Omitted | | |
| prov:wasRevisionOf | Semantic | md:Model.Supersedes | |
| dcat-cim:reverseDifferenceSet | Semantic | dm:reverseDifferences | |
| dcat-cim:forwardDifferenceSet | Semantic | dm:forwardDifferences | |
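The zero-padding rule for dcat:version in the table can be sketched as follows. The format string is the one given in the table; the assumption that dcat:version arrives as a dotted "major.minor.patch" string, and the function name `pack_version`, are mine.

```python
def pack_version(version: str) -> str:
    """Concatenate major, minor and patch as fixed-width, zero-padded digits.

    Assumes the input looks like "major.minor.patch" with integer parts.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major:03}{minor:03}{patch:05}"


print(pack_version("1.2.3"))  # prints 00100200003
```

The fixed widths (3, 3 and 5 digits) keep the packed strings sortable, but a component that exceeds its width would no longer round-trip, so a real converter would probably want to validate the ranges.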

The Python code in Scripts/Python/cim-convert-tool was based on an earlier version of this specification. However, it only converts one way.

VladimirAlexiev commented 1 week ago

@Sveino thanks for the mapping table!

Comments on specific rows:

@griddigit-ci what do you think?

Sveino commented 1 week ago

@Sveino thanks for the mapping table!

  • You cannot say xmlns:eu="http://iec.ch/TC57/CIM100-European#" owl:sameAs "eu": "https://cim.ucaiug.io/ns/eu#". owl:sameAs has a strict meaning and applies to instance URLs (resources) only

I have no problem using something else to state that the two namespaces are the same. ChatGPT and I agree :-)

  • I think you should have only 1 mapping table (with gaps on the CIMXML side). Else there is a risk of inconsistency.

The mapping CIMXML -> CIMJSON-LD is not symmetric with CIMJSON-LD -> CIMXML. The first is lossless, the other is not. Comments on specific rows:

  • it's unclear from the description whether prov:generatedAtTime and dcterms:issued are the same. If the difference is only that one is DateTime and the other Date, there is no need for such difference: dcterms:issued can be DateTime

They are semantically different: prov:generatedAtTime is the time the distribution was created, and dcterms:issued is the time the dataset was issued.

  • dcterms:temporal "Need to declare as dcterms:PeriodOfTime": This is only needed if you want to specify start/end dates, else a simple date is enough.

Yes, and this is needed for some instant dataset.

  • dcterms:temporalResolution: the correct namespace is dcat

Thanks. I have updated the text.

  • dcterms:temporalResolution "hourly values, 'PT1H'": Does that mean you also want to set end = start+1H?

For the current SSH, TP and SV, we expect the converted CIMJSON-LD to include:

    "dcterms:temporal": {
        "@type": "dcterms:PeriodOfTime",
        "dcat:startDate": {
            "@value": "2021-04-28T22:00:00Z",
            "@type": "xsd:dateTime"
        },
        "dcat:endDate": {
            "@value": "2021-04-28T23:00:00Z",
            "@type": "xsd:dateTime"
        }
    },
    "dcat:temporalResolution": {
        "@type": "xsd:duration",
        "@value": "PT1H"
    }, 

Each file has a 1-hour validity period with 1-hour resolution. It is exchanged in UTC, so during CET daylight saving time the first two hours of a day are exchanged with the previous day's UTC date.
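A minimal Python sketch of how a transitional converter might derive that fragment from md:Model.scenarioTime, assuming the scenario time is already a UTC timestamp ending in "Z" and that the validity is always exactly one hour. The function name `hourly_period` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def hourly_period(scenario_time: str) -> dict:
    """Build dcterms:temporal and dcat:temporalResolution for a 1-hour dataset."""
    start = datetime.strptime(scenario_time, UTC_FORMAT).replace(tzinfo=timezone.utc)
    end = start + timedelta(hours=1)  # assumption: validity is exactly one hour
    return {
        "dcterms:temporal": {
            "@type": "dcterms:PeriodOfTime",
            "dcat:startDate": {"@value": start.strftime(UTC_FORMAT), "@type": "xsd:dateTime"},
            "dcat:endDate": {"@value": end.strftime(UTC_FORMAT), "@type": "xsd:dateTime"},
        },
        "dcat:temporalResolution": {"@type": "xsd:duration", "@value": "PT1H"},
    }


period = hourly_period("2021-04-28T22:00:00Z")
print(period["dcterms:temporal"]["dcat:endDate"]["@value"])  # prints 2021-04-28T23:00:00Z
```

Because everything stays in UTC, the hour starting at 2021-04-28T22:00:00Z is the first hour of 29 April in CET daylight saving time, which matches the date shift described above.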

  • IMHO such profile-specific logic will be brittle if we add it in a conversion script, and should be handled by the producing application.

Yes, this needs to be handled by the producing application. However, as part of the transition we would need to do the conversion.

  • SSH, TP and SV: are these the only profiles covering 1H, and all others cover an open-ended period?
    • dcat:isVersionOf: the correct namespace is dct

Not according to DCAT-3: https://www.w3.org/TR/vocab-dcat-3/#ex-version-chain-and-hierarchy And not according to our specification, Metadata Dataset Distribution Specification.

  • "md:Model.modelingAuthoritySet - dcat:isVersionOf: Represent the abstract dataset": I disagree with this idea. MAS is the namespace of resources (URLs) and includes no triples, so IMHO it's not a model in any sense of the word. Consider using jsonld:base (see https://www.w3.org/ns/json-ld#base or https://www.w3.org/ns/json-ld.ttl)

It can represent both. I will add the spec on base. We have had a lot of discussion on this.

  • dcterms:references: This is an unspecific prop akin to a bibliographic citation. Consider using dct:requires

I see that you are using dct, but DCAT-3 uses dcterms, which is in line with the Namespace Policy for the Dublin Core™ Metadata Initiative (DCMI).

Yes, I like dcterms:requires better than dcterms:references. However, DCAT-3 kind of indicates that one should use dcat:qualifiedRelation with dcat:hadRole. To me, this starts to mix in too much provenance. It is not included in EU DCAT-AP (3.0.0). I will open an issue on this.

  • prov:wasRevisionOf: you don't need to involve the PROV ontology for just 1 prop. Consider using dct:replaces

But it does not replace it; it is a new version with another validity period. This is a crucial difference that is now supported by DCAT-3. We will also use dcterms:replaces for cases where a dataset is incorrect and no longer valid.

  • dcat-cim:reversalDifferenceSet: consider using the old name dcat-cim:reverseDifferences. The plural already says it's a collection of differences. And the word "reversal" doesn't match the word "forward". Using a new terminology will just confuse people.

dcat-cim:reversalDifferenceSet is incorrect; it should have been dcat-cim:reverseDifferenceSet. The CIM community and I are not keen on using "s" for plurals because of all the exceptions in plural forms (e.g. irregular plurals such as man -> men, or words ending in "-y" such as city -> cities). So we do not want to use plural forms; everything is in the singular. Second, the "Set" refers to Dataset. But I have no strong opinion on keeping the "Set".

@griddigit-ci what do you think?