Sveino / Inst4CIM-KG

Instance of CIM Knowledge Graph
Apache License 2.0

Add Datatypes To Instance Data #49

Open VladimirAlexiev opened 2 months ago

VladimirAlexiev commented 2 months ago

In CGMES instance data, all literals are strings, but they should be marked with the appropriate datatype.

This query counts props by XSD datatype:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?range (count(*) as ?c) {
  ?x rdfs:range ?range
  filter(strstarts(str(?range), str(xsd:)))
} group by ?range order by ?range
```
Here are the current results, but the query should be rerun after the fixes to the ontology (see the "comment" column):

| range | c | comment |
|---|---|---|
| xsd:boolean | 218 | Inflated because meta-data props are duplicated, and many are boolean |
| xsd:dateTime | 5 | |
| xsd:decimal | 1 | |
| xsd:float | 310 | Deflated because e.g. cim:ActivePower.value may be used by hundreds of "real" props |
| xsd:gMonthDay | 2 | |
| xsd:integer | 36 | |
| xsd:string | 51 | |
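To illustrate why the xsd:float count is described as deflated, here is a minimal Python sketch. It assumes that a property whose range is a CIM datatype class (such as ActivePower) gets its XSD type from that class's `.value` property. The namespace, the property names and the `RANGES` table are invented for illustration and are not taken from the actual ontology.

```python
from collections import Counter

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/cim#"  # placeholder namespace (assumption)

# Illustrative property -> rdfs:range pairs; not taken from the real ontology.
RANGES = {
    EX + "ActivePower.value": XSD + "float",
    EX + "Voltage.value": XSD + "float",
    EX + "Thing.name": XSD + "string",
    EX + "Thing.enabled": XSD + "boolean",
    EX + "Load.p": EX + "ActivePower",
    EX + "Generator.p": EX + "ActivePower",
    EX + "Bus.nominalVoltage": EX + "Voltage",
}


def effective_datatype(range_iri, ranges):
    """Return the XSD datatype behind a range, or None if there is none.

    A range that is already an XSD type is returned unchanged. Otherwise the
    range is treated as a CIM datatype class and its "<Class>.value" property
    is looked up (an assumed convention).
    """
    if range_iri.startswith(XSD):
        return range_iri
    inner = ranges.get(range_iri + ".value")
    if inner is not None and inner.startswith(XSD):
        return inner
    return None


def count_direct(ranges):
    """Count only properties whose range is directly an XSD type (like the query)."""
    return Counter(r for r in ranges.values() if r.startswith(XSD))


def count_effective(ranges):
    """Count properties by the XSD type they resolve to, directly or via .value."""
    resolved = (effective_datatype(r, ranges) for r in ranges.values())
    return Counter(dt for dt in resolved if dt is not None)


direct = count_direct(RANGES)
effective = count_effective(RANGES)
for dt in sorted(effective):
    # Columns: datatype, direct count, effective count
    print(dt.replace(XSD, "xsd:"), direct[dt], effective[dt])
```

With this toy data the direct count sees two float properties, while the effective count also picks up the three properties that reach xsd:float through a datatype class.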

I have a tentative SPARQL Update, but need to revise it.

griddigit-ci commented 2 months ago

We need to discuss whether we have concerns about file size. It would be very good to have explicit datatypes in the instance data, as this would remove the need for post-processing and for mappings at parsing time to enable SHACL validation of datatypes. Is it common to assume xsd:string, so that we would not need to declare that one? How will JSON-LD deal with it? I think we can manage this with some sort of context so that we do not have repetitions in the serialisations.
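As a rough illustration of the "some sort of context" idea, the Python sketch below builds a JSON-LD `@context` that declares each property's datatype once (JSON-LD type coercion), so the values themselves need no per-value type annotation. The namespace `http://example.org/cim#`, the `PROP_TYPES` table and the `build_context` helper are assumptions made for this example; in practice the table would be derived from the profile. Properties typed xsd:string are skipped, because a plain JSON string without coercion is already read as an xsd:string literal.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical property -> datatype table; would normally come from the profile.
PROP_TYPES = {
    "cim:ACLineSegment.r": "xsd:float",
    "cim:Equipment.aggregate": "xsd:boolean",
    "cim:IdentifiedObject.name": "xsd:string",
}


def build_context(prop_types, cim_ns):
    """Build a JSON-LD context that coerces each property to its datatype."""
    context = {"cim": cim_ns, "xsd": XSD}
    for prop, datatype in prop_types.items():
        if datatype == "xsd:string":
            continue  # plain strings are xsd:string by default, nothing to declare
        context[prop] = {"@id": prop, "@type": datatype}
    return context


document = {
    "@context": build_context(PROP_TYPES, "http://example.org/cim#"),  # placeholder namespace
    "@id": "urn:uuid:00000000-0000-0000-0000-000000000001",  # made-up identifier
    "@type": "cim:ACLineSegment",
    "cim:IdentifiedObject.name": "Line 1",
    "cim:ACLineSegment.r": "1.23",  # typed as xsd:float through the context
    "cim:Equipment.aggregate": "false",
}

print(json.dumps(document, indent=2))
```

The context could also be published once and referenced by URL from every file, which would keep the datatype declarations out of the instance files entirely.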

VladimirAlexiev commented 2 months ago

@griddigit-ci @Sveino

About JSON-LD: #55

VladimirAlexiev commented 1 month ago

Done, see https://github.com/Sveino/Inst4CIM-KG/blob/develop/rdf-improved/fix-datatypes.ru

Sveino commented 4 weeks ago

We cannot really do anything on this for CIMXML, so this must be addressed as part of JSON-LD. This was discussed in https://github.com/3lbits/CIM4NoUtility/issues/278. I agree with Vladimir's comment regarding sizing; the support for zip is an issue we need to discuss as well. The thinking for JSON-LD was that this information is derived from the profile, but it must be 1.23 and not "1.23" for a float. My understanding now is that when we import the CIM XML, we will run the script above, which will add the correct datatype to the instance data?
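The "1.23 and not "1.23"" point can be sketched in Python as a small conversion step applied when writing JSON-LD. The `to_json_value` helper and the choice of datatypes it handles are assumptions for illustration. Anything it does not recognise, including xsd:decimal and xsd:dateTime, is left as a string so that no precision or formatting is lost.

```python
import json


def to_json_value(lexical, datatype):
    """Turn a lexical form into a native JSON value where that is safe."""
    if datatype == "xsd:float":
        return float(lexical)
    if datatype == "xsd:integer":
        return int(lexical)
    if datatype == "xsd:boolean":
        return lexical in ("true", "1")  # the lexical forms XSD uses for true
    return lexical  # everything else stays a string


row = {
    "r": to_json_value("1.23", "xsd:float"),
    "count": to_json_value("36", "xsd:integer"),
    "aggregate": to_json_value("false", "xsd:boolean"),
    "name": to_json_value("Line 1", "xsd:string"),
}
print(json.dumps(row))
```

One thing to keep in mind: a JSON-LD processor reads an untyped native number with a fractional part as xsd:double, so a profile-derived context would still be needed if the resulting RDF must say xsd:float.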

VladimirAlexiev commented 3 weeks ago

> We cannot really do anything on this for CIMXML

Why not? If we do, it'll just add attribute rdf:datatype to every value. Will that break any software? Or do you mean that you cannot demand it in the spec?
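For concreteness, here is a minimal Python sketch of what adding `rdf:datatype` to every value could look like on a CIM XML file, using only the standard library. It is not the `fix-datatypes.ru` script. The namespace `http://example.org/cim#`, the `PROP_TYPES` lookup table and the sample fragment are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
CIM = "http://example.org/cim#"  # placeholder namespace (assumption)

# Keep readable prefixes when serialising.
ET.register_namespace("rdf", RDF)
ET.register_namespace("cim", CIM)

# Hypothetical property -> datatype table; would normally come from the ontology.
PROP_TYPES = {f"{{{CIM}}}ACLineSegment.r": XSD + "float"}

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:cim="{CIM}">
  <cim:ACLineSegment rdf:ID="_line1">
    <cim:ACLineSegment.r>1.23</cim:ACLineSegment.r>
    <cim:IdentifiedObject.name>Line 1</cim:IdentifiedObject.name>
  </cim:ACLineSegment>
</rdf:RDF>"""


def add_datatypes(root, prop_types):
    """Set rdf:datatype on literal-valued elements listed in prop_types.

    Returns the number of elements that were annotated.
    """
    added = 0
    for elem in root.iter():
        datatype = prop_types.get(elem.tag)
        if datatype is None:
            continue  # unknown property: leave it as a plain string
        if elem.get(f"{{{RDF}}}resource") is not None:
            continue  # object property, not a literal
        if elem.get(f"{{{RDF}}}datatype") is not None:
            continue  # already typed
        elem.set(f"{{{RDF}}}datatype", datatype)
        added += 1
    return added


root = ET.fromstring(SAMPLE)
add_datatypes(root, PROP_TYPES)
print(ET.tostring(root, encoding="unicode"))
```

The only change to the file is the extra attribute on each typed value, so whether this breaks anything depends on how strictly existing CIM XML parsers treat attributes they do not expect.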

> when we import the CIM XML we will run the script above that will add the correct Datatype for the instance data?

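It can be used in two ways:

- With jena update (in-memory SPARQL Update) to add datatypes to a file. I'll use that when producing Trig and JSONLD
- In a semantic repo after loading CIM XML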

VladimirAlexiev commented 3 weeks ago

Reopening so you can decide whether it's unfeasible to use rdf:datatype in CIM XML.

Sveino commented 3 weeks ago

> We cannot really do anything on this for CIMXML
>
> Why not? If we do, it'll just add attribute rdf:datatype to every value. Will that break any software? Or do you mean that you cannot demand it in the spec?

We have to update the spec for CIM XML, which we are planning to do. But we do not have an approved specification for JSON-LD (however, we have indicated to the vendor how we are planning to do it).

> when we import the CIM XML we will run the script above that will add the correct Datatype for the instance data?
>
> It can be used in two ways:
>
> - With jena update (in-memory SPARQL Update) to add datatypes to a file. I'll use that when producing Trig and JSONLD
> - In a semantic repo after loading CIM XML

We can also run a script on the instance file to create a new instance file. At Statnett we also have code for exporting CIM XML from GraphDB.