Sveino / Inst4CIM-KG

Instance of CIM Knowledge Graph
Apache License 2.0

Add Datatypes To Instance Data #49

Open VladimirAlexiev opened 1 week ago

VladimirAlexiev commented 1 week ago

In CGMES instance data, all literals are plain strings, but they should be marked with the appropriate datatype.

This query counts props by XSD datatype:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?range (count(*) as ?c) {
   ?x rdfs:range ?range
    filter(strstarts(str(?range), str(xsd:)))
} group by ?range order by ?range
Here are the current results, but the query should be rerun after fixes to the ontology (see the "comment" column):

| range | c | comment |
|---|---|---|
| xsd:boolean | 218 | Inflated because meta-data props are duplicated, and many are boolean |
| xsd:dateTime | 5 | |
| xsd:decimal | 1 | |
| xsd:float | 310 | Deflated because e.g. cim:ActivePower.value may be used by hundreds of "real" props |
| xsd:gMonthDay | 2 | |
| xsd:integer | 36 | |
| xsd:string | 51 | |

I have a tentative SPARQL Update, but need to revise it.
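
For illustration only, such an update might look roughly like the sketch below. This is not the tentative update mentioned above. It assumes the ontology and the instance data are loaded in the same default graph, and it only covers props whose rdfs:range is directly an XSD datatype. Props that point to a CIM datatype class (e.g. via cim:ActivePower.value) would need the ontology fixes first.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Sketch: retype plain string literals using the range declared in the ontology.
# Assumption: ontology and instance triples live in the same (default) graph.
DELETE { ?s ?p ?old }
INSERT { ?s ?p ?new }
WHERE {
  # Props whose range is an XSD datatype (same filter as the counting query)
  ?p rdfs:range ?range .
  FILTER(STRSTARTS(STR(?range), STR(xsd:)))
  # xsd:string literals are already correctly typed, so skip them
  FILTER(?range != xsd:string)
  ?s ?p ?old .
  # Only touch literals that are still plain strings
  FILTER(isLiteral(?old) && DATATYPE(?old) = xsd:string)
  # STRDT does not validate the lexical form against the target datatype
  BIND(STRDT(STR(?old), ?range) AS ?new)
}
```

Because STRDT does not check lexical forms, a value such as "yes" on a boolean prop would become an ill-typed literal, so a validation pass (e.g. with SHACL) after the update is still useful.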

griddigit-ci commented 1 week ago

We need to discuss whether file size is a concern. It would be very good to have explicit datatypes in the instance data, since this would remove the need for post-processing and for mappings at parsing time to enable SHACL validation of datatypes. Is it common to assume xsd:string, so that we would not need to declare that one? How will JSON-LD deal with it? I think we can manage this with some sort of context, so that we do not have repetitions in the serialisations.
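
On the xsd:string question: in RDF 1.1, a literal written without a datatype or language tag is an xsd:string literal, so serialisations can omit that datatype without changing the data. A minimal check, assuming a SPARQL 1.1 engine, is the query below, which should return true:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# In SPARQL 1.1, DATATYPE of a plain literal is xsd:string
ASK { FILTER(DATATYPE("abc") = xsd:string) }
```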

VladimirAlexiev commented 6 days ago

@griddigit-ci @Sveino

About JSON-LD: #55