Sveino / Inst4CIM-KG

Instance of CIM Knowledge Graph
Apache License 2.0

Whitespace in Definitions #6

Closed VladimirAlexiev closed 2 months ago

VladimirAlexiev commented 2 months ago

https://github.com/Sveino/Inst4CIM-KG/tree/develop/rdfs-improved#whitespace-in-definitions

Many definitions include leading/trailing whitespace (newlines, tabs, etc.), e.g.:

cim:Boolean a owl:Class ;
  rdfs:label "Boolean"@en ;
  dl:Package "Package_DiagramLayoutProfile" ;
  dl:isPrimitive "True" ;
  skos:definition """
A type with the value space "true" and "false".

\t"""@en .

This query finds 1556 instances of leading/trailing whitespace in strings (I guess some are duplicated between the 2.3 and 3.0 CIM namespaces):

select * {
    ?x ?p ?label
    filter(regex(?label,"^\\s|\\s$"))
}

See literals-whitespace.tsv

This query counts by property:

select ?p (count(*) as ?c) {
    ?x ?p ?label
    filter(regex(?label,"^\\s|\\s$"))
} group by ?p order by desc(?c)

| p | c | comment |
|---|---|---------|
| skos:definition | "660"^^xsd:integer | |
| rdfs:label | "614"^^xsd:integer | Most of these are key values (see next section), but some are property names. E.g. ssh:isDescription has multiple trailing spaces or a tab |
| rdfs:comment | "150"^^xsd:integer | This and all below are key values (see next section) |
| eq:isFixed | "43"^^xsd:integer | |
| sc:isFixed | "24"^^xsd:integer | |
| ssh:isFixed | "22"^^xsd:integer | |
| dy:isFixed | "20"^^xsd:integer | |
| sv:isFixed | "10"^^xsd:integer | |
| dcterms:creator | "7"^^xsd:integer | |
| dl:isFixed | "2"^^xsd:integer | |
| eqbd:isFixed | "2"^^xsd:integer | |
| op:isFixed | "2"^^xsd:integer | |

This can be fixed easily with SPARQL Update. We just need to be careful to restore the lang tag if one was present.
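
A minimal sketch of what such an update could look like. It is an illustration only, not necessarily the repository's actual fix script, and it assumes that only plain strings and lang-tagged strings should be touched (other datatypes are left alone). The variable names `?old`, `?trimmed` and `?new` are made up for this example.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Replace each affected literal with a trimmed copy.
DELETE { ?x ?p ?old }
INSERT { ?x ?p ?new }
WHERE {
  ?x ?p ?old .
  # Assumption: restrict to plain and lang-tagged strings.
  FILTER(isLiteral(?old) && datatype(?old) IN (xsd:string, rdf:langString))
  # Same test as the queries above: leading or trailing whitespace.
  FILTER(regex(str(?old), "^\\s|\\s$"))
  # str() drops the lang tag, so trim first ...
  BIND(replace(str(?old), "^\\s+|\\s+$", "") AS ?trimmed)
  # ... and then restore the lang tag if the original had one.
  BIND(IF(lang(?old) = "", ?trimmed, strlang(?trimmed, lang(?old))) AS ?new)
}
```

After running an update like this, the first query above should return no rows. Note that a literal consisting only of whitespace would become an empty string, which may need separate handling.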

griddigit-ci commented 2 months ago

Some of these issues may come from the RDFS export, others may be in the UML.

VladimirAlexiev commented 2 months ago

fix01-whitespace-6.ru