Multilinguality in CIM?

This section was prompted by pondering the difference between cim:String and profcim:StringFixedLanguage.

AFAIK, CIM does not allow (and has not considered?) multilinguality.

https://github.com/Sveino/Inst4CIM-KG/issues/8 : Header-AP-Voc-RDFS2020.ttl misdefines rdf:LangString, but that doesn't count as multilingual support.

E.g. cim:IdentifiedObject.name doesn't allow multiple values (sh:maxCount is 1):

ido:IdentifiedObject.name-cardinality
        rdf:type        sh:PropertyShape;
        sh:description  "This constraint validates the cardinality of the property (attribute).";
        sh:group        ido:CardinalityIO;
        sh:message      "Missing required property (attribute).";
        sh:maxCount     1;
        sh:minCount     1;
        sh:name         "IdentifiedObject.name-cardinality";
        sh:order        0.1;
        sh:path         cim:IdentifiedObject.name;
        sh:severity     sh:Violation .

I think it would be better to allow multiple values but impose a sh:uniqueLang constraint, i.e. at most one value per language tag (skos:prefLabel has the same restriction). That way CIM data could accommodate multilinguality. E.g. looking at some random properties:

cim:IdentifiedObject.mRID: always string
cim:IdentifiedObject.description: string or langString
cim:IdentifiedObject.name: string or langString
nc:AssessedElementWithContingency.mRID: always string
nc:AssessedElement.normalTargetRemainingAvailableMarginJustification: string or langString
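
A minimal sketch of what such a shape could look like, assuming the existing cardinality shape is relaxed by dropping sh:maxCount and adding sh:uniqueLang. This is not a published CIM constraint: the shape name, the ido: and ex: namespace IRIs and the message text are hypothetical, and the cim: namespace IRI is an assumption.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix cim: <http://iec.ch/TC57/CIM100#> .        # assumed CIM namespace
@prefix ido: <http://example.org/shapes/ido#> .    # hypothetical shapes namespace

# Hypothetical variant of IdentifiedObject.name-cardinality:
# at least one name, no upper bound, at most one value per language tag.
ido:IdentifiedObject.name-uniqueLang
        rdf:type        sh:PropertyShape;
        sh:description  "At least one name is required; names must not repeat a language tag.";
        sh:message      "Name is missing, or several names share the same language tag.";
        sh:minCount     1;
        sh:uniqueLang   true;
        sh:path         cim:IdentifiedObject.name;
        sh:severity     sh:Violation .

# Data that would conform (ex:Line1 is a hypothetical instance):
#   ex:Line1 cim:IdentifiedObject.name "Line 1"@en, "Linje 1"@no .
# Data that would not conform (two values tagged @en):
#   ex:Line1 cim:IdentifiedObject.name "Line 1"@en, "Line one"@en .
```

Note that sh:uniqueLang only compares language-tagged values, so several untagged strings on the same property would still pass this sketch; limiting those would need an additional constraint.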

Unfortunately, cim:String is used even for props that should not allow langString, i.e. no distinction is made between these two cases:

Names/descriptions could be string or langString
But identifiers should only be string

So for the time being I think CIM implicitly forbids the use of langString: if a property cannot have multiple values (one per language), there's not much use for lang tags. Also, allowing lang tags may cause problems in some receiving systems.

So I'll map cim:String to xsd:string.

rdf:PlainLiteral

The EU eProcurement Ontology allows multilingual data and uses rdfs:Literal for it. But that datatype is way too broad, so I raised an issue: https://github.com/OP-TED/ted-rdf-mapping/issues/407

The datatype hierarchy is like this: rdfs:Literal > rdf:PlainLiteral > (xsd:string, rdf:langString). What a text field needs to be mapped to depends on its nature:

xsd:string is appropriate for codes that are never translated to multiple langs
rdf:langString is appropriate for texts that are always translated to multiple langs (if not now, then in the future): so a lang tag is required
rdf:PlainLiteral is appropriate for texts that may but don't have to be translated, i.e. lang tag is not required. It is defined at https://w3.org/TR/rdf-plain-literal , and means string or langString.

If you want cim:String to allow langStrings, then we should map it to rdf:PlainLiteral.
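
The three cases above can be sketched as SHACL datatype constraints. This is only an illustration under stated assumptions: the ex: namespace, the shape names and the property ex:alwaysTranslatedText are hypothetical, and the cim: namespace IRI is assumed. As far as I can tell, RDF 1.1 literals carry either xsd:string or rdf:langString as their datatype, so a check like sh:datatype rdf:PlainLiteral would not match ordinary data; the "string or langString" case is therefore written with sh:or.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix cim: <http://iec.ch/TC57/CIM100#> .    # assumed CIM namespace
@prefix ex:  <http://example.org/shapes#> .    # hypothetical namespace

# Case 1: identifier or code, never translated: plain string only.
ex:mRID-datatype
        rdf:type     sh:PropertyShape;
        sh:path      cim:IdentifiedObject.mRID;
        sh:datatype  xsd:string .

# Case 2: text that is always translated: a lang tag is required.
# ex:alwaysTranslatedText is a hypothetical property, used only for illustration.
ex:alwaysTranslatedText-datatype
        rdf:type     sh:PropertyShape;
        sh:path      ex:alwaysTranslatedText;
        sh:datatype  rdf:langString .

# Case 3: text that may be translated: string or langString
# (the value space that rdf:PlainLiteral describes).
ex:name-datatype
        rdf:type     sh:PropertyShape;
        sh:path      cim:IdentifiedObject.name;
        sh:or        ( [ sh:datatype xsd:string ] [ sh:datatype rdf:langString ] ) .
```

These property shapes have no targets of their own; to take effect they would need to be attached to a node shape via sh:property or be given a target such as sh:targetSubjectsOf.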

Sveino / Inst4CIM-KG

Multilinguality in CIM? #73
