Sveino / Inst4CIM-KG

Instance of CIM Knowledge Graph
Apache License 2.0
3 stars 1 forks source link

Don't use SHACL SPARQL where SHACL Standard is enough #17

Open VladimirAlexiev opened 2 months ago

VladimirAlexiev commented 2 months ago

"At least one of these props must be populated" is implemented with shapes like this:

erc:EnergyComponent  a          sh:NodeShape ;
        sh:property      erc:EnergyComponent-associations;
        sh:targetClass  nc:EnergyComponent .

erc:EnergyComponent-associations
        a               sh:PropertyShape ;
        sh:description  "The EnergyComponent shall be associated with either GeneratingUnit, PowerElecronicsUnit, EnergyConsumer or HydroPump." ;
        sh:sparql       erc:EnergyComponent-associationsSparql ;
        sh:path         rdf:type ;
        sh:group        erc:ERgroup ;
        sh:name         "C:NC:ER:EnergyComponent:associations" ;
        sh:order        2 ;
        sh:severity     sh:Violation .

erc:EnergyComponent-associationsSparql
        a         sh:SPARQLConstraint ;  
        sh:message      "EnergyComponent is not associated with either GeneratingUnit, PowerElecronicsUnit, EnergyConsumer or HydroPump." ;
        sh:prefixes cim: ;
        sh:select """
            SELECT  $this  
            WHERE {      
        BIND(EXISTS{$this nc:EnergyComponent.GeneratingUnit ?o1} AS ?adugu).
        BIND(EXISTS{$this nc:EnergyComponent.PowerElecronicsUnit ?o2} AS ?adupe).
        BIND(EXISTS{$this nc:EnergyComponent.EnergyConsumer ?o2} AS ?aduec).
        BIND(EXISTS{$this nc:EnergyComponent.HydroPump ?o2} AS ?aduhp).
        FILTER (?adugu=false && ?adupe=false && ?aduec=false && ?aduhp=false).
            }""" .          

But SHACL SPARQL is more costly than SHACL Standard (cannot be implemented as efficiently). So it's better to express this with standard constructs:

erc:EnergyComponent-associations a sh:PropertyShape ; sh:description "The EnergyComponent shall be associated with either GeneratingUnit, PowerElecronicsUnit, EnergyConsumer or HydroPump." ; sh:sparql erc:EnergyComponent-associationsSparql ; sh:path [sh:alternativePath (nc:EnergyComponent.GeneratingUnit nc:EnergyComponent.PowerElecronicsUnit nc:EnergyComponent.EnergyConsumer nc:EnergyComponent.HydroPump)]; sh:minCount 1 ; sh:group erc:ERgroup ; sh:name "C:NC:ER:EnergyComponent:associations" ; sh:order 2 ; sh:severity sh:Violation .

- or `sh:or`: IMHO this is less natural, but allows to also check more stuff in each PropertyConstraint:
(UPDATE: edited NodeShape URL and metadata as per https://github.com/Sveino/Inst4CIM-KG/issues/16)
```ttl
erc:EnergyComponent-associations  a          sh:NodeShape ;
  sh:or      (erc:EnergyComponent-prop-GeneratingUnit erc:EnergyComponent-prop-PowerElecronicsUnit erc:EnergyComponent-prop-EnergyConsumer erc:EnergyComponent-HydroPump);
  sh:message "EnergyComponent is not associated with either GeneratingUnit, PowerElecronicsUnit, EnergyConsumer or HydroPump.";
  skos:notation  "C:NC:ER:EnergyComponent:associations";
  sh:targetClass  nc:EnergyComponent .

erc:EnergyComponent-prop-GeneratingUnit a sh:PropertyShape;
  sh:path nc:EnergyComponent.GeneratingUnit;
  sh:minCount 1;
  # add more if needed, eg sh:class

# etc for the other 3 props
Sveino commented 2 weeks ago

I assume this has been a validator issue, e.g. Jena.

griddigit-ci commented 2 weeks ago

At the time we tested sh:and and sh:or were behaving way too bad 100x times more time needed. It looked like the SPARQL was much more performant less than 1 ms or few ms But here we need testing as it depends on the structure of the instance data Definitely there is a room for optimization of the complex constraints. Some optimisation was done, but much more is needed. I experiences huge gains as in some cases something running for 600 s could be dropped to 2 ms

On the concrete one, are we sure that all alternative paths will we required to have minCount 1. if this is the case then we test it and do it that way. Jena is supporting alternativePath so no issue there. I am using it in different places now with the support of multiple CIM namespaces

VladimirAlexiev commented 2 weeks ago

sh:and and sh:or were behaving way too bad 100x times more time

Which engine? Do you have the details written somewhere, it will help a lot with my engagement.

are we sure that all alternative paths will we required to have minCount 1

At least one of the props must have a value.