RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

Difference between PySHACL and TopBraid SHACL API #227

Open rmfranken opened 2 months ago

rmfranken commented 2 months ago

I first posted this in the SHACL Discord last week, but wanted to open an official issue here as well.

I'm getting significantly different results from PySHACL and the TopBraid SHACL validation engine.

Command for pyshacl: pyshacl -s shapestest.ttl -m -a -f human -j datatest.ttl

Command for TopBraid: .\shaclvalidate.bat -datafile 'datatest.ttl' -shapesfile 'shapestest.ttl'

I also tried running pyshacl with the -i rdfs flag, with the same result.
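
For anyone who wants to reproduce this from Python rather than the command line, here is a rough sketch of the same run. The helper name count_results_by_focus_node is made up for illustration, the file names are the ones from the commands above, and the keyword arguments are meant to mirror the -m, -a, -j and -i flags as far as I understand the pyshacl.validate API. Treat it as a starting point, not a verified reproduction.

```python
from collections import Counter


def count_results_by_focus_node(data_path="datatest.ttl",
                                shapes_path="shapestest.ttl",
                                inference=None):
    """Run pySHACL and count sh:ValidationResult nodes per sh:focusNode.

    Hypothetical helper for illustration; not part of pySHACL itself.
    """
    # Imported lazily so the helper can be defined even where the
    # libraries are not installed.
    from pyshacl import validate
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    SH = Namespace("http://www.w3.org/ns/shacl#")

    data_graph = Graph()
    data_graph.parse(data_path, format="turtle")
    shapes_graph = Graph()
    shapes_graph.parse(shapes_path, format="turtle")

    conforms, results_graph, _results_text = validate(
        data_graph,
        shacl_graph=shapes_graph,
        inference=inference,  # None, or "rdfs" to mirror `-i rdfs`
        meta_shacl=True,      # assumed equivalent of -m
        advanced=True,        # assumed equivalent of -a
        iterate_rules=True,   # assumed equivalent of -j
    )

    # Group the reported results by the node they were raised on.
    counts = Counter()
    for result in results_graph.subjects(RDF.type, SH.ValidationResult):
        focus = results_graph.value(result, SH.focusNode)
        counts[str(focus)] += 1
    return conforms, counts
```

Calling count_results_by_focus_node() and then count_results_by_focus_node(inference="rdfs") shows which classes each run reports, which is easier to compare against the TopBraid output than the raw totals.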

PySHACL seems to find 9 errors; TopBraid finds 5. I am expecting 3 validation errors from ex-sh-rl:Or1, 1 from ex-sh-rl:WrongClass, and 1 from ex-sh-rl:bothOrsAtOnce.

In other words, for some reason, pyshacl is incorrectly reporting ex-sh-rl:CorrectClass and ex-sh-rl:CorrectSuperClass as wrong in every scenario.

Why is there a difference?

Data:

@prefix ex-sh-rl: <http://otl.example.eu/ex/def/shape-rule/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex-sh-rl:CorrectSuperClass a rdfs:Class.
ex-sh-rl:CorrectClass a rdfs:Class ; 
    rdfs:subClassOf ex-sh-rl:CorrectSuperClass.

ex-sh-rl:something sh:class ex-sh-rl:CorrectClass.

ex-sh-rl:WrongClass a rdfs:Class .

Shapes:

@prefix ex-sh-rl: <http://otl.example.eu/ex/def/shape-rule/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex-sh-rl:bothOrsAtOnce a sh:NodeShape ;
    sh:targetClass rdfs:Class ;
    sh:message "Wrong because neither object of sh:class or subject of sh:property" ;
    sh:or
        (
            [

                sh:path ( [ sh:inversePath [ sh:zeroOrMorePath rdfs:subClassOf ] ] sh:property ) ;
                sh:minCount 1 ;
            ]
            [
                sh:path ( [ sh:inversePath [ sh:zeroOrMorePath rdfs:subClassOf ] ] [sh:inversePath sh:class ] ) ;
                sh:minCount 1 ;
            ]
        ) .

ex-sh-rl:Or1 a sh:NodeShape ;
    sh:targetClass rdfs:Class ;
    sh:message "Wrong because not subject sh:property" ;
    sh:property [
        sh:path ( [ sh:inversePath [ sh:zeroOrMorePath rdfs:subClassOf ] ] sh:property ) ;
                sh:minCount 1 ;
    ].

ex-sh-rl:Or2 a sh:NodeShape ;
    sh:targetClass rdfs:Class ;
    sh:message "Wrong because not object of sh:class" ;
    sh:property [
                sh:path ( [ sh:inversePath [ sh:zeroOrMorePath rdfs:subClassOf ] ] [sh:inversePath sh:class ] ) ;
                sh:minCount 1 ;
            ]
        .

ashleysommer commented 2 months ago

I suspect this is a bug in the pyshacl sh:path builder, related to the use of sh:inversePath wrapped around a sh:zeroOrMorePath. I don't remember seeing that kind of pattern before, and there are no tests in the W3C SHACL test suite that cover that pattern, so our implementation may not get that combination quite right.
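
As a sanity check that does not depend on either engine, the two paths can be evaluated by hand over the five data triples. This is a minimal sketch in plain Python. It assumes the usual SHACL path semantics, where the inverse of a zero-or-more rdfs:subClassOf path yields the focus class plus all of its transitive subclasses. The helper names are invented for illustration, and rdfs:Class membership is taken only from the explicit rdf:type triples, with no inference.

```python
# The data graph from the issue, written as (subject, predicate, object) tuples.
TRIPLES = {
    ("ex-sh-rl:CorrectSuperClass", "rdf:type", "rdfs:Class"),
    ("ex-sh-rl:CorrectClass", "rdf:type", "rdfs:Class"),
    ("ex-sh-rl:CorrectClass", "rdfs:subClassOf", "ex-sh-rl:CorrectSuperClass"),
    ("ex-sh-rl:something", "sh:class", "ex-sh-rl:CorrectClass"),
    ("ex-sh-rl:WrongClass", "rdf:type", "rdfs:Class"),
}


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is in the data."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is in the data."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def self_and_subclasses(node):
    """[ sh:inversePath [ sh:zeroOrMorePath rdfs:subClassOf ] ]"""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for sub in subjects("rdfs:subClassOf", current):
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


def values_or1(focus):
    """Second step of the Or1 path: follow sh:property forwards."""
    return {v for c in self_and_subclasses(focus)
            for v in objects(c, "sh:property")}


def values_or2(focus):
    """Second step of the Or2 path: follow sh:class backwards."""
    return {v for c in self_and_subclasses(focus)
            for v in subjects("sh:class", c)}


# sh:targetClass rdfs:Class selects every explicit instance of rdfs:Class.
focus_nodes = sorted(subjects("rdf:type", "rdfs:Class"))

# sh:minCount 1 fails when the path yields no value nodes.
or1_violations = [n for n in focus_nodes if len(values_or1(n)) < 1]
or2_violations = [n for n in focus_nodes if len(values_or2(n)) < 1]
# sh:or fails only when both alternatives fail.
both_violations = [n for n in focus_nodes
                   if len(values_or1(n)) < 1 and len(values_or2(n)) < 1]

total = len(or1_violations) + len(or2_violations) + len(both_violations)
print("Or1:", or1_violations)
print("Or2:", or2_violations)
print("bothOrsAtOnce:", both_violations)
print("total:", total)
```

Under those assumptions the sketch reports three violations for ex-sh-rl:Or1 (no class has sh:property), one for ex-sh-rl:Or2 on ex-sh-rl:WrongClass, and one for ex-sh-rl:bothOrsAtOnce on ex-sh-rl:WrongClass. That is five in total, the count the reporter expected and the count TopBraid returned.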