RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0
241 stars 63 forks source link

Trouble with qualified value shape #213

Closed ajnelson-nist closed 8 months ago

ajnelson-nist commented 8 months ago

Hello,

I'm having a difficult time making a qualified value shape work.

I've tried the graph ex:Hand example in SHACL Section 4.7.3, and have gotten it to trigger as the spec has me think it should, with this data:

@prefix ex: <http://example.com/ns#> .

ex:hand-1
    a ex:Hand ;
    ex:digit
        [ a ex:Finger ] ,
        [ a ex:Finger ] ,
        [ a ex:Finger ] ,
        [ a ex:Finger ] ,
        [ a ex:Finger ]
        ;
    .

The results of running pyshacl --shacl HAND_SH.ttl HAND_DATA.ttl are:

Validation Report
Conforms: False
Results (2):
Constraint Violation in QualifiedValueShapeConstraintComponent (http://www.w3.org/ns/shacl#QualifiedMinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:path ex:digit ; sh:qualifiedMaxCount Literal("1", datatype=xsd:integer) ; sh:qualifiedMinCount Literal("1", datatype=xsd:integer) ; sh:qualifiedValueShape [ sh:class ex:Thumb ] ]
    Focus Node: ex:hand-1
    Result Path: ex:digit
    Message: Focus node does not conform to shape MinCount 1 MaxCount 1: [ sh:class ex:Thumb ]
Constraint Violation in QualifiedValueShapeConstraintComponent (http://www.w3.org/ns/shacl#QualifiedMaxCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:path ex:digit ; sh:qualifiedMaxCount Literal("4", datatype=xsd:integer) ; sh:qualifiedMinCount Literal("4", datatype=xsd:integer) ; sh:qualifiedValueShape [ rdf:type sh:NodeShape ; sh:class ex:Finger ] ]
    Focus Node: ex:hand-1
    Result Path: ex:digit
    Message: Focus node does not conform to shape MinCount 4 MaxCount 4: [ rdf:type sh:NodeShape ; sh:class ex:Finger ]

(This worked with and without sh: qualifiedValueShapesDisjoint included)

But, I'm having a hard time getting a reduced example to trigger.

The shapes graph I'm trying is:

@prefix ex: <http://example.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

ex:MyShape
    a sh:NodeShape ;
    sh:targetClass ex:MyFirstClass ;
    sh:property [
        rdfs:comment "This triggers as expected."@en ;
        sh:path ex:myProperty ;
        sh:minCount 1 ;
    ] ;
    sh:property [
        rdfs:comment "This does not trigger...why?"@en ;
        sh:path ex:myProperty ;
        sh:qualifiedValueShape [
            sh:class ex:MySecondClass
        ] ;
        sh:qualifiedMinCount 1 ;
    ]
    .

The data graph I'm feeding in is:

@prefix ex: <http://example.org/ontology/> .
@prefix kb: <http://example.org/kb/> .

kb:MyFirstClass-instance-not-ok
    a ex:MyFirstClass ;
    .

kb:MyFirstClass-instance-ok
    a ex:MyFirstClass ;
    ex:myProperty [
        a ex:MySecondClass ;
    ] ;
    .

And this is the only result I'm getting from running pyshacl --metashacl --shacl POST_SH.ttl POST_DATA.ttl:

Validation Report
Conforms: False
Results (1):
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ rdfs:comment Literal("This triggers as expected.", lang=en) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:path ex:myProperty ]
    Focus Node: kb:MyFirstClass-instance-not-ok
    Result Path: ex:myProperty
    Message: Less than 1 values on kb:MyFirstClass-instance-not-ok->ex:myProperty

(--metashacl didn't raise any complaints.)

I was expecting a second sh:Violation to occur based on the qualified value shape. The first violation shows the unqualified minimum count is working. Are there any thoughts on why the qualified minimum count isn't working?

ashleysommer commented 8 months ago

@ajnelson-nist Thanks for reporting this. This is one of the areas where the SHACL W3C Test Suite (SHT) and the Datashape Tests suite (DASH) both are lacking comprehensive and exhaustive test cases. This is likely a bug or oversight in implementation in PySHACL, so first I'll create a valid reproduction and see if I can track down the source of the issue.

ashleysommer commented 8 months ago

Ok, I've found the source of the problem. It is due to a potentially overzealous optimisation added when the feature to detect and prevent shape-recursion was implemented in Sept 2022. https://github.com/RDFLib/pySHACL/blob/52e239c41b88497e9c9f5fd8ae0ae567e59885b8/pyshacl/constraints/core/shape_based_constraints.py#L323-L328

Basically, the second PropertyShape on MyShape does not see any focus/value nodes at the path sh:path ex:myProperty on kb:MyFirstClass-instance-not-ok so it skips checking the Constraints on that path. Thats why the violation of that constraint does not trigger.

So I suppose an easy solution would be to skip this shortcut check if qualifiedMinCount > 0 , so that it triggers on the absence of values nodes. It does raise another question though: Is the complete absence of value nodes on that property path considered logically the same as less-than-one 'qualified' nodes on that path that are conformant to the qualifiedValueShape? It does hold true for sh:minCount as you pointed out, so indeed it probably should also work for qualifiedMinCount on qualifiedValueShape.

ajnelson-nist commented 8 months ago

So I suppose an easy solution would be to skip this shortcut check if qualifiedMinCount > 0 , so that it triggers on the absence of values nodes.

I think this would be correct.

It does raise another question though: Is the complete absence of value nodes on that property path considered logically the same as less-than-one 'qualified' nodes on that path that are conformant to the qualifiedValueShape? It does hold true for sh:minCount as you pointed out, so indeed it probably should also work for qualifiedMinCount on qualifiedValueShape.

Yes, I believe the complete absence of value nodes equally effects unqualified property shapes and qualified property shapes. Looking over the qualified shapes part of the spec, the phrasing does not seem to have a "qualified matches among matches found" logical flex. Instead, sentence one looks pretty definitive: "sh:qualifiedValueShape specifies the condition that a specified number of value nodes conforms to the given shape."

ashleysommer commented 8 months ago

Yes, I agree with your assessment, and that's the same conclusion I came to. I've pushed a fix for this https://github.com/RDFLib/pySHACL/commit/d939e3030816d98fa7a8597b816ba2132b2fdad1 It will be doing the big churn to support RDFLib 7.0.0 this weekend, then the new version will be pushed out, including this fix. While debugging this, I also found an edge-case error in my implementation of handling sh:qualifiedValueShapesDisjoint so that's a bonus.

ajnelson-nist commented 8 months ago

Spectacular to hear! Many thanks!

ashleysommer commented 8 months ago

Okay, @ajnelson-nist you don't need to wait until the weekend. The new version is out already.