RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

Sparql ASK Query to implement unique constraint? #182

Open tduval-unifylogic opened 1 year ago

tduval-unifylogic commented 1 year ago

Greetings again!

I am attempting to implement a unique constraint using SPARQL, as there is no predicate for this (that I know of) in SHACL. My thought was to use a SPARQL ASK query. I have looked through the tests and examples and cannot find one where an ASK query is used in the way I am trying to implement.

Here is what I'm attempting to use that doesn't get me the desired results. Any suggestions are greatly appreciated!

    @prefix ex: <http://example.com/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    # ONTO and SHACL
    ex:Person a owl:Class, sh:NodeShape ;
        sh:validator [
            a sh:SPARQLAskValidator ;
            sh:ask  """
                ask
                where {
                    ?person1 ex:email ?email .
                    ?person2 ex:email ?email .
                    FILTER (?person1 != ?person2)
                }
            """ ;
            sh:message "email addresses must be unique." ; ] 
    .

    # DATA
    ex.i:Person1 a ex:Person ;
        ex:email "email@address.com" 
    .
    ex.i:Person2 a ex:Person ;
        ex:email "email@address.com" 
    .
ajnelson-nist commented 1 year ago

This looks like sh:maxCount, value 1, would meet your needs.

tduval-unifylogic commented 1 year ago

Yes, I would use sh:maxCount if I needed to check whether a single instance has more than one ex:email.

What I'm looking to do is check all instances of ex:Person to see if any of them have the same value for ex:email.

Unless there is something I am missing/not seeing?

ajnelson-nist commented 1 year ago

Right, I read that backwards, I see now.

I think you could do this by treating the email value as a node---which it formally is, but linguistically I still have a hard time calling literals nodes.

ex:my-email-objects-shape
    a sh:NodeShape ;

    # Target the *object* of the predicate.  So, the Object member of the triple is the node whose shape we're constraining.
    sh:targetObjectsOf ex:email ;
    # Peek backwards to the subject using an inverse path.
    sh:property [
        a sh:PropertyShape ;
        sh:maxCount 1 ;
        sh:path [
            sh:inversePath ex:email ;
        ] ;
    ] ;

    # That should do it.
.
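
To make the evaluation concrete, here is a minimal plain-Python sketch of what that shape checks. It is not how pySHACL is implemented; the tuple-based triples, the third individual, and the helper name `shared_values` are assumptions made for illustration. Each object of ex:email is a focus node, the inverse path collects the subjects pointing at it, and sh:maxCount 1 fails when more than one subject is found.

```python
from collections import defaultdict

# Hypothetical in-memory graph: (subject, predicate, object) tuples.
TRIPLES = [
    ("ex.i:Person1", "ex:email", "email@address.com"),
    ("ex.i:Person2", "ex:email", "email@address.com"),
    ("ex.i:Person3", "ex:email", "email2@address2.com"),
]


def shared_values(triples, predicate, max_count=1):
    """Return {object: sorted subjects} for objects used by more than max_count subjects."""
    subjects_by_object = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            # Inverse path: from the object (the focus node) back to its subjects.
            subjects_by_object[o].add(s)
    return {
        o: sorted(subjects)
        for o, subjects in subjects_by_object.items()
        if len(subjects) > max_count
    }


violations = shared_values(TRIPLES, "ex:email")
print(violations)
```

Only the email shared by two subjects is reported; the email used by a single subject passes.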
tduval-unifylogic commented 1 year ago

Thanks!

Just tried this and it throws a validation error when the email addresses are the same, but also throws a validation error when they are different.

    @prefix ex: <http://example.com/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex.i: <http://example.com/instance/> .

    # ONTO & SHACL
    ex:PersonShape a sh:NodeShape ;
        sh:targetObjectsOf ex:email ;
        sh:property [
            sh:maxCount 1 ;
            sh:path [
                sh:inversePath ex:email ;
            ] ;
        ] ;
    .
    # DATA
    ex.i:Person1 a ex:Person ;
        ex:email "email@address.com" 
    .
    ex.i:Person2 a ex:Person ;
        ex:email "email1@address.com" 
    .     
tduval-unifylogic commented 1 year ago

AGH! I just realized what I did wrong. This works!!

tduval-unifylogic commented 1 year ago

Thanks so much for your help!

Now working on whether I can create a composite unique constraint. This seems to work. Does it look semantically correct?

    @prefix ex: <http://example.com/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex.i: <http://example.com/instance/> .

    # ONTO & SHACL
    ex:Person a owl:Class, sh:NodeShape ;
        sh:property [
            sh:name "unique constraint" ;
            sh:description "" ;
            sh:targetObjectsOf ex:email, ex:address ;
            sh:maxCount 1 ;
            sh:path [
                sh:inversePath ex:email, ex:address ;
            ] ;
        ] ;
    .
    # DATA
    ex.i:Person1 a ex:Person ;
        ex:email "email@address.com" ;
        ex:address "address" .
    ex.i:Person2 a ex:Person ;
        ex:email "email@address.com" ;
        ex:address "address" .  
ajnelson-nist commented 1 year ago

It looks like I missed that you'd probably meant to reply to me.

I do not know what the semantics are of putting multiple targetObjectsOf into one sh:Shape. It turns out to be unnecessary to have sh:targetObjectsOf in that nested shape, though - you're already using the (implicit) selector from ex:Person, so you don't need sh:target(anything) on the sh:PropertyShape tied with sh:property.

Also, my recollection was that one subject of sh:inversePath can't have two objects the way you've written it. The --metashacl flag (that I suggested you use in #183) confirms this usage is wrong. Here's the shell transcript from when I put your example into ex.ttl:

$ pyshacl --metashacl --shacl ex.ttl ex.ttl
SHACL File does not validate against the SHACL Shapes SHACL (MetaSHACL) file.
Validation Report
Conforms: False
Results (2):
Constraint Violation in XoneConstraintComponent (http://www.w3.org/ns/shacl#XoneConstraintComponent):
    Severity: sh:Violation
    Source Shape: shsh:ShapeShape
    Focus Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Value Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Message: Node [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ] does not conform to exactly one shape in shsh:NodeShapeShape , shsh:PropertyShapeShape
Constraint Violation in OrConstraintComponent (http://www.w3.org/ns/shacl#OrConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:maxCount Literal("1", datatype=xsd:integer) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:or ( shsh:PathShape [ sh:nodeKind sh:IRI ] ) ; sh:path sh:path ]
    Focus Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Value Node: [ sh:inversePath ex:address, ex:email ]
    Result Path: sh:path
    Message: Node [ sh:inversePath ex:address, ex:email ] does not conform to one or more shapes in shsh:PathShape , [ sh:nodeKind sh:IRI ]

Validator encountered a Runtime Error:
SHACL File does not validate against the SHACL Shapes SHACL (MetaSHACL) file.
Validation Report
Conforms: False
Results (2):
Constraint Violation in XoneConstraintComponent (http://www.w3.org/ns/shacl#XoneConstraintComponent):
    Severity: sh:Violation
    Source Shape: shsh:ShapeShape
    Focus Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Value Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Message: Node [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ] does not conform to exactly one shape in shsh:NodeShapeShape , shsh:PropertyShapeShape
Constraint Violation in OrConstraintComponent (http://www.w3.org/ns/shacl#OrConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:maxCount Literal("1", datatype=xsd:integer) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:or ( shsh:PathShape [ sh:nodeKind sh:IRI ] ) ; sh:path sh:path ]
    Focus Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Value Node: [ sh:inversePath ex:address, ex:email ]
    Result Path: sh:path
    Message: Node [ sh:inversePath ex:address, ex:email ] does not conform to one or more shapes in shsh:PathShape , [ sh:nodeKind sh:IRI ]

If you believe this is a bug in pyshacl, open an Issue on the pyshacl github page.

Confirming this has nothing to do with the instance data, here is the same command run against a graph with one owl:Thing individual, and the "# DATA" section cut from ex.ttl:

$ cat thing.ttl
@prefix owl: <http://www.w3.org/2002/07/owl#> .

[] a owl:Thing .
$ pyshacl --metashacl --shacl ex.ttl thing.ttl
SHACL File does not validate against the SHACL Shapes SHACL (MetaSHACL) file.
Validation Report
Conforms: False
Results (2):
Constraint Violation in OrConstraintComponent (http://www.w3.org/ns/shacl#OrConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:maxCount Literal("1", datatype=xsd:integer) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:or ( shsh:PathShape [ sh:nodeKind sh:IRI ] ) ; sh:path sh:path ]
    Focus Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Value Node: [ sh:inversePath ex:address, ex:email ]
    Result Path: sh:path
    Message: Node [ sh:inversePath ex:address, ex:email ] does not conform to one or more shapes in shsh:PathShape , [ sh:nodeKind sh:IRI ]
Constraint Violation in XoneConstraintComponent (http://www.w3.org/ns/shacl#XoneConstraintComponent):
    Severity: sh:Violation
    Source Shape: shsh:ShapeShape
    Focus Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Value Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Message: Node [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ] does not conform to exactly one shape in shsh:NodeShapeShape , shsh:PropertyShapeShape

Validator encountered a Runtime Error:
SHACL File does not validate against the SHACL Shapes SHACL (MetaSHACL) file.
Validation Report
Conforms: False
Results (2):
Constraint Violation in OrConstraintComponent (http://www.w3.org/ns/shacl#OrConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:maxCount Literal("1", datatype=xsd:integer) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:or ( shsh:PathShape [ sh:nodeKind sh:IRI ] ) ; sh:path sh:path ]
    Focus Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Value Node: [ sh:inversePath ex:address, ex:email ]
    Result Path: sh:path
    Message: Node [ sh:inversePath ex:address, ex:email ] does not conform to one or more shapes in shsh:PathShape , [ sh:nodeKind sh:IRI ]
Constraint Violation in XoneConstraintComponent (http://www.w3.org/ns/shacl#XoneConstraintComponent):
    Severity: sh:Violation
    Source Shape: shsh:ShapeShape
    Focus Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Value Node: [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ]
    Message: Node [ sh:description Literal("") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path [ sh:inversePath ex:address, ex:email ] ; sh:targetObjectsOf ex:address, ex:email ] does not conform to exactly one shape in shsh:NodeShapeShape , shsh:PropertyShapeShape

If you believe this is a bug in pyshacl, open an Issue on the pyshacl github page.

If you take what is piled into one sh:PropertyShape (object of sh:property) and split it into two sh:PropertyShapes, one for ex:email and one for ex:address, you'll get past the SHACL-SHACL error.

Back to uniqueness-constraining: What you need to do is select the object of the predicate, and then "hop backwards". sh:targetObjectsOf was the selector in my example ex:my-email-objects-shape, because I wrote a sh:NodeShape focused on that property. If you want a sh:NodeShape focused on the class (which I think is a reasonable exercise---it's a shape that roughly says "emails are uniquely used among this class, and likewise for addresses"), you need to use a property path that goes to the object of the property, and then back along all inverses of that Literal.

Here is your example graph showing a corrected implementation and a still-incorrect implementation, also with one more individual that is expected to not trigger an error:

$ cat ex.ttl 
@prefix ex: <http://example.com/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex.i: <http://example.com/instance/> .

# ONTO & SHACL
ex:Person
    a
        owl:Class ,
        sh:NodeShape
        ;
    sh:property
        [
            sh:name "unique constraint" ;
            rdfs:comment "This shape is NOT correct yet.  No complaints raised from DATA section's ex:email usage."@en ;
            sh:maxCount 1 ;
            sh:path [
                sh:inversePath ex:email ;
            ] ;
        ] ,
        [
            sh:name "unique constraint" ;
            sh:maxCount 1 ;
            sh:path (
                ex:address
                [
                    sh:inversePath ex:address ;
                ]
            ) ;
        ]
        ;
    .

# DATA
ex.i:Person1 a ex:Person ;
ex:email "email@address.com" ;
ex:address "address" .
ex.i:Person2 a ex:Person ;
ex:email "email@address.com" ;
ex:address "address" .
ex.i:Person3 a ex:Person ;
ex:email "email2@address2.com" ;
ex:address "a different address" .

Here is the shell transcript of running that - and because your DATA section is effectively an XFAIL test between Person1 and Person2, you should see that it's not failing everywhere it should be:

$ pyshacl --metashacl --shacl ex.ttl ex.ttl
Validation Report
Conforms: False
Results (2):
Constraint Violation in MaxCountConstraintComponent (http://www.w3.org/ns/shacl#MaxCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path ( ex:address [ sh:inversePath ex:address ] ) ]
    Focus Node: ex.i:Person2
    Result Path: ( ex:address [ sh:inversePath ex:address ] )
    Message: More than 1 values on ex.i:Person2->( ex:address [ sh:inversePath ex:address ] )
Constraint Violation in MaxCountConstraintComponent (http://www.w3.org/ns/shacl#MaxCountConstraintComponent):
    Severity: sh:Violation
    Source Shape: [ sh:maxCount Literal("1", datatype=xsd:integer) ; sh:name Literal("unique constraint") ; sh:path ( ex:address [ sh:inversePath ex:address ] ) ]
    Focus Node: ex.i:Person1
    Result Path: ( ex:address [ sh:inversePath ex:address ] )
    Message: More than 1 values on ex.i:Person1->( ex:address [ sh:inversePath ex:address ] )

Do you see why?
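
One way to see why is to trace both paths by hand. The following is a minimal plain-Python sketch, not pySHACL's implementation; the tuple-based copy of the DATA section and the helper names `forward` and `inverse` are assumptions made for illustration. It counts the value nodes each property shape reaches from every ex:Person focus node.

```python
# Hypothetical in-memory copy of the DATA section as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex.i:Person1", "rdf:type", "ex:Person"),
    ("ex.i:Person1", "ex:email", "email@address.com"),
    ("ex.i:Person1", "ex:address", "address"),
    ("ex.i:Person2", "rdf:type", "ex:Person"),
    ("ex.i:Person2", "ex:email", "email@address.com"),
    ("ex.i:Person2", "ex:address", "address"),
    ("ex.i:Person3", "rdf:type", "ex:Person"),
    ("ex.i:Person3", "ex:email", "email2@address2.com"),
    ("ex.i:Person3", "ex:address", "a different address"),
]


def forward(triples, nodes, predicate):
    """Follow predicate from each node to its objects."""
    return {o for s, p, o in triples if p == predicate and s in nodes}


def inverse(triples, nodes, predicate):
    """Follow predicate backwards from each node to its subjects (sh:inversePath)."""
    return {s for s, p, o in triples if p == predicate and o in nodes}


# Implicit class target: the focus nodes are the instances of ex:Person.
persons = inverse(TRIPLES, {"ex:Person"}, "rdf:type")

counts = {}
for person in sorted(persons):
    # First shape: [ sh:inversePath ex:email ], evaluated from the person itself.
    email_values = inverse(TRIPLES, {person}, "ex:email")
    # Second shape: ( ex:address [ sh:inversePath ex:address ] ).
    address_values = inverse(
        TRIPLES, forward(TRIPLES, {person}, "ex:address"), "ex:address"
    )
    counts[person] = (len(email_values), len(address_values))
    print(person, counts[person])
```

The first shape starts at the person and looks for triples where the person is the object of ex:email. There are none, so it always sees zero value nodes and sh:maxCount 1 can never fail. The second shape first steps forward to the address literal and only then hops back, so Person1 and Person2 each reach two subjects, which matches the two violations in the transcript.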

ajnelson-nist commented 1 year ago

Oops, I realized I made an error in my demonstration. I think the uniqueness constraint needs to include a qualified shape on the class of the thing being hopped "back" towards. Depending on whether this is intended or not, the pervasiveness of the backwards hop can be demonstrated by adding this extra individual to the graph - note that it is typeless:

ex.i:Person4
ex:email "email2@address2.com" ;
ex:address "a different address" .  

ex.i:Person4 is not an ex:Person. (This example is a little odd for "Persons," but might make sense for other things, like imported ex:ImportedRecords vs. locally-generated ex:LocalRecords.) Should ex.i:Person3 care that a node not classified as ex:Person is using its supposedly-unique email address?

I'm not sure offhand how to write such a qualified shape. It sounds like a good SHACL exercise.
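
The intended effect of such a qualification can at least be sketched outside SHACL. This plain-Python model is an assumption-laden illustration, not SHACL syntax and not pySHACL behaviour; the helper name `sharers` and the tuple-based triples are hypothetical. It compares the unrestricted backwards hop with one that only counts nodes typed as ex:Person. In SHACL itself, the sh:qualifiedValueShape and sh:qualifiedMaxCount pair looks like the relevant construct, though that is untested here.

```python
# Hypothetical in-memory graph: Person3 is typed, Person4 is typeless.
TRIPLES = [
    ("ex.i:Person3", "rdf:type", "ex:Person"),
    ("ex.i:Person3", "ex:email", "email2@address2.com"),
    ("ex.i:Person3", "ex:address", "a different address"),
    ("ex.i:Person4", "ex:email", "email2@address2.com"),
    ("ex.i:Person4", "ex:address", "a different address"),
]


def sharers(triples, focus, predicate, required_class=None):
    """Subjects reached by hopping forward along predicate, then back again.

    If required_class is given, keep only subjects typed with that class.
    """
    values = {o for s, p, o in triples if s == focus and p == predicate}
    found = {s for s, p, o in triples if p == predicate and o in values}
    if required_class is not None:
        typed = {
            s for s, p, o in triples if p == "rdf:type" and o == required_class
        }
        found &= typed
    return found


unqualified = sharers(TRIPLES, "ex.i:Person3", "ex:email")
qualified = sharers(TRIPLES, "ex.i:Person3", "ex:email", required_class="ex:Person")
print(sorted(unqualified))
print(sorted(qualified))
```

Without the class restriction, Person3 reaches two subjects and would exceed a maximum count of 1. With the restriction, the typeless Person4 is ignored and only Person3 itself remains.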

ashleysommer commented 11 months ago

@ajnelson-nist Thank you for your help fielding this issue.

@tduval-unifylogic Is this issue resolved? Can it be closed now?