RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

Validation of Enumeration Values?? #178

Closed tduval-unifylogic closed 1 year ago

tduval-unifylogic commented 1 year ago

Sorry to bug you folks again. I thought I had this working, but now I'm banging my head against the wall since it is not.

I'm trying to create validation for an enumeration individual. I know I'm doing something incorrect, so I thought I would reach out. I've tried sh:SPARQLConstraint, sh:ShapeNode and no luck.

I want to create reusable shapes for validation and would rather not use sh:in ( ex:Value1 ex:Value2 ex:Value3 ), since then I'd have to maintain two lists (so SPARQL seems the best method).

Any help is greatly appreciated!

Here is the onto graph:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.com/> .

ex:MyEnumClassValues a sh:SPARQLConstraint ;
  sh:prefixes ex: ;
  sh:select """
    SELECT ?member WHERE {
      ex:MyEnumClass owl:oneOf/rdf:rest*/rdf:first ?member .
    }
  """ .

ex:myClass a owl:Class ;
  sh:property [ 
              sh:node ex:MyEnumClassValues ;
              sh:path ex:enum ] .

ex:MyEnumClass a owl:Class ;
  owl:oneOf ( ex:Value1 ex:Value2 ex:Value3 ) .

ex:Value1 a ex:MyEnumClass, owl:NamedIndividual .
ex:Value2 a ex:MyEnumClass, owl:NamedIndividual .
ex:Value3 a ex:MyEnumClass, owl:NamedIndividual .

Here is the data graph:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex: <http://example.com/> .
@prefix ex.i: <http://example.com/instances/> .

    ex.i:myClass_1 rdf:type ex:myClass ;
        ex:enum ex:Value12 
   .

Same Python code as in the other issue (with no inferencing!):

from pyshacl import validate

# dg (data graph) and og (ontology + shapes graph) are assumed to be loaded already
r = validate(dg,
      shacl_graph=og,
      ont_graph=og,
      inference='none',
      abort_on_first=False,
      allow_infos=False,
      allow_warnings=False,
      meta_shacl=False,
      advanced=False,
      # js=True,
      debug=False)
conforms, results_graph, results_text = r
print(results_text)

ajnelson-nist commented 1 year ago

Long answer for you, because working with enumerants in RDF has been pretty non-trivial in my experience.

First, try using SHACL-SHACL validation against your SHACL graph. It's the --metashacl flag on the pyshacl command line, or the meta_shacl: bool keyword argument in validate(). You have at least one syntax error and one missing triple that directly impact your example:

ex:myClass a owl:Class ;
  sh:property [ 
              sh:node ex:MyEnumClassValues ;
              sh:path ex:enum ] .

ex:MyEnumClassValues is a sh:SPARQLConstraint, which should be linked by sh:sparql, not sh:node. I think the --metashacl flag will raise this.

Also, ex:myClass is an owl:Class but not a sh:NodeShape, so no targeting will be done for it. SHACL allows a shortcut here (implicit class targets): a class X that is both an owl:Class and a sh:NodeShape is treated as if the triple X sh:targetClass X were present.

So, that example chunk I quoted should read instead:

ex:myClass a owl:Class, sh:NodeShape ;
  sh:property [ 
              sh:sparql ex:MyEnumClassValues ;
              sh:path ex:enum ] .

For an example of this from the spec, see Section 5.1, especially the example with ex:LanguageExamplePropertyShape which has a SPARQLConstraint applied on a PropertyShape.

Also, I'm not quite sure if it matters, but your SELECT query's first variable (maybe any variable, if order doesn't matter?) should be $this, with a dollar sign. See Section 5.3.1 and 5.3.2; also note the other pre-bound variables have a leading question mark like in regular SPARQL.

The summary of the above is that the SHACL-SHACL shapes, provided by the SHACL specification to validate your SHACL, should be tried first to see if there are any syntax gotchas.

The summary of the below is that you might be better off using sh:in anyway. It depends on whether you are in an application that requires OWL 2 DL. This summary is c/o work I've done with an ontology community that tangled with OWL enumeration syntax for a while.

From a quirk of OWL 2 DL syntactic requirements, if you are developing an OWL 2 DL ontology, you must duplicate the rdf:Lists between the SHACL shape and OWL datatype. The reason is that OWL 2 DL disallows the objects of the various xOneOf predicates from being IRIs; they're required in the OWL2 Mapping to RDF document to be blank nodes. (I keep wanting to be wrong on this, because keeping the lists in sync is a moderate quality-control issue. I welcome any pointer correcting me!) That community worked through the syntax trouble and QC matters in this Issue.

The shape we defined to validate our OWL usage is here. From reading on another thread (I forget which) in the pySHACL repository, I've come to realize that while correct, it is potentially a grievously slow implementation, and should be spelled with SHACL syntax, eschewing SHACL-SPARQL - I encourage you to try that using sh:target*.

If you're not constraining your development to OWL 2 DL, you can use an IRI-identified rdf:List and have that in the sh:in, no problem. Otherwise, take a look at the two PRs associated with that UCO issue for how we ended up settling the matter. (There is some additional complexity in the UCO PRs that handle UCO "softly" requiring membership in lists, permitting non-members to be used but raising sh:Info-level validation results if they are. That's why the PropertyShapes pertaining to list-member validation come in sets of 3.)

Within your example, it looks like ex:MyEnumClass is incorrect in OWL-spelling. An OWL editing tool will probably complain to you about a malformed class. You need to use an anonymous class that is linked with owl:equivalentClass. (The OWL-RDF parsing algorithm is strict, per the tiny, very last line of Section 3 of that 20121211 document: "At the end of this process [i.e. all these tables of matching rules], the graph G must be empty", which means all triples must have matched a pattern exactly.) But, that shouldn't impact your SHACL.

Last, there is a potential ontological re-design solution: Consider if there is a reason that you are using an enumeration of individuals, rather than an owl:Class. Take two enumerations of individuals that are similar in nature to one another. Do I define pigmentary primary colors as an enumeration of red, blue, and yellow, or as a class which has red, blue, and yellow as members? What about luminary primary colors being red, blue, and green? How would I search for a primary color whether it is luminary or pigmentary - crawling through two enumerations, or looking for membership in a more primitive owl:Class? If the members of the enumeration-or-class number around 100 instead of 3, consider if traversing an rdf:List to validate every usage in your graph is truly preferable to looking for a triple x rdf:type :Foo.
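To make the list-versus-class comparison concrete, here is a toy sketch in plain Python, with triples modelled as tuples of strings rather than a real RDF library. The names ex:PigmentaryPrimaryColor, ex:Red and so on are invented for the illustration. It contrasts walking an rdf:List cell by cell with a single rdf:type lookup.

```python
RDF_TYPE, RDF_FIRST, RDF_REST, RDF_NIL = "rdf:type", "rdf:first", "rdf:rest", "rdf:nil"


def make_list(label, members):
    """Return (head, triples) for an rdf:List of members, using made-up blank node ids."""
    nodes = [f"_:{label}{i}" for i in range(len(members))]
    triples = set()
    for i, member in enumerate(members):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.add((nodes[i], RDF_FIRST, member))
        triples.add((nodes[i], RDF_REST, rest))
    head = nodes[0] if nodes else RDF_NIL
    return head, triples


def in_list(triples, head, value):
    """Walk rdf:first/rdf:rest cells until value is found or rdf:nil is reached."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    node = head
    while node != RDF_NIL:
        if first[node] == value:
            return True
        node = rest[node]
    return False


def has_type(triples, value, cls):
    """Single triple lookup: is (value rdf:type cls) asserted?"""
    return (value, RDF_TYPE, cls) in triples


# Hypothetical enumeration, asserted both as a list and as class membership.
colors = ["ex:Red", "ex:Blue", "ex:Yellow"]
head, graph = make_list("b", colors)
graph.add(("ex:PigmentaryPrimaryColor", "owl:oneOf", head))
graph |= {(c, RDF_TYPE, "ex:PigmentaryPrimaryColor") for c in colors}

for candidate in ("ex:Yellow", "ex:Green"):
    print(candidate,
          in_list(graph, head, candidate),
          has_type(graph, candidate, "ex:PigmentaryPrimaryColor"))
```

With three members the difference is negligible; the point is only that the list check visits every cell up to the match, while the type check is one lookup regardless of how many members there are.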

Hopefully some or all of that helps.

tduval-unifylogic commented 1 year ago

Thank you for your response. Good eye on spotting my owl:equivalentClass omission. In fact, our production ontologies do have the owl:equivalentClass linking. 👍

I found a much simpler way to handle this scenario, btw. I will always have explicit individuals since we manage metadata on them, so this option is available and works nicely (no meta_shacl required from what I see).

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.com/> .

ex:myClass a owl:Class ;
  sh:property [ 
              sh:class ex:MyEnumClass, owl:NamedIndividual ;
              sh:path ex:enum ] .

ex:MyEnumClass a owl:Class ;
  owl:oneOf ( ex:Value1 ex:Value2 ex:Value3 ) .

ex:Value1 a ex:MyEnumClass, owl:NamedIndividual .
ex:Value2 a ex:MyEnumClass, owl:NamedIndividual .
ex:Value3 a ex:MyEnumClass, owl:NamedIndividual .