RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

Add new option for passing in an ontology specification document #14

Closed ashleysommer closed 5 years ago

ashleysommer commented 5 years ago

It is often the case that the SHACL Shape file and the data file together do not give the pySHACL validation engine enough information to generate a correct validation result, even if inferencing is run across the input data file.

For example:

  1. I have a shape file which asserts that for all instances of the class Human, if they have a property called hasPet, the target object of that property must be an instance of the class Animal.

  2. I have a data file containing statements:

    • Person1, an instance of Human named "Amy", who has a property hasPet with the target Pet1.
    • Pet1, an instance of Lizard named "Sebastian".

If I run the validator across those inputs, it will return a validation result indicating failure because the pet is not of type Animal. Even if inferencing is run on the data file, there is no way for the validator to know that Lizard is a subclass of Animal, so the validation still returns the failure result.

For this validation to work, a statement of (Lizard, rdfs:subClassOf, Animal) needs to be included in the data file before submitting it to the validator, and basic RDFS inferencing must be run on the data graph before validating, to ensure the (Pet1, rdf:type, Animal) triple is created in the data graph.
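The effect of that one missing statement can be sketched in a few lines. This is only an illustration of the two RDFS rules involved (rdfs9 and rdfs11) over plain Python tuples, not pySHACL's actual inferencer; the `ex:` names and the `rdfs_subclass_closure` helper are made up for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_subclass_closure(triples):
    """Apply rdfs11 and rdfs9 repeatedly until no new triples appear."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in graph:
                # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # rdfs9: (x type A), (A subClassOf B) => (x type B)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= graph:
            return graph
        graph |= new


# The data file from the example above (hypothetical ex: identifiers).
data = {
    ("ex:Person1", RDF_TYPE, "ex:Human"),
    ("ex:Person1", "ex:hasPet", "ex:Pet1"),
    ("ex:Pet1", RDF_TYPE, "ex:Lizard"),
}
axiom = ("ex:Lizard", SUBCLASS_OF, "ex:Animal")
target = ("ex:Pet1", RDF_TYPE, "ex:Animal")

without_axiom = rdfs_subclass_closure(data)
with_axiom = rdfs_subclass_closure(data | {axiom})

print(target in without_axiom)  # False: nothing links Lizard to Animal
print(target in with_axiom)     # True: rdfs9 derives the missing type
```

Inferencing alone changes nothing for the bare data file; the derived type only appears once the subclass axiom is present in the graph being expanded.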

This is a very simple example, but it hopefully highlights the problem: any extra ontological information required for inferencing needs to be added into the data file before passing it to the validator. This is inconvenient because in most practical applications of pySHACL, the data file is an isolated data snippet, without any accompanying ontological information.

It is sometimes the case that extra ontological information is added into the SHACL Shape file, or indeed that the SHACL Shapes are included as part of an ontology document itself. This does not help in this situation, because the file passed into the validator and parsed into the SHACL Shapes graph does not get mixed into the data graph, so those extra ontological statements do not take effect in the inferencing step on the data graph (and inferencing is never applied to the SHACL graph).

I propose an extra feature for pySHACL where you can optionally specify the location to an extra static ontology document, which gets ingested and mixed into the data graph prior to the inferencing step.

This will be a new feature in the Python module, exposed as an option on the command-line tool and as an optional field on the web tool.
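The proposed order of operations (mix in the ontology, then infer, then validate) could look roughly like the sketch below. Everything here is a stand-in: `validate_sketch`, `expand` and `pets_must_be_animals` are hypothetical names, the "shape" is hard-coded to the Human/hasPet example, and none of it is pySHACL's real API.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def expand(triples):
    """Tiny stand-in for the inferencer: propagate rdf:type along rdfs:subClassOf."""
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        supers = [(s, o) for s, p, o in graph if p == SUBCLASS_OF]
        typed = [(s, o) for s, p, o in graph if p == RDF_TYPE]
        for inst, cls in typed:
            for sub, sup in supers:
                if cls == sub and (inst, RDF_TYPE, sup) not in graph:
                    graph.add((inst, RDF_TYPE, sup))
                    changed = True
    return graph


def pets_must_be_animals(graph):
    """Hard-coded stand-in for the shape: a Human's hasPet target must be an Animal."""
    humans = {s for s, p, o in graph if p == RDF_TYPE and o == "ex:Human"}
    return [
        (s, o)
        for s, p, o in sorted(graph)
        if p == "ex:hasPet" and s in humans
        and (o, RDF_TYPE, "ex:Animal") not in graph
    ]


def validate_sketch(data, ontology=None):
    working = set(data)            # copy, so the caller's data graph is untouched
    if ontology:
        working |= set(ontology)   # mix the ontology in BEFORE inferencing
    working = expand(working)      # inferencing runs on the mixed graph
    violations = pets_must_be_animals(working)
    return (not violations, violations)


data = {
    ("ex:Person1", RDF_TYPE, "ex:Human"),
    ("ex:Person1", "ex:hasPet", "ex:Pet1"),
    ("ex:Pet1", RDF_TYPE, "ex:Lizard"),
}
ontology = {("ex:Lizard", SUBCLASS_OF, "ex:Animal")}

print(validate_sketch(data))            # (False, [('ex:Person1', 'ex:Pet1')])
print(validate_sketch(data, ontology))  # (True, [])
```

The ontology is merged into a working copy rather than into the data itself, so the isolated data snippet stays as it was submitted and the ontology can be reused across runs.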

JoelBender commented 5 years ago

Are you interested in just RDFS entailment so the non-RDFS statements in the extra ontology document are ignored, or would you also want entailment from some OWL 2 profile? Maybe run the data file through some external tool(s) that provide these (like cwm or pychinko, or maybe something you can feed in SPIN rules)?

ashleysommer commented 5 years ago

pySHACL can already do OWL-RL entailment, RDFS entailment, both, or none. I am interested in the use case where there is an external ontology document that uses (a combination of) OWL 2 and RDFS to describe a data model, and where a data file given to pySHACL contains snippets which are instances of that data model and rely on those OWL/RDFS axioms in order to be fully described, but the data file itself does not contain the necessary axioms for the pySHACL pre-validation inferencer to correctly build out the graph.

nicholascar commented 5 years ago

Agree this would be great to have as it would basically unify SHACL and custom ontology validation, given that pySHACL already does OWL-RL-level graph expansion as Ashley mentioned above.

You'll have to be a bit careful not to implement a recursive loading mechanism, where the user wants to add in Ontology X, which imports Y, which imports Z, etc. You might have to either provide import options or limit loading to just that one ontology.

ashleysommer commented 5 years ago

@nicholascar Absolutely, it will be limited to just one external ontology document, and it will explicitly not follow imports declared in that file.
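To make the consequence of that limit concrete, here is a small sketch contrasting single-document loading with the recursive behaviour being avoided. The `DOCUMENTS` dictionary is a hypothetical in-memory stand-in for documents reachable by IRI, and both loader functions are illustrative names only, not pySHACL code.

```python
OWL_IMPORTS = "owl:imports"

# Hypothetical documents: X imports Y, which imports Z.
DOCUMENTS = {
    "ex:OntologyX": {
        ("ex:OntologyX", OWL_IMPORTS, "ex:OntologyY"),
        ("ex:Lizard", "rdfs:subClassOf", "ex:Reptile"),
    },
    "ex:OntologyY": {
        ("ex:OntologyY", OWL_IMPORTS, "ex:OntologyZ"),
        ("ex:Reptile", "rdfs:subClassOf", "ex:Animal"),
    },
    "ex:OntologyZ": {
        ("ex:Animal", "rdfs:subClassOf", "ex:LivingThing"),
    },
}


def load_single_ontology(iri, documents=DOCUMENTS):
    """Return only the named document; owl:imports stays as an ordinary triple."""
    return set(documents[iri])


def load_with_imports(iri, documents=DOCUMENTS, seen=None):
    """For contrast only: the recursive loading that the feature avoids."""
    seen = set() if seen is None else seen
    if iri in seen or iri not in documents:
        return set()
    seen.add(iri)  # guard against import cycles
    triples = set(documents[iri])
    for s, p, o in documents[iri]:
        if p == OWL_IMPORTS:
            triples |= load_with_imports(o, documents, seen)
    return triples


single = load_single_ontology("ex:OntologyX")
recursive = load_with_imports("ex:OntologyX")

print(len(single))     # 2
print(len(recursive))  # 5
print(("ex:Reptile", "rdfs:subClassOf", "ex:Animal") in single)  # False
```

With single-document loading, any axiom that lives only in an imported ontology (here Reptile being a subclass of Animal) is not available to the inferencer, so the one document passed in needs to carry every axiom the data relies on.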

ashleysommer commented 5 years ago

This feature is implemented in https://github.com/RDFLib/pySHACL/commit/da55eab6d2cea70209ce5d5ca36a369b2d4a036d and released as part of pySHACL v0.9.9