Closed ashleysommer closed 5 years ago
Are you interested in just RDFS entailment so the non-RDFS statements in the extra ontology document are ignored, or would you also want entailment from some OWL 2 profile? Maybe run the data file through some external tool(s) that provide these (like cwm or pychinko, or maybe something you can feed in SPIN rules)?
pySHACL can already do OWL-RL entailment, RDFS entailment, Both, or None. I am interested in the use case where there is an external ontology document that uses (a combination of) OWL2 and RDFS to describe a data model, and where a data file given to pySHACL contains snippets which are instances of that data model and rely on those OWL/RDFS axioms in order to be fully described, but the data file itself does not contain the necessary axioms for the pySHACL pre-validation inferencer to correctly build out the graph.
Agree this would be great to have as it would basically unify SHACL and custom ontology validation, given that pySHACL already does OWL-RL-level graph expansion as Ashley mentioned above.
Have to be a bit careful you don't implement a recursive loading mechanism, where a user wants to add in Ontology X that imports Y, which imports Z, etc. You might have to either provide import options or limit it to importing just that one ontology.
@nicholascar Absolutely, it will be limited to just one external ontology document, and it will explicitly not follow imports declared in that file.
This feature is implemented in https://github.com/RDFLib/pySHACL/commit/da55eab6d2cea70209ce5d5ca36a369b2d4a036d and released as part of pySHACL v0.9.9
It can sometimes (often) be the case that the combination of the SHACL Shape file and the Data File together do not give the pySHACL validation engine enough information to generate a correct validation result, even if inferencing is run across the input data file.
For example:
I have a shape file which asserts that for all instances of the class `Human`, if they have a property called `hasPet`, the target object of that property must be an instance of the class `Animal`.
I have a data file containing statements:
- `Person1`: instance of `Human` named `"Amy"`; she has a property `hasPet` with the target `Pet1`.
- `Pet1`: instance of `Lizard` named `"Sebastian"`.
If I run the validator across those inputs, it will return a validation result indicating failure, because the pet is not of type `Animal`. Even if inferencing is run on the data file, there is no way for the validator to know that `Lizard` is a subclass of `Animal`, so the validation still returns the failure result.
In order for this validation to work, there needs to be a statement of `(Lizard, rdfs:subClassOf, Animal)` included in the data file before submitting it to the validator, and basic RDFS inferencing must be run on the data graph before validating, to ensure the `(Pet1, rdf:type, Animal)` triple is created in the data graph.
This is a very simple example but hopefully highlights the problem faced, where any extra ontological information required for inferencing needs to be added into the data file before passing it to the validator. This is inconvenient because in most practical applications of pySHACL, the data file is an isolated data snippet, without any accompanying ontological information.
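To make the example concrete, here is a minimal sketch in plain Python, with no RDF library involved. It models a graph as a set of string triples and applies only the two RDFS subclass rules (rdfs9 and rdfs11). The helper names `rdfs_expand` and `pets_are_animals` and the `name` predicate are invented for illustration; this is a toy stand-in, not how pySHACL implements inferencing or validation.

```python
# Toy model: a graph is a set of (subject, predicate, object) string triples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_expand(graph):
    """Return a copy of graph closed under RDFS rules rdfs9 and rdfs11."""
    expanded = set(graph)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in expanded if p == SUBCLASS_OF]
        new = set()
        for sub, sup in subclass_pairs:
            for s, p, o in expanded:
                # rdfs9: an instance of a subclass is an instance of the superclass
                if p == RDF_TYPE and o == sub:
                    new.add((s, RDF_TYPE, sup))
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS_OF and o == sub:
                    new.add((s, SUBCLASS_OF, sup))
        if not new <= expanded:
            expanded |= new
            changed = True
    return expanded


def pets_are_animals(graph):
    """Hypothetical stand-in for the shape: every hasPet value of a Human is an Animal."""
    humans = {s for s, p, o in graph if p == RDF_TYPE and o == "Human"}
    pets = {o for s, p, o in graph if p == "hasPet" and s in humans}
    return all((pet, RDF_TYPE, "Animal") in graph for pet in pets)


# The isolated data snippet from the example above.
data = {
    ("Person1", RDF_TYPE, "Human"),
    ("Person1", "name", "Amy"),
    ("Person1", "hasPet", "Pet1"),
    ("Pet1", RDF_TYPE, "Lizard"),
    ("Pet1", "name", "Sebastian"),
}
# The one ontological statement the data snippet is missing.
axiom = ("Lizard", SUBCLASS_OF, "Animal")

# Inferencing alone cannot help: nothing links Lizard to Animal.
ok_without_axiom = pets_are_animals(rdfs_expand(data))
# With the axiom mixed in before inferencing, Pet1 becomes an Animal.
ok_with_axiom = pets_are_animals(rdfs_expand(data | {axiom}))
```

Here `ok_without_axiom` is `False` and `ok_with_axiom` is `True`; the only difference between the two runs is the single subclass statement added before expansion.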
It is sometimes the case that extra ontological information is added into the SHACL Shape file, or indeed that the SHACL Shapes are included as part of an ontology document itself. This does not help in this situation, because the file passed into the validator and parsed into the SHACL Shapes graph does not get mixed into the data graph, so those extra ontological statements do not take effect in the inferencing step on the data graph (and inferencing is never applied to the SHACL graph).
I propose an extra feature for pySHACL where you can optionally specify the location to an extra static ontology document, which gets ingested and mixed into the data graph prior to the inferencing step.
This will be a new feature in the python module, and exposed as an option on the command line tool, and as an optional field on the web tool.