RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

Returning modified data graph instead of validation report? #189

Closed: devonsparks closed this issue 11 months ago

devonsparks commented 1 year ago

Is there a way to get back the inferred triples (from pre-inferencing and SHACL rules) instead of the validation report?

I thought I might be able to read the Validator's target_graph, like this (demonstrating only RDFS pre-inferencing here):

from pyshacl import Validator
from rdflib import Graph

g = Graph()

smts = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix C: <http://classes.org/> .
@prefix ex: <http://example.org/> .

C:A rdfs:subClassOf C:B . 
ex:something a C:A .
"""

g.parse(data=smts)
v = Validator(g,
      inference='rdfs',
      advanced=True)

v.run()
print(v.target_graph.serialize(format='ttl'))

But the result is unchanged:

@prefix C: <http://classes.org/> .
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:something a C:A .

C:A rdfs:subClassOf C:B .

# Expected to see the inferred triple
#     ex:something a C:B .

Similar results for SHACL Rules.

Thanks in advance!
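The triple the question expects follows from the RDFS subclass entailment rules: rdfs9 (an instance of a subclass is also an instance of the superclass) and rdfs11 (rdfs:subClassOf is transitive). As a toy illustration only, and not how pySHACL or its inference backend actually work, here is a pure-Python fixpoint over (subject, predicate, object) string tuples that reuses the prefixed names from the example.

```python
# Toy RDFS subclass entailment over string triples (illustration only).
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_subclass_closure(triples):
    """Apply rdfs9 (type propagation) and rdfs11 (transitivity) until nothing new appears."""
    closed = set(triples)
    while True:
        subclass_pairs = [(s, o) for (s, p, o) in closed if p == SUBCLASS]
        new = set()
        # rdfs9: x a Sub, Sub subClassOf Sup  =>  x a Sup
        for (s, p, o) in closed:
            if p == RDF_TYPE:
                for sub, sup in subclass_pairs:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in subclass_pairs:
            for c, d in subclass_pairs:
                if b == c:
                    new.add((a, SUBCLASS, d))
        new -= closed
        if not new:
            return closed
        closed |= new


data = {("C:A", SUBCLASS, "C:B"), ("ex:something", RDF_TYPE, "C:A")}
inferred = rdfs_subclass_closure(data) - data
print(inferred)  # {('ex:something', 'rdf:type', 'C:B')}
```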

ajnelson-nist commented 1 year ago

Partly based on the discussions around inoculation, I had thought the data graph would be modified. But when I tried your code, v.data_graph serialized identically before and after v.run().

I'd also like an answer to this, because I want to see how to extract the triples generated by SHACL (Advanced Features) Rules.

devonsparks commented 1 year ago

Reading through the Validator code, I also tried passing inplace=True and inspecting data_graph after the run: no change. I can confirm the problem isn't with the RDFS or SHACL Rules I've set, because they work as expected in Protégé and TopBraid's SHACL engine respectively. I also tried separating the rules and ontology graphs from the data graph, to no effect. I'll keep digging through the code to try to grok where the output is being kept or lost, but would appreciate insight from anyone more familiar with the code base.

ashleysommer commented 1 year ago

Hi @devonsparks

This question is another duplicate of https://github.com/RDFLib/pySHACL/issues/20, which has been asked and answered many times (#20, #78, #148), with additional discussion in #60. In short, PySHACL is just a SHACL Validation engine: its purpose is to validate a data graph against the given SHACL shapes and constraints in accordance with the W3C SHACL Specification, and to return a validation result and a validation report. It does perform RDFS/OWL inferencing (entailment) and expansion using SHACL Rules internally, for the purpose of validating the graph, but it does not make the expanded graph available to the user. It is even part of the SHACL Spec that the validator should not modify the data graph as part of its validation. To accommodate that, PySHACL takes an internal copy of the input data graph and performs any required modifications on that copy only. That is why the original data graph is unmodified.

It appears you have already dug that far, because you have tried the unofficial hacks to work around this limitation, including checking validator.target_graph (as discussed in #20) and passing inplace=True (as discussed in #78). These workarounds do work; however, there is an error in your code: you are constructing Validator with the wrong options. (The options for constructing a Validator instance are passed as a dict via options=, unlike the keyword parameters of the pyshacl.validate() helper.)

This should work using the internal target-graph method:

from pyshacl import Validator

# g is your data graph; myshapes is a Graph holding your SHACL shapes
v = Validator(g, shacl_graph=myshapes, options={"advanced": True, "inference": "rdfs"})
conforms, report_graph, report_text = v.run()
expanded_graph = v.target_graph #<-- This gets the expanded data graph

This should work using the unofficial inplace modifier method:

from pyshacl import Validator

v = Validator(g, shacl_graph=myshapes, options={"advanced": True, "inplace": True, "inference": "rdfs"})
conforms, report_graph, report_text = v.run()
g #<-- g is expanded inplace

or using the validate() helper function:

from pyshacl import validate

conforms, report_graph, report_text = validate(g, shacl_graph=myshapes, advanced=True, inference="rdfs", inplace=True)
g #<-- g is expanded inplace

If you believe PySHACL should be more than a validation engine, and have an alternate mode in which PySHACL acts as a general purpose entailment/rules expander, please discuss that in #60.

Note: I see you are not passing a SHACL Shapes file in your examples. When you don't, PySHACL searches the data graph for Shapes. If it doesn't find any, it doesn't validate anything. I'm not sure whether that might also be a factor in the unexpected results you are seeing.

ashleysommer commented 11 months ago

Hi @devonsparks Can you confirm the above solves your issue? Can this thread be closed now?

devonsparks commented 11 months ago

Hi @ashleysommer - Yes, this does seem to resolve it. Apologies for not finding the duplicates sooner. I'd taken a look, but must have neglected to filter on closed issues. I will continue to discuss on #60. Okay to close.

Given the number of folks who seem to ask about this, maybe it's worth adding as an FAQ in the README? I'll raise it on #60 for further discussion. Thanks!