RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0
249 stars 63 forks source link

pyshacl.validate with inference unexpectedly mutates data_graph #69

Closed svenvanderburg closed 3 years ago

svenvanderburg commented 3 years ago

Setting the inference option (I tested with 'rdfs') results in the data_graph being mutated, it is expanded using inference.

Technically this is documented in the Readme.md: _inference is a Python string value to indicate whether or not to perform OWL inferencing expansion of the datagraph before validation. But still I would not expect a validate function to have side-effects. Maybe there are benefits of this behavior that I am unaware of, but it would make more sense for me if under the hood the data_graph object is copied (using copy.deepcopy), and all inference expansions operate on the copy. You could toggle this behavior with an inplace argument that is by default False.

ashleysommer commented 3 years ago

Hi @svenvanderburg PyShacl was designed specifically not to mutate the data graph when validating, especially when using the inference options. There are very careful checks in place to ensure what you've described does not happen. (There are actually old feature requests from users asking for an inplace expansion on their graph, which I have rejected). If you have come across a case where using inferencing does mutate your original data graph, then this is a major bug, and should be treated as such.

I have run some basic tests locally and cannot reproduce the behaviour you described, so I will have to do some deeper digging to get to the bottom of it. Are you able to share a simple snippet of code that reproduces the behaviour you're seeing?

svenvanderburg commented 3 years ago

Hi @ashleysommer OK, I did not expect that this was a bug, sorry for my miserable report in that case :) And glad we agree on that it shouldn't mutate, haha!

Here is a minimal example:

def test_pyschacl():
    g = rdflib.Graph()
    g.add((rdflib.URIRef('http://www.example.org/garfield'),
          rdflib.RDF.type,
          rdflib.URIRef('http://www.example.org/cat')))

    len_before = len(g)
    pyshacl.validate(g, shacl_graph=None, inference='rdfs')
    assert len(g) == len_before

# Which outputs:
AssertionError: graph was mutated
7 != 1

Versions of relevant libraries used:

pyshacl==0.14.0
rdflib==5.0.0
ashleysommer commented 3 years ago

Fixed by https://github.com/RDFLib/pySHACL/commit/4195b0a070e4afd3797e15a8f688611ca1eb693b

ashleysommer commented 3 years ago

Hi @svenvanderburg I've created a new release with a fix for this problem. Released as v0.14.1

I've also added an "inplace" mode, as you suggested, which defaults to False.

svenvanderburg commented 3 years ago

Amazing! 👍 Thanks for the quick action!