Closed dandunbar23 closed 12 months ago
Hi @dandunbar23 Thank you for the very detailed and helpful issue report. Your detective work is appreciated and it appears you have found a valid bug.
I believe it should be fine to simply remote that dedup logic as you suggested, but first I will need to double check the spec and run some validation tests on that chage. I did put that dedup check in for a reason, but I forget what that reason was and it may no longer be needed.
Thanks @ashleysommer
I created PR #191, but instead of just removing it, I added a check on the violations list for duplicates. Since the violations list includes the var_dict
, which contains results specific data, it allows for a check on the SPARQL results for duplicates while still returning multiple results for the same target node. Let me know what you think.
Thanks again.
Perfect, thanks
Fixed by #191
For SHACL SPARQL constraints with multiple results for a single target node, pySHACL only returns a single result. For example, consider the data graph:
And the Shape file:
Here, the SPARQL constraint considers a target node (instance of Class1) with a rule that states that every every instance of Class2 related to the Class1 Instance target node must have a relation to a Class3 instance. The data graph shows one instance that passes (instance1Class2) and two that fail (instance2Class2 and instance3Class2). The SPARQL query results in both of these identified and returned. However, pySHACL only reports a single failure.
This occurs because of lines 183-184 in
sparql_based_constraints.py
:Where
t
is the?this
variable in SPARQL and the target node. After the first result is logged, the(t, p, v)
tuple is added todedup_set
. Thus, additional results (although they represent additional, unique failures) are not captured because of the guard statement above. If I comment out lines 183-184, it works as expected and shows two failures.I'm not sure if there is a spec issue that requires the
dedup_set
guard, but if there isn't, I'd recommend removing those lines to allow for multiple results. I'm new to open source contribution, but if there is agreement, I can try to make the changes myself and issue a PR.Thank you for the help!