RDFLib / pySHACL

A Python validator for SHACL
Apache License 2.0

The SPARQL Remote DataGraph Feature #233

Open ashleysommer opened 1 month ago

ashleysommer commented 1 month ago

This is something I've been thinking about for a long time, and it is finally available in PySHACL.

Enabling sparql_mode allows you to validate against a datagraph on a remote SPARQL endpoint.

To use it on the CLI:

To use it in the library:
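A minimal sketch of what a library call might look like. It assumes that `sparql_mode` is accepted as a keyword argument by `pyshacl.validate` and that the data graph is passed as the endpoint URL. The helper names, the endpoint URL and the shapes path are hypothetical placeholders, not part of PySHACL.

```python
def remote_validate_kwargs(shapes_path):
    """Keyword arguments for a validation run against a remote endpoint."""
    return {
        "shacl_graph": shapes_path,  # hypothetical path to a shapes file
        "sparql_mode": True,         # the option named in this issue
    }


def validate_remote(endpoint_url, shapes_path):
    """Validate the graph behind endpoint_url against the given shapes."""
    # Imported lazily so this module loads even without pyshacl installed.
    from pyshacl import validate

    # Assumption: in sparql_mode the data graph argument is the endpoint URL.
    conforms, results_graph, results_text = validate(
        endpoint_url, **remote_validate_kwargs(shapes_path)
    )
    return conforms, results_text
```

With those assumptions, a call such as `validate_remote("https://example.org/sparql", "shapes.ttl")` would return the conformance flag and the text report.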

In this mode, PySHACL operates strictly read-only and does not modify the remote data graph. Some features are disabled when using the SPARQL Remote Graph Mode:

This is implemented with the built-in RDFLib sparql-store plugin, but may require the use of the SPARQLWrapper library in the future if we need more features.

There are further options that can be tweaked with Environment Variables:

The major things this mode does differently:

This results in fewer HTTP calls to the SPARQL endpoint and in some cases offloads some workload to the datagraph host.

Note that the number of SPARQL queries has been reduced as much as possible in this first pass, but a full validation still emits a lot of them. I'm still looking for other ways to combine SPARQL queries and further reduce the number of lookups.
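To illustrate the general idea of combining lookups, here is a generic sketch that is not taken from the PySHACL source. Fetching the values of one property for several focus nodes can be done with a single query using a SPARQL `VALUES` clause, instead of one query per node. The function names and the example IRIs are hypothetical.

```python
def per_node_queries(focus_nodes, path):
    """One SELECT per focus node: N nodes cost N HTTP round trips."""
    return [
        f"SELECT ?value WHERE {{ <{node}> <{path}> ?value }}"
        for node in focus_nodes
    ]


def batched_query(focus_nodes, path):
    """A single SELECT covering every focus node via a VALUES clause."""
    values = " ".join(f"<{node}>" for node in focus_nodes)
    return (
        "SELECT ?focus ?value WHERE { "
        f"VALUES ?focus {{ {values} }} "
        f"?focus <{path}> ?value }}"
    )


nodes = ["http://example.org/a", "http://example.org/b"]
print(len(per_node_queries(nodes, "http://example.org/name")))
print(batched_query(nodes, "http://example.org/name"))
```

The batched form trades many small requests for one larger one, and the `?focus` binding in each result row lets the caller match values back to their nodes.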

Fixes #174 #226