This is something I've been thinking about for a long time, and is finally available for PySHACL.
Enabling sparql_mode allows you to validate against a datagraph on a remote SPARQL endpoint.
To use it on the CLI:

- use the `-q` (or `--sparql-mode`) switch
- supply an HTTP/HTTPS query endpoint string as the "DataGraph" value
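For example (the endpoint URL and shapes filename below are placeholders; this assumes a reachable SPARQL query endpoint):

```shell
# Validate a remote datagraph (placeholder endpoint URL) against local shapes
pyshacl -q -s my_shapes.ttl http://localhost:3030/my_dataset/query
```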
To use it in the library:

- enable SPARQL mode with the `sparql_mode=True` argument on `validate()`
- pass in an HTTP/HTTPS query endpoint string as the `data_graph` argument
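A minimal sketch of the library call (the endpoint URL and shapes filename are placeholders, and this assumes the endpoint is reachable):

```python
from pyshacl import validate

# Placeholder endpoint URL: in sparql_mode the data_graph argument is an
# HTTP/HTTPS query endpoint string, not a parsed rdflib Graph.
conforms, results_graph, results_text = validate(
    "http://localhost:3030/my_dataset/query",
    shacl_graph="my_shapes.ttl",
    sparql_mode=True,
)
print(results_text)
```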
In this mode, PySHACL operates strictly read-only, and does not modify the remote data graph. Some features are disabled when using the SPARQL Remote Graph Mode:
- A local working copy of the datagraph is not created (one is in regular operation)
- "rdfs" and "owl" inferencing is not allowed (because the remote graph is read-only, it cannot be expanded)
- The extra Ontology file (Inoculation or Mix-In mode) feature is disabled (because the remote graph is read-only, and we do not take a local working copy)
- SHACL Rules (Advanced-mode SPARQL Rules) are not allowed (because the remote graph is read-only)
- All SHACL-JS features are disabled (this is not safe when operating on a remote graph)
- "inplace" mode is disabled (this is a technicality; all operations on the remote data graph are inherently performed in-place anyway)
This is implemented with the built-in RDFLib sparql-store plugin, but may require the use of the SPARQLWrapper library in the future if we need more features.
There are further options that can be tweaked with environment variables:

- `PYSHACL_SPARQL_USERNAME` - HTTP Basic Auth username for the query endpoint
- `PYSHACL_SPARQL_PASSWORD` - HTTP Basic Auth password for the query endpoint
- `PYSHACL_SPARQL_METHOD` - HTTP method for queries (default is GET)
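For example, for a password-protected endpoint (the credential values below are placeholders):

```python
import os

# Placeholder credentials for a password-protected query endpoint;
# PySHACL reads these variables when opening the endpoint connection.
os.environ["PYSHACL_SPARQL_USERNAME"] = "query_user"
os.environ["PYSHACL_SPARQL_PASSWORD"] = "query_pass"
# Use POST instead of the default GET for queries
os.environ["PYSHACL_SPARQL_METHOD"] = "POST"
```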
The major things this mode does differently:

- Searching for Focus nodes in the data graph using the Targeting rules now uses a single SPARQL query (per constraint) rather than many direct rdflib store operations.
- Collecting Value nodes from the data graph from the Focus nodes now uses a single SPARQL query (per constraint) rather than many direct rdflib store operations.
- All Constraint evaluations that originally used datagraph lookup operations now use a single SPARQL query rather than many direct rdflib store operations.
This results in fewer HTTP calls to the SPARQL endpoint and in some cases offloads some workload to the datagraph host.
Note: the number of SPARQL queries is reduced as much as possible in this first pass, but a full validation still emits a lot of them. I'm still looking for further ways of combining SPARQL queries to continue reducing the number of lookups.
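The consolidation idea can be illustrated with a simplified, hypothetical sketch (this is not PySHACL's actual query builder): rather than issuing one store lookup per focus node, the focus nodes are bound into a single query via a `VALUES` clause, so one HTTP round trip retrieves all value nodes at once.

```python
def build_value_nodes_query(focus_nodes, path):
    """Build one SPARQL query that fetches value nodes for all focus nodes.

    A naive implementation would issue len(focus_nodes) separate lookups;
    binding the focus nodes in a VALUES clause needs only one query.
    (Simplified sketch, not PySHACL's real query builder.)
    """
    values = " ".join(f"(<{f}>)" for f in focus_nodes)
    return (
        "SELECT ?focus ?value WHERE {\n"
        f"  VALUES (?focus) {{ {values} }}\n"
        f"  ?focus <{path}> ?value .\n"
        "}"
    )

q = build_value_nodes_query(
    ["http://example.org/Alice", "http://example.org/Bob"],
    "http://example.org/age",
)
print(q)
```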
Fixes #174 #226