Open VladimirAlexiev opened 1 month ago
I agree on the requirement. First of all, we would like the UML/information model to be expressive enough that we only need UML restrictions. However, the world is more complicated. To avoid a very technical UML/information model, we will use a logical description of the constraints. This does not need to be processed as is, but can be converted into the relevant executable form. This should be the primary motivation for not including SPARQL. The secondary motivation is that we want engines that are optimised to execute well-known constraint patterns. So our primary purpose in testing the SHACL validation engines is to test our SHACL: to check that the rules we apply are not based on a misunderstanding or biased towards a particular implementation. I agree with the priorities and the arguments for picking them. If we were to add anything, I would consider pySHACL. The reason is that a lot of TSOs are starting to use Python for Power Engineers. In addition, Nick Car is a core developer. They have also boasted that they have the most complete coverage of the SHACL rules.
I have some suggestions from Erik for benchmarks of SHACL/SPARQL validators.
https://github.com/oxigraph/oxigraph/blob/main/bench/README.md
https://github.com/ad-freiburg/qlever/wiki/QLever-performance-evaluation-and-comparison-to-other-SPARQL-engines
Oxigraph is now optimised for memory usage (it no longer uses the RocksDB engine when running in memory), which on Erik's machine makes it 4 times faster than earlier versions (since this measurement includes unzipping the file, real performance will be even better).
@HarisVranaj but do Oxigraph and QLever have SHACL implementations? Please post links so I can include them in https://github.com/VladimirAlexiev/awesome-semantic-shapes#shacl-validators and from there in https://github.com/w3c-cg/awesome-semantic-shapes
Note to self: https://mail.google.com/mail/u/0/#sent/QgrcJHsTgsbXhdCJwNqzTbwQHVhdRXDHtBB asked Treehouse for access to maplib SHACL.
@Sveino points out that rdf4j 5.0.0 and 5.0.3 have some SHACL improvements:
More are planned to be completed by the time 5.0.3 is released.
GraphDB will upgrade to rdf4j 5.0 at the end of the year.
When I tried pySHACL back in January and tried to package ModShape, I had trouble: I was having performance issues. I was in touch with Nick at that time; there might be solutions, but I didn't have time to clean that up.
@HarisVranaj Do QLever and Oxigraph support SHACL? Please post links to documentation.
I'm also working on supporting the last of the SHACL path expressions, and this should be included in RDF4J 5.1.0 or 5.2.0: https://github.com/eclipse-rdf4j/rdf4j/pull/5131
I can also advertise that the RDF4J SHACL implementation supports incremental validation. If you have a large database and want to make a small change to your data, then the RDF4J SHACL engine will analyse your changes and only validate the affected target nodes.
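As a rough illustration of the idea only (this is not RDF4J's algorithm or API), here is a minimal Python sketch of incremental validation for a single minCount-style rule. The triple representation, the `ex:` names and all function names are hypothetical. It assumes the constraint sits on a direct property of the target node, so the affected nodes are simply the subjects of the changed triples; rules that span multiple objects would need to trace the change back to every target node that can reach it.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # illustrative prefixed name, not a real IRI


def violations(graph, focus_nodes, target_class, path, min_count=1):
    """Return members of focus_nodes typed target_class with too few values for path."""
    counts = defaultdict(int)
    typed = set()
    for s, p, o in graph:
        if s not in focus_nodes:
            continue
        if p == path:
            counts[s] += 1
        elif p == RDF_TYPE and o == target_class:
            typed.add(s)
    return {node for node in typed if counts[node] < min_count}


def validate_full(graph, target_class, path):
    """Check every subject in the graph."""
    return violations(graph, {s for s, _, _ in graph}, target_class, path)


def validate_incremental(graph, added, removed, target_class, path):
    """Apply a change set in place, then recheck only the subjects it touches."""
    graph -= removed
    graph |= added
    affected = {s for s, _, _ in added | removed}
    return violations(graph, affected, target_class, path)


graph = {
    ("ex:line1", RDF_TYPE, "ex:Line"),
    ("ex:line1", "ex:name", "L1"),
    ("ex:line2", RDF_TYPE, "ex:Line"),
    ("ex:line2", "ex:name", "L2"),
}
print(validate_full(graph, "ex:Line", "ex:name"))  # set()
removed = {("ex:line2", "ex:name", "L2")}
print(validate_incremental(graph, set(), removed, "ex:Line", "ex:name"))  # {'ex:line2'}
```

The incremental call reports only violations among the affected nodes, so a real engine would merge that result with the validation state it already holds for the rest of the graph.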
@hmottestad Very good. Incremental or difference validation is extremely relevant, since we have a lot of SHACL rules that go across multiple objects. The full graph is getting very big, and the changes are very limited. We have included the possibility to exchange differences since 2005 using CIMXML/RDFXML. We are now looking into how we can use JSON-LD to exchange this. See #53
RDF4J 5 has support for JSON-LD 1.1 with a customised version of Titanium JSON-LD that is considerably faster than the stock implementation that Jena is using.
I saw you were talking about DCAT; is your project related to Datakatalogen by any chance?
I like fast code :-) The use of DCAT has two purposes. One is to provide the header information on the dataset/named graph. We expect the same information to be linked to a Catalog so that the dataset/named graph can be found. So the second purpose is to support a data catalog (Datakatalog).
Reply from Erik: "They do not support it out of the box, only SPARQL. For them, SHACL needs to be translated into SPARQL. This one does support it: https://github.com/DataTreehouse/maplib (they say Python, but it's actually written in Rust with a Python API, and it can be used as the basis for a full app in Rust)." @Sveino can you give him access?
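To make "SHACL needs to be translated into SPARQL" concrete, here is a minimal Python sketch that turns one simple constraint (a class target with sh:minCount on a single property) into a SPARQL query returning the violating focus nodes. The function name and the example IRIs are hypothetical, and this is not how maplib or any other engine does it. It also ignores rdfs:subClassOf, which SHACL class targets take into account.

```python
def min_count_to_sparql(target_class: str, path: str, min_count: int) -> str:
    """Build a SPARQL query that returns focus nodes violating a minCount rule."""
    return (
        "SELECT ?focus (COUNT(?value) AS ?n)\n"
        "WHERE {\n"
        f"  ?focus a <{target_class}> .\n"
        # OPTIONAL keeps focus nodes that have no value at all (count 0).
        f"  OPTIONAL {{ ?focus <{path}> ?value }}\n"
        "}\n"
        "GROUP BY ?focus\n"
        f"HAVING (COUNT(?value) < {min_count})"
    )


print(min_count_to_sparql("http://example.org/Line", "http://example.org/name", 1))
```

The resulting string could be sent to any SPARQL endpoint; each row in the answer is one focus node that fails the rule.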
@HarisVranaj I am not able to give access to the DataTreehouse GitHub, if that is what you wanted.
No, no: to this repository.
The repo is public, so Erik can post and comment
Requirements:
SPARQLConstraint
and, or, not
sh:dataGraph, so we can validate various stored models
Let's define which validation engines to try. Here's a proposal:
How about:
After we agree on the list, we need to research and list the limitations of every implementation. This may eliminate some candidates.
@HarisVranaj please attach the presentation you showed 2 days ago (I hope it's not confidential). @griddigit-ci and @Sveino please comment on the proposal above, and I'll correct the list.