KonradHoeffner / hdt

Library for the Header Dictionary Triples (HDT) compression file format for RDF data.
https://crates.io/crates/hdt
MIT License

Document use within a SPARQL pipeline #34

Open donpellegrino opened 1 year ago

donpellegrino commented 1 year ago

It would be useful to document a few examples where the Rust hdt library is used within a full pipeline, starting from an HDT file (as generated by hdt-cpp) to SPARQL query results.

For example, the Python rdflib-hdt library wraps hdt-cpp, and this function is the point where the triple pattern query over the HDT file is handed to the rdflib SPARQL query processor: https://github.com/RDFLib/rdflib-hdt/blob/master/rdflib_hdt/hdt_document.py#L114

Documenting how Rust hdt might provide triple pattern query results to a few separate SPARQL query engines would show users how the Rust hdt library can fit into a broader pipeline from data to SPARQL query results.
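To make the request concrete, here is a minimal sketch of the kind of narrow interface a SPARQL engine would need from a storage backend. Every name in it (`TriplePatternSource`, `MemoryStore`, `triples_matching`, `sample_store`) is hypothetical and not taken from the hdt crate; the in-memory vector only stands in for an HDT file.

```rust
// Hypothetical sketch: none of these names come from the hdt crate.

/// A triple of plain strings (subject, predicate, object).
type Triple = (String, String, String);

/// The narrow interface a SPARQL engine needs from a storage backend.
/// `None` in a position means "match anything", like a variable in SPARQL.
trait TriplePatternSource {
    fn triples_matching(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple>;
}

/// In-memory stand-in for an HDT-backed store (assumption for illustration).
struct MemoryStore {
    triples: Vec<Triple>,
}

impl TriplePatternSource for MemoryStore {
    fn triples_matching(&self, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple> {
        self.triples
            .iter()
            // Keep a triple when every bound position equals the requested term.
            .filter(|(ts, tp, to)| {
                s.map_or(true, |x| x == ts.as_str())
                    && p.map_or(true, |x| x == tp.as_str())
                    && o.map_or(true, |x| x == to.as_str())
            })
            .cloned()
            .collect()
    }
}

/// Small example dataset used to exercise the interface.
fn sample_store() -> MemoryStore {
    let t = |s: &str, p: &str, o: &str| (s.to_string(), p.to_string(), o.to_string());
    MemoryStore {
        triples: vec![
            t("ex:alice", "ex:knows", "ex:bob"),
            t("ex:bob", "ex:knows", "ex:carol"),
            t("ex:alice", "ex:name", "\"Alice\""),
        ],
    }
}
```

A query engine would call `triples_matching` once per triple pattern and combine the results itself, so the backend never needs to know about SPARQL.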

KonradHoeffner commented 8 months ago

My current use case for the Rust HDT library is the RickView RDF browser, which currently requires nothing beyond the low-level HDT triple pattern query features, so I don't have a high-level SPARQL pipeline available that could be documented. I actually quite like using triple patterns without full SPARQL because, among other reasons, it doesn't risk high time complexity and overloading the server.

On the other hand, there already is the Rust database Oxigraph, which implements SPARQL, so I'm not sure in which situation it would make sense to implement a full SPARQL pipeline on top of hdt-rs that Oxigraph doesn't cover (though I haven't used Oxigraph yet).

However, if you have a use case, I can investigate and document the best way to generate SPARQL results using hdt-rs. One candidate would be to use the HDT Sophia adapter and find out whether it's possible to answer SPARQL queries with a Sophia graph.

One such use case could be a very lightweight immutable SPARQL endpoint, which I could certainly use for several projects where I currently use Virtuoso, which is much too heavyweight to host a 10 MB knowledge base. However, https://github.com/dice-group/tentris (written in C++) seems to be a very promising and lightweight development in this direction. It would certainly be interesting to create an alternative SPARQL endpoint based on hdt-rs, but due to my current lack of time I'm not optimistic that this would be more than a prototype.

donpellegrino commented 8 months ago

@KonradHoeffner - Thanks for the background. That makes perfect sense.

I am attempting to integrate Oxigraph and this HDT crate. I have a development branch at https://github.com/DeciSym/oxigraph/tree/1-enable-sparql-query-of-hdt-storage that passes W3C SPARQL 1.0 Basic test cases. Based on that work, the approach of using Oxigraph as a SPARQL front-end to the low-level HDT triple pattern query feature appears to be feasible. The development branch needs to be cleaned up, and I would like to run it through more of the W3C SPARQL test cases before submitting it as a pull request to Oxigraph.
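As a rough illustration of the general idea (not the code in the branch, and not how Oxigraph evaluates queries internally), a front-end can answer a basic graph pattern using nothing but triple pattern lookups. The sketch below uses a naive nested-loop join; `lookup`, `eval_bgp`, `Term`, `var`, and `iri` are hypothetical names, and the slice of tuples stands in for the HDT-backed store.

```rust
use std::collections::HashMap;

// Hypothetical sketch: a naive nested-loop join over triple pattern lookups.
type Triple = (&'static str, &'static str, &'static str);
type Bindings = HashMap<String, String>;

/// A pattern term is either a constant or a named variable.
enum Term {
    Const(String),
    Var(String),
}

fn var(name: &str) -> Term { Term::Var(name.to_string()) }
fn iri(value: &str) -> Term { Term::Const(value.to_string()) }

/// Stand-in for the backend's triple pattern query (None = wildcard).
fn lookup(data: &[Triple], s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple> {
    data.iter()
        .filter(|t| {
            s.map_or(true, |x| x == t.0) && p.map_or(true, |x| x == t.1) && o.map_or(true, |x| x == t.2)
        })
        .copied()
        .collect()
}

/// Turn a term into a lookup argument: constants and already-bound variables are fixed.
fn resolve<'a>(term: &'a Term, b: &'a Bindings) -> Option<&'a str> {
    match term {
        Term::Const(c) => Some(c.as_str()),
        Term::Var(v) => b.get(v).map(|s| s.as_str()),
    }
}

/// Record `value` for a variable; return false if it conflicts with an earlier binding.
fn bind(term: &Term, value: &str, b: &mut Bindings) -> bool {
    match term {
        Term::Const(c) => c == value,
        Term::Var(v) => match b.get(v) {
            Some(existing) => existing == value,
            None => {
                b.insert(v.clone(), value.to_string());
                true
            }
        },
    }
}

/// Evaluate the patterns left to right, extending each partial solution
/// with one backend lookup per pattern.
fn eval_bgp(data: &[Triple], patterns: &[[Term; 3]]) -> Vec<Bindings> {
    let mut solutions = vec![Bindings::new()];
    for [s, p, o] in patterns {
        let mut next = Vec::new();
        for b in &solutions {
            for (ts, tp, to) in lookup(data, resolve(s, b), resolve(p, b), resolve(o, b)) {
                let mut extended = b.clone();
                if bind(s, ts, &mut extended) && bind(p, tp, &mut extended) && bind(o, to, &mut extended) {
                    next.push(extended);
                }
            }
        }
        solutions = next;
    }
    solutions
}
```

For example, the patterns `[iri("ex:alice"), iri("ex:knows"), var("friend")]` and `[var("friend"), iri("ex:knows"), var("fof")]` together correspond to a two-pattern `WHERE` clause. A real engine would also reorder patterns and stream results rather than collecting vectors.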