Closed by daniel-dona 1 year ago
Hi @daniel-dona, I've done some testing and, as expected, I found that the Python in-memory graph in RDFLib is faster than Oxigraph.
Built-in PySHACL benchmark:
RDFLib Memory:
With no inferencing: 0.0028500853500008816 seconds
With rdfs inferencing: 0.01648246115000802 seconds
With owl-rl inferencing: 0.04122918072003813 seconds
With both inferencing: 0.22119805684000313 seconds
Oxigraph:
With no inferencing: 0.005864868260032381 seconds
With rdfs inferencing: 0.01864628946997982 seconds
With owl-rl inferencing: 0.04554426107999461 seconds
With both inferencing: 0.23374986606002493 seconds
Runtime of W3C SHACL Test Suite (SHT):
RDFLib Memory: 2.945 seconds
Oxigraph: 3.344 seconds
Runtime of Datashapes Core Test Suite (DASH):
RDFLib Memory: 1.894 seconds
Oxigraph: 2.287 seconds
Tested using Python 3.10.10, rdflib v6.3.2, oxrdflib v0.3.4, pyoxigraph v0.3.16
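For anyone who wants to repeat this kind of comparison, here is a rough sketch of a timing harness. It is not the script that produced the numbers above; `time_workload`, `compare_backends` and `demo_workload` are hypothetical names, and the demo uses a plain Python `set` as a stand-in so it runs without rdflib installed.

```python
import time
from statistics import mean


def time_workload(make_graph, workload, repeats=5):
    """Return the mean wall-clock seconds of workload(graph) over fresh graphs."""
    samples = []
    for _ in range(repeats):
        graph = make_graph()  # fresh graph each run so runs do not share state
        start = time.perf_counter()
        workload(graph)
        samples.append(time.perf_counter() - start)
    return mean(samples)


def compare_backends(factories, workload, repeats=5):
    """Time the same workload against each named graph factory."""
    return {
        name: time_workload(factory, workload, repeats)
        for name, factory in factories.items()
    }


def demo_workload(graph):
    # Stand-in workload: anything with an .add() method works here.
    for i in range(1000):
        graph.add((f"s{i}", "p", "o"))


results = compare_backends({"python-set": set}, demo_workload, repeats=3)
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} seconds")
```

With rdflib and oxrdflib installed, the factories would presumably be something like `lambda: rdflib.Graph()` and `lambda: rdflib.Graph(store="Oxigraph")` (each also parsing the data file), and the workload a call such as `pyshacl.validate(graph, shacl_graph=shapes, inference="rdfs")`.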
These results are consistent with those I have seen before.
Back in 2018 I was deep in the Rust ecosystem. I made a bunch of simple RDF/triplestore libraries in Rust and used a very early version of PyO3 to make Python bindings. I wrote lots of tests and benchmarks, but found that none of them could perform better as an RDFLib store than the Python Memory graph in RDFLib. The bottleneck is in transferring data objects between Rust and Python. RDF operations using RDFLib involve moving a lot of string objects back and forth between the application and the store. In my tests I found that the overhead of converting these objects through PyO3 to Rust objects, and subsequently converting the Rust results back to Python objects, was greater than any performance benefit gained from a faster store.
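A rough pure-Python analogy of that conversion cost, as a sketch only: it does not measure PyO3 itself. `DirectStore`, `BoundaryStore` and `round_trip` are hypothetical names, and the UTF-8 encode/decode stands in for whatever copying happens at a real language boundary.

```python
import time


class DirectStore:
    """Keeps references to the caller's Python strings; nothing is converted."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def triples(self):
        return iter(self._triples)


class BoundaryStore:
    """Stand-in for a store behind a language boundary.

    Every term is copied to bytes on the way in and back to str on the way out.
    """

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s.encode("utf-8"), p.encode("utf-8"), o.encode("utf-8")))

    def triples(self):
        for s, p, o in self._triples:
            yield s.decode("utf-8"), p.decode("utf-8"), o.decode("utf-8")


def round_trip(store, n=20000):
    """Insert n triples, read them all back, return (count, seconds)."""
    start = time.perf_counter()
    for i in range(n):
        store.add(f"http://example.org/s{i}", "http://example.org/p", f"value {i}")
    count = sum(1 for _ in store.triples())
    return count, time.perf_counter() - start


direct_count, direct_secs = round_trip(DirectStore())
boundary_count, boundary_secs = round_trip(BoundaryStore())
print(f"direct:   {direct_count} triples in {direct_secs:.4f} seconds")
print(f"boundary: {boundary_count} triples in {boundary_secs:.4f} seconds")
```

Both stores hold the same data; the only difference is the per-term copy on every call, which is the kind of cost described above.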
Eventually, in my testing back then, I wrote an extremely minimal in-memory store backend in Rust that mimics the way the Python Memory store works. I optimised it to avoid as many string copies as possible, and optimised the PyO3 bindings to reduce object translation and string copying as much as possible. Even then, I only managed to get it to perform on par with the Python Memory store in RDFLib, so I concluded it's simply not worth the effort.
Pyoxigraph is also slower because it uses RocksDB as its storage layer. RocksDB is a modern, high-performance key-value DB, but it is not faster than a bare Python dict in this application. So with oxrdflib you have four layers of indirection between the application and the store:
oxrdflib -> pyoxigraph -> PyO3 -> Oxigraph -> RocksDB
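To give a feel for what "a bare Python dict" means here, below is a simplified sketch of a dict-indexed triple store. It is not RDFLib's actual Memory store code; `TinyDictStore` is a hypothetical name, and the three nested indexes (subject, predicate and object first) are an assumption about the general approach.

```python
from collections import defaultdict


class TinyDictStore:
    """Minimal triple store built on nested dicts, with no layers in between."""

    def __init__(self):
        self._spo = defaultdict(lambda: defaultdict(set))  # subject -> predicate -> objects
        self._pos = defaultdict(lambda: defaultdict(set))  # predicate -> object -> subjects
        self._osp = defaultdict(lambda: defaultdict(set))  # object -> subject -> predicates

    def add(self, s, p, o):
        self._spo[s][p].add(o)
        self._pos[p][o].add(s)
        self._osp[o][s].add(p)

    def triples(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        # Pick the index whose first key is bound, then filter the rest.
        if s is not None:
            candidates = ((s, p2, o2)
                          for p2, objs in self._spo.get(s, {}).items()
                          for o2 in objs)
        elif p is not None:
            candidates = ((s2, p, o2)
                          for o2, subs in self._pos.get(p, {}).items()
                          for s2 in subs)
        elif o is not None:
            candidates = ((s2, p2, o)
                          for s2, preds in self._osp.get(o, {}).items()
                          for p2 in preds)
        else:
            candidates = ((s2, p2, o2)
                          for s2, po in self._spo.items()
                          for p2, objs in po.items()
                          for o2 in objs)
        for triple in candidates:
            if (p is None or triple[1] == p) and (o is None or triple[2] == o):
                yield triple


store = TinyDictStore()
store.add("a", "knows", "b")
store.add("a", "knows", "c")
store.add("b", "knows", "c")
print(sorted(store.triples(s="a")))
```

Every lookup here is one or two dict accesses on objects the application already holds, which is hard to beat once each call has to cross several layers first.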
As a possible speed improvement, it could be great to use Oxigraph as the backend for RDFLib. This could improve graph traversals, and also the SPARQL queries used for SHACL.
https://pyoxigraph.readthedocs.io/en/stable/
https://github.com/oxigraph/oxrdflib