DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License
574 stars, 65 forks

allow configurable rdflib.Store plugins, e.g., Oxrdflib #240

Closed: ceteri closed this issue 2 years ago

ceteri commented 2 years ago

Adding support for a store param to the main KnowledgeGraph class, to allow configurable rdflib.Store plugins, e.g., the most excellent Oxrdflib - kudos @Tpt, @bollwyvl for that project

Based on performance analysis and recommendation by @paoespinozarias

bollwyvl commented 2 years ago

@ceteri vicious nerd snipe! well played.

note that i'm just carrying bits between bleachers on oxigraph, and not really contributing, per se.

this looks like an impressive stack of stuff: we have been tinkering with some of the interactive computing side of this, with some stuff in the stable and a few more things in the pipeline.

maybe there's a place for a desktop app riding atop a self-contained environment running all this stuff.... perhaps that would be something other people would be interested in.

ceteri commented 2 years ago

Thanks @bollwyvl !

I hadn't seen jupyrdf, that's great! Tangentially, we'd started toward some related work with jupyterlab-metadata-service – as part of the Rich Context funding effort.

We have a use case in industry where simply swapping in Oxrdflib in place of the default rdflib.Store plugin changed some of our critical path queries from ~1100 sec to ~34 sec (roughly a 32x speedup). We wanted to help others learn about Oxrdflib and will add an example to our tutorial notebooks.

Great to see about GTCOARLab too! Got a few RL use cases waiting in the wings, while we knock out some large distributed graph infra :)

bollwyvl commented 2 years ago

Similar to oxigraph, there is also reasonable for OWL 2 RL reasoning, which, for certain workloads, claims about two orders of magnitude improvement over the pure-python incumbent.

We just got this punched into conda-forge, so I haven't had the time to mess with it much for real. As I've (naively) packaged it, it ships its own static copy of oxigraph, so it'll be pretty heavy to have both of them installed. :blush: but even if i could rust/cffi better, it would probably still all have to get marshalled into/out of rust/python a few times.

presumably there is some future (some love-child-of-arrow-and-graphblas) where there would be a low-overhead memory protocol for interop on these things... but there's a lot of low-hanging fruit in the meantime.

ceteri commented 2 years ago

Thank you kindly @bollwyvl -

We have a use case in a large industry application that's currently working with WebProtégé, so there's lots of OWL work. We've noticed cases where the current OWL-RL inference could be less aggressive, i.e., by changing the RL rules and axioms. I haven't done that yet. Frankly, the integration with OWL-RL involved lots of reverse engineering and guessing and luck, and I'm reluctant to dig too deep into it, since there aren't a lot of docs.

Does reasonable provide for customized RL rules and axioms more readily?
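To make "less aggressive" inference concrete, here is a toy forward-chaining sketch in plain Python where the rule set is just a list that can be trimmed. It is not the OWL-RL or reasonable API; the function names, the string-based triples, and the two rules (modelled loosely on the OWL 2 RL rules scm-sco and cax-sco) are assumptions for illustration only.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def subclass_transitivity(triples: Set[Triple]) -> Iterable[Triple]:
    """scm-sco style: (a sub b), (b sub c) => (a sub c)."""
    subs = [(s, o) for s, p, o in triples if p == SUBCLASS]
    for a, b in subs:
        for b2, c in subs:
            if b == b2:
                yield (a, SUBCLASS, c)


def type_inheritance(triples: Set[Triple]) -> Iterable[Triple]:
    """cax-sco style: (x type a), (a sub b) => (x type b)."""
    subs = [(s, o) for s, p, o in triples if p == SUBCLASS]
    for x, p, a in list(triples):
        if p == TYPE:
            for a2, b in subs:
                if a == a2:
                    yield (x, TYPE, b)


def closure(triples: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Apply the given rules repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = {t for rule in rules for t in rule(inferred)} - inferred
        if not new:
            return inferred
        inferred |= new


facts = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:tom", TYPE, "ex:Cat"),
}

# Full rule set versus a trimmed one that skips schema-level inference.
full = closure(facts, [subclass_transitivity, type_inheritance])
lean = closure(facts, [type_inheritance])
print(len(full), len(lean))
```

Dropping a rule from the list shrinks the materialized graph while keeping the instance-level conclusions here, which is the kind of trade-off a customizable rule set would allow.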

Yes, we're watching the Arrow and GraphBLAS spaces for something something low-overhead memory protocol, something something horizontal scale-out :) One of the efforts here on kglab is to have a NumPy-backed rdflib.Store which could scale out on Dask Distributed, Legate, etc. FWIW, we have another project (originally based on this project, not cleared for release yet, but we're working on that) which uses Ray actors to manage a large partitioned graph, and runs both the W3C stack and openCypher on the same data. I can say that we've reached 1B-node graph scale, in memory, using Ray's placement groups (3.5 TB memory space in a cluster).