DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License

directions for RDBMS support in general #108

Closed: ceteri closed this issue 2 years ago

ceteri commented 3 years ago

Following up on some work with the Trino authors yesterday: there is an upcoming need for better metadata modeling based on inference techniques, semantic technologies, etc., for example in the Iceberg connectors.

Just found this about Morph-RDB:

It's probably worth a spike on how to integrate with Trino and Morph-RDB. Will check with @dachafra, et al.

It could be good to discuss this with @dvsrepo and Asun, too. Could it become an integration point for Recognai?

dachafra commented 3 years ago

@ceteri we are currently developing a new Python-based engine that could be a better fit than Morph-RDB (which is Java-based), and it can also parse RML and R2RML mappings. It's currently under review, so it's not public yet, but it will be soon. @ArenasGuerreroJulian and I would be really happy to help with the integration.

arenas-guerrero-julian commented 3 years ago

Hi @ceteri, we have released Morph-KGC, in case it is useful.

ceteri commented 2 years ago

Hi @ArenasGuerreroJulian @dachafra thank you, this is great to see! We're beginning the integration of morph-kgc with kglab now, and have a potentially large industry use case at BASF. I'm working with @paoespinozarias @neobernad @jelisf @jmueller5 on the integration. There may be ways we can collaborate on parallelization, e.g., with Ray, Dask, etc.?

Mec-iS commented 2 years ago

Happy to help when implementation guidelines are available.

arenas-guerrero-julian commented 2 years ago

Hi @ceteri, awesome! morph-kgc already parallelizes using the multiprocessing library. I am exploring further parallelization with Dask, and am happy to collaborate.

ceteri commented 2 years ago

@ArenasGuerreroJulian that's excellent. We're using Ray in the production environment to scale out graphs, and in Ray there's a drop-in replacement for the standard multiprocessing library: https://github.com/DerwenAI/ray_tutorial/blob/main/ex_04_mult_pool.ipynb
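As a rough illustration of the drop-in idea, here is a minimal sketch that passes the pool class in as a parameter. It runs on the standard library alone; the assumption (not verified against morph-kgc itself) is that with Ray installed, importing `Pool` from `ray.util.multiprocessing` instead would leave the rest unchanged. The names `count_fields` and `map_rows` are hypothetical and exist only for this example.

```python
from multiprocessing import Pool  # standard-library process pool

# Assumption: with Ray installed, the import above could be swapped for
#   from ray.util.multiprocessing import Pool
# and the rest of this sketch should work without changes.


def count_fields(line: str) -> int:
    """Hypothetical per-row work: count the comma-separated fields."""
    return len(line.split(","))


def map_rows(pool_cls, rows, processes=2):
    """Apply count_fields to every row using the given pool class."""
    with pool_cls(processes=processes) as pool:
        return pool.map(count_fields, rows)


if __name__ == "__main__":
    rows = ["a,b,c", "d,e", "f"]
    print(map_rows(Pool, rows))  # [3, 2, 1]
```

Taking the pool class as an argument keeps the calling code identical whichever backend supplies it, which is the property that makes a drop-in replacement practical.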

ceteri commented 2 years ago

@Mec-iS how about this approach:

For testing, we could use a CSV file as the input.

Does that seem like a good approach for integration?
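A minimal sketch of what a CSV-based test fixture might look like, assuming morph-kgc's documented INI-style configuration (a data source section with a `mappings` entry) and its `morph_kgc.materialize()` entry point. The file names, the `ex:` vocabulary, and the helpers `write_fixture` and `materialize` are hypothetical, made up for this example.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical RML mapping; CSV_PATH is a placeholder filled in below.
MAPPING = """\
@base <http://example.com/base/> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://example.com/> .

<#PersonMap>
    rml:logicalSource [
        rml:source "CSV_PATH" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [
        rr:template "http://example.com/person/{id}" ;
        rr:class ex:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rml:reference "name" ]
    ] .
"""


def write_fixture(workdir: Path) -> str:
    """Write a tiny CSV and its mapping; return a morph-kgc style config string."""
    csv_path = workdir / "people.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerow(["1", "Ada"])
        writer.writerow(["2", "Grace"])
    mapping_path = workdir / "mapping.ttl"
    mapping_path.write_text(MAPPING.replace("CSV_PATH", csv_path.as_posix()))
    return f"[DataSource1]\nmappings: {mapping_path.as_posix()}\n"


def materialize(config: str):
    """Build the RDF graph; needs the third-party morph-kgc package."""
    import morph_kgc  # imported lazily so the fixture works without it
    return morph_kgc.materialize(config)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = write_fixture(Path(tmp))
        print(config)
        try:
            graph = materialize(config)
            print(len(graph), "triples")
        except ImportError:
            print("morph-kgc is not installed; skipping materialization")
```

Keeping the fixture self-contained in a temporary directory means the test needs no database connection, which is the main appeal of starting from CSV input.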

ceteri commented 2 years ago

Many thanks @Mec-iS @ArenasGuerreroJulian @dachafra!