This is a great project, looking forward to trying it out more and experimenting with some workflows!
I'm looking at the portions of the code that use networkx for graph retrieval and/or visualization, and I was thinking that the existing use of Pandas DataFrames makes this workflow very amenable to using Kùzu, an embedded graph database that's very similar to DuckDB and LanceDB in philosophy (easy to deploy and fully open source). Using a graph database with persistence and durability guarantees, rather than an in-memory graph library like NetworkX, is preferable. And the fact that Kùzu is embedded and open source makes it that much simpler and more user-friendly, in the same way that LanceDB and DuckDB are.
It's trivial to read data into a Kùzu graph via Pandas, as described here. The benefit of using Kùzu over NetworkX, imo, is that Kùzu can scale very well to larger-than-memory data, and it's the perfect complement to LanceDB for users who are familiar with that database from a vector search perspective.
It's also trivial to convert a Kùzu graph into a networkx Graph or DiGraph object, which can be used for all downstream workflows that require networkx objects.
The Microsoft GraphRAG repo also uses LanceDB as its default vector store, and the reason Kùzu isn't used there (they also leverage NetworkX for their graph computations) is that Kùzu wasn't well known enough at the time they wrote their code. I think that's changing, as Kùzu is becoming more and more popular (disclaimer: I work at Kùzu).
I just wanted to create this issue so that this could be something that's on the roadmap, and I'd be happy to try out the framework more and offer my input as this project grows. Cheers!