HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License

Hosted Vector db and KG. Pinecone/Neo4j #114

Open wiltshirek opened 2 days ago

wiltshirek commented 2 days ago

Hi, could you share your thoughts or recommendations on how to integrate hosted solutions for the vector DB (adding support for, or replacing, Nano with Pinecone) and the graph (adding support for, or replacing, NetworkX with Neo4j)? We'd like to plan a production-level release. I'm happy to contribute, but I'm looking for some helpful tips before we get started: I don't want to duplicate something that's already in flight or already easily configurable. I didn't see any options for this in the docs, but the "Base" classes exist in the code base for adaptation and seamless integration.

rcontesti commented 2 days ago

Plus one to the feature. It would be great to use a DB of our choice.

LarFii commented 9 hours ago

All our storage implementations are in storage.py. To be honest, since we’re not very familiar with databases, we haven't yet set up configuration options for swapping databases. Any help you could provide would be greatly appreciated!
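One common way to add such configuration options is a small name-to-class registry, so a deployment picks a backend by a config string instead of editing code. This is a minimal sketch under that assumption: `STORAGE_REGISTRY`, `register_storage`, `make_storage`, `InMemoryKV` and the `"in_memory_kv"` name are all hypothetical and not part of LightRAG's storage.py.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a config string to a storage class.
STORAGE_REGISTRY: Dict[str, type] = {}


def register_storage(name: str) -> Callable[[type], type]:
    """Class decorator that records a storage backend under `name`."""
    def decorator(cls: type) -> type:
        if name in STORAGE_REGISTRY:
            raise ValueError(f"storage backend {name!r} already registered")
        STORAGE_REGISTRY[name] = cls
        return cls
    return decorator


@register_storage("in_memory_kv")
class InMemoryKV:
    """Illustrative default backend backed by a plain dict."""

    def __init__(self, **options):
        self.options = options
        self._data = {}

    def upsert(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def make_storage(config: dict):
    """Instantiate the backend named in config['backend']; remaining keys become constructor options."""
    options = dict(config)
    name = options.pop("backend")
    try:
        cls = STORAGE_REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"unknown storage backend {name!r}; known: {sorted(STORAGE_REGISTRY)}"
        ) from None
    return cls(**options)


store = make_storage({"backend": "in_memory_kv", "namespace": "docs"})
store.upsert("doc-1", {"text": "hello"})
print(store.get("doc-1"))  # {'text': 'hello'}
```

A hosted backend (for example a Neo4j- or Pinecone-backed class) would register itself under its own name, while configs that keep naming the default backend behave as before, which fits the goal of non-breaking changes.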

wiltshirek commented 9 hours ago

Cool. I'm on it. The biggest difference so far is with the node IDs. We use the entity names as the node IDs for NetworkX, but you can't set the node IDs in Neo4j, and probably not in quite a few other graph databases either. I'll keep the concerns separated and account for that internally by storing the name in a node_name property, which is arguably a slightly cleaner approach. Thanks.
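The node-ID mismatch can be handled by keeping the entity name in a property and matching on it with Cypher's `MERGE`, which creates a node only if none with that property value exists. The sketch below only builds parameterized query strings; the `Entity` label, the `RELATED` relationship type and the helper names are assumptions for illustration, and actually running the queries would need a Neo4j driver session (for example `session.run(query, params)` with the official `neo4j` Python driver), which is not shown.

```python
# Hypothetical helpers: entities are keyed by a `node_name` property,
# because Neo4j assigns its own internal node IDs.

# Uniqueness constraint (Neo4j 5.x syntax) so concurrent MERGEs cannot create duplicates.
CONSTRAINT_QUERY = (
    "CREATE CONSTRAINT entity_node_name IF NOT EXISTS "
    "FOR (n:Entity) REQUIRE n.node_name IS UNIQUE"
)


def upsert_node_query(node_name: str, properties: dict) -> tuple[str, dict]:
    """Build a Cypher upsert keyed on node_name; `SET n += map` merges the given properties."""
    query = (
        "MERGE (n:Entity {node_name: $node_name}) "
        "SET n += $properties "
        "RETURN n"
    )
    return query, {"node_name": node_name, "properties": properties}


def upsert_edge_query(source: str, target: str, properties: dict) -> tuple[str, dict]:
    """Build a Cypher upsert for a relationship whose endpoints are looked up by node_name."""
    query = (
        "MATCH (s:Entity {node_name: $source}) "
        "MATCH (t:Entity {node_name: $target}) "
        "MERGE (s)-[r:RELATED]->(t) "
        "SET r += $properties "
        "RETURN r"
    )
    return query, {"source": source, "target": target, "properties": properties}


query, params = upsert_node_query("Alan Turing", {"entity_type": "person"})
print(query)
print(params)
```

Passing values as parameters rather than formatting them into the query string avoids Cypher injection from entity names extracted out of documents.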

spo0nman commented 5 hours ago

I'm interested in this as well. Please let me know if you need a hand.

wiltshirek commented 46 minutes ago

Thanks. The KG support is a fair amount of work, since adding further KG platforms should not disrupt the code base. Plus, it uses Cypher queries as opposed to the convenient SDK calls available for NetworkX. I'm making good progress and should be able to get this out in a week or two, depending on my schedule. The vector integration would be a good part for you to pick up, if that makes sense. Maybe start with Pinecone, or whatever you are comfortable with or need for a requirement, and leave a framework that is easy to extend with future hosted vector DB providers. I'm fairly sure other folks will jump in and pick up other vector integrations from there if it's close to plug and play. Of course, I'm keeping all changes non-breaking to ease adoption and following existing code patterns. Excited to connect on this, @spo0nman, if you are up for it. Would love to hear your thoughts.