D-Star-AI / dsRAG

High-performance retrieval engine for unstructured data
MIT License

feat: weaviate vector db client #7

Status: Closed (closed by joegoldin 6 months ago)

joegoldin commented 6 months ago

Started working on a Weaviate VectorDB class. I believe it should have all basic functionality but I've only tested it against Embedded Weaviate so far.

resolves https://github.com/SuperpoweredAI/spRAG/issues/3

joegoldin commented 6 months ago

(force push was just a commit message rename)

zmccormick7 commented 6 months ago

This is looking good. Thanks for doing this!

One minor change is needed: add the chunk_index and chunk_header keys to the metadata dictionary, since these are used downstream in RSE. Alternatively, you could upload and return the metadata dictionary in its entirety, as is done in BasicVectorDB and as you do in the Cassandra PR.
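
To illustrate the second option, here is a minimal sketch of passing the whole metadata dictionary through while checking for the two keys named above. The helper name `build_properties` and the `REQUIRED_KEYS` constant are hypothetical and are not taken from the PR or from dsRAG; only the key names `chunk_index` and `chunk_header` come from this thread.

```python
from typing import Any, Dict

# Keys this thread says are needed downstream; the constant name is hypothetical.
REQUIRED_KEYS = ("chunk_index", "chunk_header")


def build_properties(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the full metadata dict to store alongside the vector.

    Raises KeyError if a key needed downstream is absent, so the problem
    surfaces at upload time rather than at query time.
    """
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise KeyError(f"metadata is missing required keys: {missing}")
    # Copy so later changes to the caller's dict do not alter what was stored.
    return dict(metadata)
```

Storing the dictionary whole means any key added to the metadata later is carried along without touching the client code.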

joegoldin commented 6 months ago

> One minor change that will be needed is to add the chunk_index and chunk_header keys to the metadata dictionary

Oops, totally missed that here -- should be good now, I think!

zmccormick7 commented 6 months ago

One more thing: it looks like you're generating UUIDs off of just the doc_id. Since a single document will generally have multiple chunks, you'll need to also use the chunk_index (which you can assume will be contained in the metadata dictionary) in combination with the doc_id to uniquely identify a chunk. This will also affect how the remove_document method will need to work. You'll need to identify all items associated with a given doc_id and delete each of those.
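
A minimal sketch of the idea, using only the Python standard library and an in-memory dictionary in place of a real Weaviate collection. The names `chunk_uuid`, `CHUNK_NAMESPACE`, and `InMemoryStore` are hypothetical and do not come from the PR; a real client would run a filtered delete on `doc_id` instead of scanning a dict.

```python
import uuid
from typing import Any, Dict

# Hypothetical fixed namespace; any UUID works as long as it never changes.
CHUNK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "chunks.example.invalid")


def chunk_uuid(doc_id: str, chunk_index: int) -> str:
    """Derive a deterministic ID from doc_id and chunk_index together."""
    return str(uuid.uuid5(CHUNK_NAMESPACE, f"{doc_id}:{chunk_index}"))


class InMemoryStore:
    """Stand-in for a vector DB collection, keyed by chunk UUID."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, Any]] = {}

    def add_chunk(self, doc_id: str, metadata: Dict[str, Any]) -> str:
        key = chunk_uuid(doc_id, metadata["chunk_index"])
        # Keep doc_id on the stored object so it can be filtered on later.
        self.objects[key] = {**metadata, "doc_id": doc_id}
        return key

    def remove_document(self, doc_id: str) -> int:
        """Delete every chunk belonging to doc_id; return how many were removed."""
        doomed = [k for k, v in self.objects.items() if v["doc_id"] == doc_id]
        for key in doomed:
            del self.objects[key]
        return len(doomed)
```

Because the ID is deterministic, re-uploading the same chunk overwrites the existing entry instead of creating a duplicate.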

joegoldin commented 6 months ago

> You'll need to identify all items associated with a given doc_id and delete each of those.

🤦‍♂️ Yeeep, totally missed that too. I've updated the logic and tests, hopefully it's completely functional now?

zmccormick7 commented 6 months ago

I had to add the to_dict method to enable proper saving and loading. I also added an integration test and fixed a couple of small bugs. Should be good to go now.
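
For context, a rough sketch of what a to_dict-style round trip can look like. Everything here other than the method name `to_dict` is an assumption for illustration: the class `SketchVectorDB`, its constructor parameters, the `subclass_name` key, and the `from_dict` counterpart are hypothetical and may not match the code merged in this PR.

```python
import json
from typing import Any, Dict


class SketchVectorDB:
    """Hypothetical stand-in for a vector DB client; not the dsRAG class."""

    def __init__(self, kb_id: str, host: str = "localhost", port: int = 8080) -> None:
        self.kb_id = kb_id
        self.host = host
        self.port = port

    def to_dict(self) -> Dict[str, Any]:
        # Only plain, JSON-serializable configuration is saved, never a live connection.
        return {
            "subclass_name": type(self).__name__,
            "kb_id": self.kb_id,
            "host": self.host,
            "port": self.port,
        }

    @classmethod
    def from_dict(cls, config: Dict[str, Any]) -> "SketchVectorDB":
        params = dict(config)  # copy so the caller's dict is left untouched
        params.pop("subclass_name", None)
        return cls(**params)


# Round trip through JSON, as a saved configuration file would.
saved = json.dumps(SketchVectorDB("my_kb").to_dict())
restored = SketchVectorDB.from_dict(json.loads(saved))
```

The point of the pattern is that the saved configuration holds enough to rebuild the client, so a knowledge base can be reloaded without re-specifying its vector DB settings.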