joegoldin closed 6 months ago
(force push was just a commit message rename)
This is looking good. Thanks for doing this!
One minor change that will be needed is to add the chunk_index and chunk_header keys to the metadata dictionary, as these are used downstream in RSE. You could also just upload and return the metadata dictionary in its entirety, as BasicVectorDB does and as you do in the Cassandra PR.
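A minimal sketch of what "upload and return the metadata dictionary in its entirety" could look like, under stated assumptions: InMemoryVectorDB, REQUIRED_KEYS, add_vectors, search, and the result shape ({"similarity", "metadata"}) are all hypothetical names for illustration, not the spRAG or Weaviate API. The point is only that the whole dictionary, including chunk_index and chunk_header, is stored and handed back untouched.

```python
import math
from typing import Any, Dict, List

# Hypothetical: keys that, per the review, downstream RSE reads from each chunk's metadata.
REQUIRED_KEYS = ("doc_id", "chunk_index", "chunk_header")


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorDB:
    """Hypothetical stand-in used only to show the data flow; not spRAG's or Weaviate's API."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.metadata: List[Dict[str, Any]] = []

    def add_vectors(self, vectors: List[List[float]], metadata: List[Dict[str, Any]]) -> None:
        if len(vectors) != len(metadata):
            raise ValueError("vectors and metadata must have the same length")
        # Validate everything first so a bad item doesn't leave a half-written batch.
        for meta in metadata:
            missing = [key for key in REQUIRED_KEYS if key not in meta]
            if missing:
                raise ValueError(f"metadata is missing required keys: {missing}")
        for vector, meta in zip(vectors, metadata):
            self.vectors.append(vector)
            # Store a copy of the whole dictionary instead of picking out fields,
            # so any key a downstream consumer needs is still there at query time.
            self.metadata.append(dict(meta))

    def search(self, query: List[float], top_k: int = 3) -> List[Dict[str, Any]]:
        """Return the top_k closest chunks, each with its full metadata dictionary."""
        results = [
            {"similarity": cosine_similarity(query, vec), "metadata": meta}
            for vec, meta in zip(self.vectors, self.metadata)
        ]
        results.sort(key=lambda r: r["similarity"], reverse=True)
        return results[:top_k]


db = InMemoryVectorDB()
db.add_vectors(
    [[1.0, 0.0], [0.0, 1.0]],
    [
        {"doc_id": "doc1", "chunk_index": 0, "chunk_header": "Intro", "chunk_text": "..."},
        {"doc_id": "doc1", "chunk_index": 1, "chunk_header": "Intro", "chunk_text": "..."},
    ],
)
top = db.search([1.0, 0.1], top_k=1)
print(top[0]["metadata"]["chunk_index"], top[0]["metadata"]["chunk_header"])  # prints: 0 Intro
```

Returning the full dictionary means the vector DB class doesn't need to change if downstream code starts relying on another key.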
> One minor change that will be needed is to add the chunk_index and chunk_header keys to the metadata dictionary
Oops, totally missed that here -- should be good now, I think!
One more thing: it looks like you're generating UUIDs off of just the doc_id. Since a single document will generally have multiple chunks, you'll need to also use the chunk_index (which you can assume will be contained in the metadata dictionary) in combination with the doc_id to uniquely identify a chunk. This will also affect how the remove_document method will need to work. You'll need to identify all items associated with a given doc_id and delete each of those.
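A minimal sketch of the two changes described above, assuming a plain dict stands in for the database: chunk_uuid and ChunkStore are hypothetical names, and the "doc_id:chunk_index" name format and the uuid.NAMESPACE_URL namespace are arbitrary choices for this sketch. IDs come from the standard library's uuid.uuid5, so the same (doc_id, chunk_index) pair always maps to the same UUID, and remove_document finds every chunk for a doc_id before deleting each one.

```python
import uuid
from typing import Any, Dict, List


def chunk_uuid(doc_id: str, chunk_index: int) -> str:
    """Deterministic UUID for one chunk, derived from doc_id AND chunk_index.

    Namespace and name format are arbitrary for this sketch; they only need to
    stay fixed so the same chunk always gets the same ID.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}:{chunk_index}"))


class ChunkStore:
    """Hypothetical dict-backed store keyed by chunk UUID (not the Weaviate client API)."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, Any]] = {}

    def add(self, metadata: List[Dict[str, Any]]) -> List[str]:
        ids = []
        for meta in metadata:
            obj_id = chunk_uuid(meta["doc_id"], meta["chunk_index"])
            # Same (doc_id, chunk_index) -> same ID, so re-adding a chunk overwrites it.
            self.objects[obj_id] = dict(meta)
            ids.append(obj_id)
        return ids

    def remove_document(self, doc_id: str) -> int:
        """Delete every chunk that belongs to doc_id and return how many were removed."""
        to_delete = [oid for oid, meta in self.objects.items() if meta["doc_id"] == doc_id]
        for oid in to_delete:
            del self.objects[oid]
        return len(to_delete)


store = ChunkStore()
ids = store.add([
    {"doc_id": "doc1", "chunk_index": 0},
    {"doc_id": "doc1", "chunk_index": 1},
    {"doc_id": "doc2", "chunk_index": 0},
])
print(len(set(ids)))                  # prints: 3 (one ID per chunk, not per document)
print(store.remove_document("doc1"))  # prints: 2
print(len(store.objects))             # prints: 1
```

Keyed on doc_id alone, the two doc1 chunks above would collapse onto a single ID and one would silently overwrite the other. Against a real Weaviate collection, the scan in remove_document would more likely become a single filtered delete on a doc_id property; check the client documentation for the exact call.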
> You'll need to identify all items associated with a given doc_id and delete each of those.
🤦‍♂️ Yeeep, totally missed that too. I've updated the logic and tests, hopefully it's completely functional now?
Had to add the to_dict method to enable proper saving and loading. I also added an integration test and fixed a couple of small bugs. Should be good to go now.
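A minimal sketch of why a to_dict method enables saving and loading, assuming the saved form is a JSON-serializable dict of constructor arguments plus a "subclass_name" field that a loader could dispatch on. ExampleVectorDB, collection_name, url, subclass_name, and from_dict are hypothetical names for illustration, not confirmed by this thread; the live client is deliberately left out because connections can't be serialized.

```python
import json
from typing import Any, Dict, Optional


class ExampleVectorDB:
    """Hypothetical wrapper; collection_name and url are illustrative constructor arguments."""

    def __init__(self, collection_name: str, url: Optional[str] = None) -> None:
        self.collection_name = collection_name
        self.url = url
        # Stand-in for a live client/connection: rebuilt on load, never serialized.
        self._client: Optional[object] = None

    def to_dict(self) -> Dict[str, Any]:
        # Record which class to rebuild plus the constructor arguments needed to do it.
        return {
            "subclass_name": type(self).__name__,
            "collection_name": self.collection_name,
            "url": self.url,
        }

    @classmethod
    def from_dict(cls, config: Dict[str, Any]) -> "ExampleVectorDB":
        # Drop the bookkeeping key and pass the rest back to the constructor.
        kwargs = {k: v for k, v in config.items() if k != "subclass_name"}
        return cls(**kwargs)


db = ExampleVectorDB("chunks", url="http://localhost:8080")
saved = json.dumps(db.to_dict())  # e.g. persisted alongside the rest of the knowledge base
restored = ExampleVectorDB.from_dict(json.loads(saved))
print(restored.to_dict() == db.to_dict())  # prints: True
```

Serializing only constructor arguments keeps the saved file small and lets the loaded object open a fresh connection instead of trying to restore a stale one.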
Started working on a Weaviate VectorDB class. I believe it should have all the basic functionality, but I've only tested it against Embedded Weaviate so far.
resolves https://github.com/SuperpoweredAI/spRAG/issues/3