Unstructured-IO / unstructured-ingest

Apache License 2.0
20 stars 19 forks

feat/persist file id in metadata, use it to delete previous content #219

Closed rbiseck3 closed 1 week ago

rbiseck3 commented 2 weeks ago

Description

Currently, the ID for each vector entry in Pinecone is randomly generated, so re-uploading the same file adds new entries instead of replacing the old ones. To prevent duplicates in the destination index, a step was introduced that persists the record ID from the FileData object in each entry's metadata and, during upload, uses it to delete any previously uploaded content for that record.
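The delete-then-upload flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `InMemoryIndex` is a hypothetical in-memory stand-in for the Pinecone index, and the `record_id` metadata key and the `upload` helper are names made up for the example.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical metadata key; the real connector may use a different name.
RECORD_ID_KEY = "record_id"


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a vector index, keyed by vector ID."""

    vectors: dict = field(default_factory=dict)

    def delete_by_metadata(self, key: str, value: str) -> int:
        """Delete every entry whose metadata[key] equals value."""
        stale = [
            vector_id
            for vector_id, (_, metadata) in self.vectors.items()
            if metadata.get(key) == value
        ]
        for vector_id in stale:
            del self.vectors[vector_id]
        return len(stale)

    def upsert(self, entries: list) -> None:
        """Insert (vector_id, values, metadata) tuples."""
        for vector_id, values, metadata in entries:
            self.vectors[vector_id] = (values, metadata)


def upload(index: InMemoryIndex, record_id: str, chunks: list) -> None:
    """Replace all content previously uploaded for record_id."""
    # Vector IDs are random, so earlier entries can only be found
    # through the record ID persisted in their metadata.
    index.delete_by_metadata(RECORD_ID_KEY, record_id)
    entries = [
        (str(uuid.uuid4()), values, {RECORD_ID_KEY: record_id})
        for values in chunks
    ]
    index.upsert(entries)


index = InMemoryIndex()
upload(index, "file-a", [[0.1, 0.2], [0.3, 0.4]])
upload(index, "file-b", [[0.5, 0.6]])
# Re-uploading file-a removes its two old entries before adding the new one.
upload(index, "file-a", [[0.7, 0.8]])
print(len(index.vectors))
```

Deleting before inserting keeps the random vector IDs while still making a re-upload of the same file idempotent with respect to the number of entries it leaves in the index.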