Unstructured-IO / unstructured-ingest

Apache License 2.0

feat/use deterministic id for pinecone uploader #220

Open rbiseck3 opened 2 weeks ago

rbiseck3 commented 2 weeks ago

Description

To avoid duplicate content in Pinecone when the same record is processed more than once, the id of each element is now a UUID generated from the combination of the element id and the file data identifier.

This is a safer approach than the similar PR, PR219, because it doesn't involve deleting any content. The random id was originally introduced because the element id was not reliably deterministic; it is now more deterministic than it was in the past. With this change, each element in a document should have a unique and deterministic id.
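For reviewers, here is a minimal sketch of the idea, not the code in this PR. It assumes a name-based UUID (`uuid.uuid5`) over the two identifiers joined by a separator; the function name `deterministic_id`, the namespace choice, and the sample identifiers are all hypothetical.

```python
import uuid

# Assumption: any fixed namespace works, as long as it never changes between runs.
NAMESPACE = uuid.NAMESPACE_DNS


def deterministic_id(element_id: str, file_identifier: str) -> str:
    """Return a stable UUID string derived from both identifiers (hypothetical helper)."""
    # The separator keeps the two inputs distinct within the hashed name.
    return str(uuid.uuid5(NAMESPACE, f"{file_identifier}:{element_id}"))


# Same inputs always give the same id, so re-processing a record reuses its id.
first = deterministic_id("elem-1", "file-a")
again = deterministic_id("elem-1", "file-a")

# The same element id under a different file identifier gives a different id.
other = deterministic_id("elem-1", "file-b")
```

Because the id is stable across runs, uploading the same record again would overwrite the existing vector under that id instead of adding a duplicate, which is why no delete step is needed.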