Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.8k stars, 626 forks

Fix: Chroma Upsert instead of Add #3086

Closed: potter-potter closed this pull request 2 months ago

potter-potter commented 2 months ago

Thanks to @0xjgv, we now upsert into Chroma instead of adding. An upsert updates any record whose ID already exists in the collection instead of writing it again, which prevents duplicate embeddings.

This PR also includes a Hugging Face embedder example; we already had examples for all the other embedders.
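The following toy, stdlib-only Python sketch illustrates why switching from add to upsert matters. `ToyCollection`, `chunk_id`, and `ingest` are hypothetical names, and the placeholder vectors stand in for a real embedder; this is not Chroma's or unstructured's code. The sketch makes two assumptions. First, each chunk gets a stable ID (here a truncated SHA-256 hash of its text). Second, `add` is modeled as insert-only, which is the duplicate-producing behavior this PR fixes. In the real `chromadb` client, the corresponding methods are `Collection.add` and `Collection.upsert`.

```python
import hashlib


def chunk_id(text: str) -> str:
    # Hypothetical: derive a stable ID from the chunk text so re-runs reuse the same ID.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


class ToyCollection:
    """Hypothetical in-memory stand-in for a vector collection (not Chroma's code)."""

    def __init__(self) -> None:
        self.records: list[tuple[str, list[float], str]] = []

    def add(self, ids, embeddings, documents) -> None:
        # Insert-only (toy model): appends every record, even if its ID is already stored.
        for rec in zip(ids, embeddings, documents):
            self.records.append(rec)

    def upsert(self, ids, embeddings, documents) -> None:
        # Insert-or-update: overwrite the record with the same ID if present, else append.
        for rec in zip(ids, embeddings, documents):
            for pos, existing in enumerate(self.records):
                if existing[0] == rec[0]:
                    self.records[pos] = rec
                    break
            else:
                self.records.append(rec)

    def count(self) -> int:
        return len(self.records)


def ingest(collection: ToyCollection, chunks: list[str], use_upsert: bool) -> None:
    ids = [chunk_id(c) for c in chunks]
    # Placeholder vectors, not output from a real embedder.
    embeddings = [[float(len(c)), 0.0] for c in chunks]
    write = collection.upsert if use_upsert else collection.add
    write(ids=ids, embeddings=embeddings, documents=chunks)


chunks = ["Title: Report", "Body paragraph one."]

with_add = ToyCollection()
ingest(with_add, chunks, use_upsert=False)
ingest(with_add, chunks, use_upsert=False)  # re-run the same pipeline
print(with_add.count())  # 4

with_upsert = ToyCollection()
ingest(with_upsert, chunks, use_upsert=True)
ingest(with_upsert, chunks, use_upsert=True)  # re-run the same pipeline
print(with_upsert.count())  # 2
```

Running the same ingestion twice leaves four records with `add` but only two with `upsert`, because the second pass reuses the same IDs and overwrites in place. Deriving IDs deterministically from content is one way to make re-runs idempotent. It is an assumption in this sketch, not necessarily how the connector assigns IDs.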

ryannikolaidis commented 2 months ago

Can the Hugging Face example be a separate PR, since it's entirely unrelated?