Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.54k stars 595 forks

feat: add VoyageAI embeddings (#3069) #3099

Closed MthwRobinson closed 1 month ago

MthwRobinson commented 1 month ago

Original PR was #3069. It was merged into a feature branch to fix dependency and linting issues. Application code changes from the original PR were already reviewed and approved.


Original PR description: Adding VoyageAI embeddings. Voyage AI’s embedding models and rerankers are state-of-the-art in retrieval accuracy.

MthwRobinson commented 1 month ago

@fzowl - Fixed the dependency issue on this branch by pinning langsmith in the constraints file. Does voyage require a pin to that exact version? An exact pin should be fine for the time being, but a >= constraint would be better long term in case there's a CVE that requires us to bump versions.
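
To illustrate the trade-off between an exact pin and a >= constraint, here is a minimal sketch. The version numbers are hypothetical and are not the actual langsmith pin used on this branch. The `satisfies` helper is also made up for illustration and only handles plain dotted numeric versions, not the full version specifier syntax that pip supports.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    # Simplification: assumes plain dotted integers such as "1.2.3".
    return tuple(int(part) for part in version.split("."))


def satisfies(installed: str, constraint: str) -> bool:
    """Check a simple '==X.Y.Z' or '>=X.Y.Z' constraint (hypothetical helper)."""
    for op in ("==", ">="):
        if constraint.startswith(op):
            required = parse(constraint[len(op):])
            have = parse(installed)
            return have == required if op == "==" else have >= required
    raise ValueError(f"unsupported constraint: {constraint}")


# Hypothetical scenario: 1.2.3 is pinned, and a security fix ships in 1.2.4.
patched = "1.2.4"
print(satisfies(patched, "==1.2.3"))  # exact pin rejects the patched release
print(satisfies(patched, ">=1.2.3"))  # lower bound accepts it
```

With the exact pin, picking up the patched release means editing the constraints file first. With the lower bound, the resolver is free to select it.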

MthwRobinson commented 1 month ago

@fzowl - Queued to merge. Thanks again for the PR!