deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Feature Request: Add `VoyageAI Embedding Models` support to Haystack (v2.0x) #6241

Closed by awinml 10 months ago

awinml commented 11 months ago

Add VoyageAI Embedding Models support to Haystack (v2.0x)

VoyageAI Embeddings

Embedding models released:

Currently supports two models:

VoyageAI also plans to offer embedding models tailored for coding and finance, with more domains on the horizon.

More information about their embedding models can be found on their Embeddings documentation.

A complete working example that performs semantic search using these embedding models can be found in this Colab Notebook.

Python SDK API Documentation: https://docs.voyageai.com/embeddings/#via-voyage-python-library

Simple usage example using the Python SDK:

import voyageai
from voyageai import get_embeddings

voyageai.api_key = "[ Your Voyage API KEY ]"  # add your Voyage API KEY

documents = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.",
    "Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature."
]

# Embed the documents
embeddings = get_embeddings(documents, model="voyage-01")
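
For the semantic search use case mentioned earlier in this issue, the returned vectors can be ranked against a query vector by cosine similarity. The following is only a minimal sketch, assuming each embedding is a plain list of floats. The helpers `cosine_similarity` and `rank_documents` are hypothetical names and not part of the VoyageAI SDK, and the tiny vectors are placeholders rather than real model output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_embedding: Sequence[float],
    document_embeddings: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(document_embeddings)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Placeholder vectors standing in for real document and query embeddings.
toy_document_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query_embedding = [1.0, 0.1]

ranking = rank_documents(toy_query_embedding, toy_document_embeddings, top_k=2)
for index, score in ranking:
    print(f"document {index}: score {score:.3f}")
```

With real embeddings, the query would be embedded with the same model as the documents, and the indices in `ranking` would map back to positions in the `documents` list.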

Describe the solution you'd like

Integrate VoyageAI Embeddings with Haystack (v2.0x), specifically a VoyageTextEmbedder and a VoyageDocumentEmbedder built on the VoyageAI Python SDK.
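
To make the request more concrete, here is a rough, dependency-free sketch of the shape the two embedders could take. It is not the actual integration: `SimpleDocument` is a hypothetical stand-in for Haystack's document class, `embed_fn` is a hypothetical injected callable standing in for the VoyageAI SDK call, and `fake_embed` exists only so the sketch runs without an API key. It assumes a text embedder returns a single vector for one string, a document embedder attaches a vector to each document, and both return a dictionary from `run`. A real implementation would also need to be registered as a Haystack component.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical signature: a list of texts in, one vector per text out.
EmbedFn = Callable[[List[str]], List[List[float]]]


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Haystack document."""
    content: str
    embedding: Optional[List[float]] = None


class VoyageTextEmbedder:
    """Embeds a single string, e.g. a search query."""

    def __init__(self, embed_fn: EmbedFn) -> None:
        self.embed_fn = embed_fn

    def run(self, text: str) -> Dict[str, List[float]]:
        if not isinstance(text, str):
            raise TypeError("VoyageTextEmbedder expects a single string")
        return {"embedding": self.embed_fn([text])[0]}


class VoyageDocumentEmbedder:
    """Embeds documents in batches and stores each vector on its document."""

    def __init__(self, embed_fn: EmbedFn, batch_size: int = 8) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.embed_fn = embed_fn
        self.batch_size = batch_size

    def run(self, documents: List[SimpleDocument]) -> Dict[str, List[SimpleDocument]]:
        for start in range(0, len(documents), self.batch_size):
            batch = documents[start:start + self.batch_size]
            vectors = self.embed_fn([doc.content for doc in batch])
            for doc, vector in zip(batch, vectors):
                doc.embedding = vector
        return {"documents": documents}


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Placeholder embedding: [character count, space count]. Not a real model."""
    return [[float(len(text)), float(text.count(" "))] for text in texts]


text_result = VoyageTextEmbedder(fake_embed).run("hello world")
doc_result = VoyageDocumentEmbedder(fake_embed, batch_size=2).run(
    [SimpleDocument("a b"), SimpleDocument("abcd"), SimpleDocument("x y z")]
)
```

Injecting the embedding function keeps the sketch testable offline; in the real integration the SDK call and API key handling would live inside the embedders.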

anakin87 commented 11 months ago

Hello, @awinml! And sorry for the late reply...

It would be a nice integration to have.

If you would like to contribute to this integration, I would propose to:

Feel free to let us know what you think. Thanks!

awinml commented 11 months ago

That sounds good! These models are new and the API access is not fully open yet, so it might take a while for them to become popular.

I will create my own repository for this integration in the meantime and create a PR in haystack-integrations once it's ready.

Thanks! @anakin87

awinml commented 10 months ago

@anakin87 I have created a repository (voyage-embedders-haystack) for the integration and also opened a PR (https://github.com/deepset-ai/haystack-integrations/pull/85) in haystack-integrations. Thanks for your help!

anakin87 commented 10 months ago

Great! We will review your PR in haystack-integrations...