deepset-ai / haystack-integrations

🚀 A list of Haystack Integrations, maintained by the community or deepset.
48 stars 62 forks source link

Add Mistral integration page #125

Closed annthurium closed 6 months ago

annthurium commented 7 months ago

Now that we have a Mixtral API key, tested these code samples! You can try it yourself in this colab.

I tried to include an example of using the embedding models in a pipeline. However, it looks like the OpenAIDocumentEmbedder doesn't support the Mistral models out of the box, unless I'm missing something. You can see more in the Colab. Do you have recommendations on how to adapt that or should I just leave as is? Thank you.

annthurium commented 7 months ago

@TuanaCelik re the OpenAIGenerator: according to Mistral's docs the only exposed endpoints their platform provides are 1) chat completion 2) creating text embeddings 3) listing available models.

So I assumed we can't use the OpenAIGenerator but I can try anyway if you'd like.

TuanaCelik commented 7 months ago

@annthurium I guess it doesn't really matter then. Sounds like even if we can, using and telling people to use the ChatGenerator would be more correct. But let's ask @anakin87 just in case

anakin87 commented 7 months ago

Mistral API expects messages, so ChatGenerator is the right abstraction in this case. (Generators do not pass messages to the API, so I would expect a failure).

Unrelated: Embedders work as expected (wrt https://github.com/deepset-ai/haystack-integrations/pull/125#issue-2082833272) document_embedder = OpenAIDocumentEmbedder(api_key=userdata.get("MISTRAL_API_KEY"), model='mistral-embed', api_base_url="https://api.mistral.ai/v1")

annthurium commented 7 months ago

@anakin87 oh thanks, I think I wasn't passing in the base URL! (comment edited)

annthurium commented 7 months ago

actually - when I pass in the API_BASE_URL I get an authentication error :-(

from haystack import Document
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import OpenAITextEmbedder, OpenAIDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
             Document(content="I saw a black horse running"),
             Document(content="Germany has many big cities")]

document_embedder = OpenAIDocumentEmbedder(api_key=userdata.get("OPENAI_API_KEY"), model='mistral-embed', api_base_url="https://api.mistral.ai/v1")
documents_with_embeddings = document_embedder.run(documents)['documents']
AuthenticationError                       Traceback (most recent call last)
[<ipython-input-4-d24898994ea8>](https://localhost:8080/#) in <cell line: 14>()
     12 
     13 document_embedder = OpenAIDocumentEmbedder(api_key=userdata.get("OPENAI_API_KEY"), model='mistral-embed', api_base_url="https://api.mistral.ai/v1")
---> 14 documents_with_embeddings = document_embedder.run(documents)['documents']
     15 document_store.write_documents(documents)
     16
    959             log.debug("Re-raising status error")
--> 960             raise self._make_status_error_from_response(err.response) from None
    961 
    962         return self._process_response(

AuthenticationError: Error code: 401 - {'message': 'Unauthorized', 'request_id': '99b1cd757ca42d71e7c7182e8a10f20d'}
anakin87 commented 7 months ago

I think you should pass the MISTRAL_API_KEY

annthurium commented 7 months ago

@anakin87 oh thanks, that was silly of me! Cool, the pipeline example works now!! I'll clean it up and add it in here.