QuivrHQ / quivr

Open-source RAG Framework for building GenAI Second Brains 🧠 Build productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users ! Efficient retrieval augmented generation framework
https://quivr.com

[Feature]: Make the embedding model of Ollama configurable #2692

Open · travisgu opened this issue 1 week ago

travisgu commented 1 week ago

The Feature

It seems the embedding model parameter is never set in chat_routes.py, so it falls back to its default value. For the OllamaEmbeddings class, the default embedding model is llama2. Please consider making this parameter configurable so we can choose which embedding model to use.

```python
brain_settings = BrainSettings()
supabase_client = get_supabase_client()
embeddings = None
if brain_settings.ollama_api_base_url:
    embeddings = OllamaEmbeddings(
        base_url=brain_settings.ollama_api_base_url
    )  # pyright: ignore reportPrivateUsage=none
else:
    embeddings = OpenAIEmbeddings()
vector_store = CustomSupabaseVectorStore(
    supabase_client, embeddings, table_name="vectors", user_id=user_id
)
```
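A possible shape for the change, sketched against the snippet above: add a new field on BrainSettings (the name `ollama_embedding_model` below is hypothetical) and forward it to OllamaEmbeddings, which already accepts a `model` argument that defaults to `llama2` in langchain_community:

```python
# Sketch only: `ollama_embedding_model` is a hypothetical new BrainSettings field,
# e.g. backed by an OLLAMA_EMBEDDING_MODEL environment variable.
brain_settings = BrainSettings()
supabase_client = get_supabase_client()

if brain_settings.ollama_api_base_url:
    embeddings = OllamaEmbeddings(
        base_url=brain_settings.ollama_api_base_url,
        # Forward the configured model instead of relying on the "llama2" default.
        model=brain_settings.ollama_embedding_model,
    )  # pyright: ignore reportPrivateUsage=none
else:
    embeddings = OpenAIEmbeddings()

vector_store = CustomSupabaseVectorStore(
    supabase_client, embeddings, table_name="vectors", user_id=user_id
)
```

This would leave the OpenAI path untouched and only change which model name is sent to the Ollama server.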

Motivation, pitch

When I tried to use Quivr with Ollama deployed on a GPU server, I ran into the error below because I was not using the llama2 model:

```
backend-core  |   File "/code/modules/brain/service/brain_service.py", line 102, in find_brain_from_question
backend-core  |     list_brains = vector_store.find_brain_closest_query(user.id, question)
backend-core  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-core  |   File "/code/vectorstore/supabase.py", line 44, in find_brain_closest_query
backend-core  |     vectors = self._embedding.embed_documents([query])
backend-core  |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-core  |   File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 211, in embed_documents
backend-core  |     embeddings = self._embed(instruction_pairs)
backend-core  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-core  |   File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 199, in _embed
backend-core  |     return [self._process_emb_response(prompt) for prompt in iter]
backend-core  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-core  |   File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 199, in <listcomp>
backend-core  |     return [self._process_emb_response(prompt) for prompt in iter]
backend-core  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-core  |   File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 173, in _process_emb_response
backend-core  |     raise ValueError(
backend-core  | ValueError: Error raised by inference API HTTP code: 404, {"error":"model 'llama2' not found, try pulling it first"}
backend-core  | INFO:     172.27.0.1:62894 - "GET /onboarding HTTP/1.1" 200 OK
backend-core  | INFO:     172.27.0.1:62922 - "GET /chat/56fbab22-67e7-455c-a60b-60c33105edc9/history HTTP/1.1" 200 OK
backend-core  | INFO:     172.27.0.1:62900 - "GET /user HTTP/1.1" 200 OK
```
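For what it's worth, the failure can be reproduced with the embeddings class alone, and passing `model=` already avoids it once the chosen model has been pulled on the Ollama host (the URL and model name below are placeholders):

```python
from langchain_community.embeddings import OllamaEmbeddings

# With no `model` argument, OllamaEmbeddings falls back to "llama2"; if that
# model has not been pulled, embed_documents raises the 404 error shown above.
default_embeddings = OllamaEmbeddings(base_url="http://gpu-server:11434")

# Explicitly choosing a model that is available on the server avoids the error.
embeddings = OllamaEmbeddings(
    base_url="http://gpu-server:11434",  # placeholder Ollama endpoint
    model="nomic-embed-text",            # placeholder: any embedding model pulled via `ollama pull`
)
vectors = embeddings.embed_documents(["What is Quivr?"])
```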

Twitter / LinkedIn details

No response