QuivrHQ / quivr

Open-source RAG Framework for building GenAI Second Brains 🧠 Build productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users! Efficient retrieval augmented generation framework
https://quivr.com

[Bug]: I am trying to run `ollama/dolphin-phi` available on ollama, but /chat/{chat_id}/question throws {"error":"model 'llama2' not found, try pulling it first"} #2056

Open shubham-dayma opened 5 months ago

shubham-dayma commented 5 months ago

What happened?

I am trying to run the `ollama/dolphin-phi` model on Ollama, but `/chat/{chat_id}/question` throws the error `{"error":"model 'llama2' not found, try pulling it first"}`. I don't want to pull llama2 into my Ollama instance because of the limited resources on my system.

Relevant log output

INFO:     192.168.29.176:50344 - "POST /chat/9ba4c7d0-80a0-4798-a2b0-5749748775c7/question?brain_id=f92b9784-847c-48df-a016-fb0933fc2323 HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 91, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 146, in simple_response
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/modules/chat/controller/chat_routes.py", line 171, in create_question_handler
    chat_answer = gpt_answer_generator.generate_answer(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/llm/knowledge_brain_qa.py", line 122, in generate_answer
    model_response = qa(
                     ^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/base.py", line 312, in __call__
    raise e
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/base.py", line 306, in __call__
    self._call(inputs, run_manager=run_manager)
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/conversational_retrieval/base.py", line 151, in _call
    docs = self._get_docs(new_question, inputs, run_manager=_run_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/chains/conversational_retrieval/base.py", line 308, in _get_docs
    docs = self.retriever.get_relevant_documents(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 211, in get_relevant_documents
    raise e
  File "/usr/local/lib/python3.11/site-packages/langchain_core/retrievers.py", line 204, in get_relevant_documents
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 656, in _get_relevant_documents
    docs = self.vectorstore.similarity_search(query, **self.search_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/vectorstore/supabase.py", line 32, in similarity_search
    vectors = self._embedding.embed_documents([query])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/embeddings/ollama.py", line 190, in embed_documents
    embeddings = self._embed(instruction_pairs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/embeddings/ollama.py", line 175, in _embed
    embeddings = self._process_emb_response(prompt)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/embeddings/ollama.py", line 160, in _process_emb_response
    raise ValueError(
ValueError: Error raised by inference API HTTP code: 404, {"error":"model 'llama2' not found, try pulling it first"}
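
The last frames show that the failure happens in the embedding step, not in the chat model itself. A minimal sketch of the root cause (assuming the langchain `OllamaEmbeddings` class from the traceback, whose `model` field defaults to `llama2` when it is not passed explicitly):

from langchain.embeddings.ollama import OllamaEmbeddings

# No model= is passed, so the class falls back to its default of "llama2".
emb = OllamaEmbeddings(base_url="http://localhost:11434")
print(emb.model)                  # -> "llama2"
# emb.embed_documents(["hello"])  # reproduces the 404 if llama2 was never pulled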


dosubot[bot] commented 5 months ago

🤖

Hello @shubham-dayma! 👋 I'm Dosu, a friendly bot here to help while a human maintainer gets to your issue. I'm great at solving bugs, answering questions, and helping folks like you become contributors.

Consider me a friendly helper while we wait for a human touch. I'm currently diving into the issue you've posted and will have a full response for you shortly. Hang tight! 🚀

StanGirard commented 5 months ago

ollama pull dolphin-phi my friend ;)

shubham-dayma commented 5 months ago

@StanGirard Thanks for looking into this. I already had dolphin-phi in Ollama. I checked the Ollama logs and found that, while initializing `langchain.embeddings.ollama.OllamaEmbeddings`, we are not passing the `model` parameter, so it defaults to `llama2`. I changed this code as below and it worked.

@property
def embeddings(self):
    if self.brain_settings.ollama_api_base_url:
        return OllamaEmbeddings(
            base_url=self.brain_settings.ollama_api_base_url,
            model=self.model.replace("ollama/", ""),
        )  # pyright: ignore reportPrivateUsage=none
    else:
        return OpenAIEmbeddings()

Now I am wondering how I should get the brain's model in backend/models/settings.py and update the initialization of OllamaEmbeddings there.

dosubot[bot] commented 5 months ago

🤖

Based on the context provided, it seems like you've already found a solution to the issue by modifying the OllamaEmbeddings initialization in quivr_rag.py. To apply a similar fix in backend/models/settings.py, you would need to have access to the model name in the get_embeddings function.

One way to achieve this could be to modify the BrainSettings class to include the model name, and then use this when initializing OllamaEmbeddings. Here's an example of how you could do this:

class BrainSettings(BaseSettings):
    ...
    model: str = "llama2"  # Add this line

    class Config:
        env_file = ".env"

def get_embeddings():
    settings = BrainSettings()
    if settings.ollama_api_base_url:
        embeddings = OllamaEmbeddings(
            base_url=settings.ollama_api_base_url,
            model=settings.model  # Use the model from settings here
        )
    else:
        embeddings = OpenAIEmbeddings()
    return embeddings

This solution assumes that the OllamaEmbeddings class accepts a model parameter during initialization. If it doesn't, you might need to modify the OllamaEmbeddings class to accept and use this parameter.
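
One way to sanity-check that assumption against the installed langchain release (a sketch; in the version shown in the traceback, `OllamaEmbeddings` is a pydantic model, so its declared fields can be inspected):

from langchain.embeddings.ollama import OllamaEmbeddings

# If "model" is among the declared fields, it can be passed at initialization.
print("model" in OllamaEmbeddings.__fields__)         # expected: True
print(OllamaEmbeddings.__fields__["model"].default)   # expected: "llama2"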

Please note that this is a potential solution based on the information provided and the context of the Quivr repository. It might not be the best or the only solution. If you have any further questions or need more help, feel free to ask.

For more information, you can refer to the Quivr repository and the FastAPI documentation on settings management.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

weihang-wills commented 3 months ago

> Now I am wondering how I should get the brain's model in backend/models/settings.py and update the initialization of OllamaEmbeddings there.

embeddings = OllamaEmbeddings(model="XXX")

Set it to your own model, like "qwen:7b", "llama2", and so on, and that will solve it.
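
For example (a sketch, not the exact Quivr code: the base URL below assumes a Dockerized backend reaching a host-side Ollama, and the model name must be one that `ollama list` actually shows):

from langchain.embeddings.ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    base_url="http://host.docker.internal:11434",  # assumed Ollama endpoint; adjust to your setup
    model="dolphin-phi",                           # must be a model you have already pulled
)
vector = embeddings.embed_query("hello")           # now embeds with dolphin-phi instead of llama2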

StanGirard commented 3 months ago

Exactly!

wwjCMP commented 1 month ago


https://github.com/StanGirard/quivr/blob/31ce6c05660db8088da96f7bd6f5a0f5f60ffaf7/backend/llm/rags/quivr_rag.py#L67

How can I set up a local embedding model if the latest version no longer has this file?

mkhludnev commented 1 week ago

I came up with a fix: #2690