QuivrHQ / quivr

Open-source RAG framework for building GenAI Second Brains 🧠 Build a productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 Turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq, that you can share with users! An efficient retrieval augmented generation framework.
https://quivr.com

feat: introducing Ollama embeddings properties. #2690

Open mkhludnev opened 2 weeks ago

mkhludnev commented 2 weeks ago

Description

Ollama embeddings should be properly configurable via these props. Currently, only base_url is passed to OllamaEmbeddings, which causes the following issues:

This change lets users configure embedding models hosted in Ollama.
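
For illustration, a rough sketch of how these props could be wired into OllamaEmbeddings (the OLLAMA_API_BASE_URL name and the helper below are only assumptions; the OLLAMA_EMBEDDINGS_* names match the env example further down in this thread):

import os
from langchain_community.embeddings import OllamaEmbeddings

def build_ollama_embeddings() -> OllamaEmbeddings:
    # Besides base_url, forward the model and instruction prefixes from the environment.
    # OLLAMA_API_BASE_URL is an assumed variable name; the fallbacks below are
    # langchain's own defaults for OllamaEmbeddings.
    return OllamaEmbeddings(
        base_url=os.getenv("OLLAMA_API_BASE_URL", "http://localhost:11434"),
        model=os.getenv("OLLAMA_EMBEDDINGS_MODEL", "llama2"),
        embed_instruction=os.getenv("OLLAMA_EMBEDDINGS_DOC_INSTRUCT", "passage: "),
        query_instruction=os.getenv("OLLAMA_EMBEDDINGS_QUERY_INSTRUCT", "query: "),
    )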

Checklist before requesting a review

Please delete options that are not relevant.

Screenshots (if appropriate):

vercel[bot] commented 2 weeks ago

Someone is attempting to deploy a commit to the Quivr-app Team on Vercel.

A member of the Team first needs to authorize it.

StanGirard commented 2 weeks ago

Thanks a lot for this PR! I'm on holiday (though still looking at PRs), so @AmineDiro will review this one ;)

filipe-omnix commented 1 week ago

Hi, could you please provide an example of the changes to the env? (the OLLAMA_EMBEDDINGS_* part, please). Thanks in advance!

mkhludnev commented 1 week ago

an example of the changes to the env? (the OLLAMA_EMBEDDINGS_* part, please)

Fair! Here are my props:

OLLAMA_EMBEDDINGS_MODEL=chatfire/bge-m3:q8_0 # just because we deployed this embeddings model, choose yours
OLLAMA_EMBEDDINGS_DOC_INSTRUCT=              # just because there are certain values by default https://github.com/langchain-ai/langchain/blob/c314222796798545f168f6ff7e750eb24e8edd51/libs/community/langchain_community/embeddings/ollama.py#L40
OLLAMA_EMBEDDINGS_QUERY_INSTRUCT=            # but instructions are not necessary for bge-m3 see faq#2 https://huggingface.co/BAAI/bge-m3#faq
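
If it helps, a quick sanity check of these values outside Quivr (a rough sketch; the OLLAMA_API_BASE_URL name is an assumption, and the model must already be pulled on that Ollama server):

import os
from langchain_community.embeddings import OllamaEmbeddings

emb = OllamaEmbeddings(
    base_url=os.environ["OLLAMA_API_BASE_URL"],  # assumed name of the Ollama endpoint variable
    model=os.environ["OLLAMA_EMBEDDINGS_MODEL"],
    embed_instruction=os.getenv("OLLAMA_EMBEDDINGS_DOC_INSTRUCT", ""),
    query_instruction=os.getenv("OLLAMA_EMBEDDINGS_QUERY_INSTRUCT", ""),
)
print(len(emb.embed_query("hello")))  # bge-m3 should report a 1024-dimensional vector
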
mkhludnev commented 2 days ago

@filipe-omnix can you confirm if this patch is useful for you?

andyzhangwp commented 2 days ago

@mkhludnev

I applied the above update, but still encountered an error during local testing: {"error": "model 'llama2' not found, try pulling it first"}

The following are debugging logs:

1. get_embeddings of models/setting.py, model is llama3:

DEBUG:httpcore.http11:response_closed.complete
backend-core | ======get_embeddings=====embeddings=[base_url='http://33a45d4e.r11.cpolar.top' model='llamafamily/llama3-chinese-8b-instruct' embed_instruction='passage:' query_instruction='query:' mirostat=None mirostat_eta=None mirostat_tau=None num_ctx=None num_gpu=None num_thread=None repeat_last_n=None repeat_penalty=None temperature=None stop=None tfs_z=None top_k=None top_p=None show_progress=False headers=None model_kwargs=None]

Here you can see that the model is llama3, indicating that the configuration is valid.

2. similarity_search of vectorstore/supabase.py:

====111======similarity_search=====self._embedding=[base_url='http://33a45d4e.r11.cpolar.top' model='llama2' embed_instruction='passage: ' query_instruction='query: ' mirostat=None mirostat_eta=None mirostat_tau=None num_ctx=None num_gpu=None num_thread=None repeat_last_n=None repeat_penalty=None temperature=None stop=None tfs_z=None top_k=None top_p=None show_progress=False headers=None model_kwargs=None]

The model here has reverted to llama2, so the previously configured embeddings are not being used.

3. Error log:

backend-core |   File "/code/vectorstore/supabase.py", line 76, in similarity_search
backend-core |     vectors = self._embedding.embed_documents([query])
backend-core |   File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 211, in embed_documents
backend-core |     embeddings = self._embed(instruction_pairs)
backend-core |   File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 199, in _embed
backend-core |     return [self._process_emb_response(prompt) for prompt in iter_]
backend-core |   File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 199, in <listcomp>
backend-core |     return [self._process_emb_response(prompt) for prompt in iter_]
backend-core |   File "/usr/local/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 173, in _process_emb_response
backend-core |     raise ValueError(
backend-core | ValueError: Error raised by inference API HTTP code: 404, {"error":"model 'llama2' not found, try pulling it first"}
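
Judging from these logs, the embeddings instance returned by get_embeddings is apparently not the one the vector store holds: similarity_search uses self._embedding, which seems to have been built with langchain's default model 'llama2'. A hypothetical illustration of the fix, not Quivr's actual wiring (supabase_client, the table name, and the query name below are assumptions):

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import SupabaseVectorStore

embeddings = OllamaEmbeddings(
    base_url="http://33a45d4e.r11.cpolar.top",
    model="llamafamily/llama3-chinese-8b-instruct",  # the model you actually configured
)

# The store keeps whatever embeddings instance it was constructed with, so the
# configured instance has to be passed in here; otherwise similarity_search
# falls back to a default OllamaEmbeddings(model="llama2").
vector_store = SupabaseVectorStore(
    client=supabase_client,      # assumed to exist
    embedding=embeddings,
    table_name="vectors",        # assumed table name
    query_name="match_vectors",  # assumed RPC name
)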