Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0
6.44k stars 618 forks

Setting up a local embedding model served by a llamafile. #558

Open Snikch63200 opened 1 month ago

Snikch63200 commented 1 month ago

Hello,

Is there a way to create an embedding model object (with LiteLLMEmbeddingModel(), I guess) from a locally served embedding model?

For more precision, I run in parallel:

I previously asked how to set up a local embedding model (https://github.com/Future-House/paper-qa/issues/557), but the given solution does not seem to have any entry for the IP address and port of the embedding model, as there is for the main LLM model.

This solution doesn't work: LiteLLMEmbeddingModel.from_name("my-llm-model"), because the from_name method is not recognized...

Is there a clear and simple way to set up a locally served embedding model?

dosubot[bot] commented 1 month ago

To set up a locally served embedding model using LiteLLMEmbeddingModel, you can configure it similarly to how you set up the main LLM model. Although the from_name method isn't recognized, you can directly specify the configuration using the Settings class.

Here's an example of how you might configure it:

from paperqa import Settings, ask

local_embedding_config = dict(
    model_list=[
        dict(
            model_name="my_embedding_model",
            litellm_params=dict(
                model="mxbai-embed-large-v1-f16",
                api_base="http://localhost:8081/v1",
                api_key="sk-no-key-required",
                # Add any other necessary parameters here
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        embedding="my_embedding_model",
        embedding_config=local_embedding_config,
    ),
)

This setup specifies the IP address and port for your local embedding model. Make sure to replace "my_embedding_model" and other parameters with your actual model details. This approach should help you connect to your locally served embedding model [1][2].

To continue talking to Dosu, mention @dosu.

Snikch63200 commented 1 month ago

I'm looking for a solution to create a LiteLLMEmbeddingModel object to pass as the embedding_model argument of docs.query(). @Dosu.

dosubot[bot] commented 1 month ago

To create a LiteLLMEmbeddingModel object for use as the embedding_model argument in docs.query(), you can use the embedding_model_factory function. Here's how you can do it:

from paperqa.llms import embedding_model_factory  # import path assumed

embedding_model = embedding_model_factory("text-embedding-3-small")

This will create a LiteLLMEmbeddingModel with the default name "text-embedding-3-small". You can also pass additional configuration options as keyword arguments if needed [1].
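For reference, a minimal sketch of how such an object could then be passed to a query call. This assumes docs.aquery accepts an embedding_model keyword (consistent with the tracebacks below); the document path and question are placeholders:

import asyncio

from paperqa import Docs
from paperqa.llms import embedding_model_factory  # import path assumed

async def my_query():
    docs = Docs()
    await docs.aadd("my_paper.pdf")  # placeholder document
    embedding_model = embedding_model_factory("text-embedding-3-small")
    answer = await docs.aquery(
        "What manufacturing challenges are unique to bispecific antibodies?",
        embedding_model=embedding_model,
    )
    print(answer)

asyncio.run(my_query())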

To continue talking to Dosu, mention @dosu.

Snikch63200 commented 1 month ago

It returns this error:

Request to litellm:
litellm.aembedding('text-embedding-3-small', input=['[INST]Rédigez une synthèse des exemples connus de sépultures mérovingiennes [INST] Répondez en Français.'])

15:05:17 - LiteLLM:WARNING: utils.py:325 - `litellm.set_verbose` is deprecated. Please set `os.environ['LITELLM_LOG'] = 'DEBUG'` for debug logs.
ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None

POST Request Sent from LiteLLM:
curl -X POST \
https://api.openai.com/v1 \
-d '{'model': 'text-embedding-3-small', 'input': ['[INST]Rédigez une synthèse des exemples connus de sépultures mérovingiennes [INST] Répondez en Français.']}'

RAW RESPONSE:
Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Provider List: https://docs.litellm.ai/docs/providers

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\llms\OpenAI\openai.py", line 1110, in aembedding
    headers, response = await self.make_openai_embedding_request(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\llms\OpenAI\openai.py", line 1065, in make_openai_embedding_request
    raise e
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\llms\OpenAI\openai.py", line 1058, in make_openai_embedding_request
    raw_response = await openai_aclient.embeddings.with_raw_response.create(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\_legacy_response.py", line 370, in wrapped
    return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\resources\embeddings.py", line 237, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\_base_client.py", line 1821, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\_base_client.py", line 1515, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\_base_client.py", line 1616, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\main.py", line 3295, in aembedding
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\llms\OpenAI\openai.py", line 1154, in aembedding
    raise OpenAIError(
litellm.llms.OpenAI.openai.OpenAIError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\XXXXXX\Desktop\PaperQA\paper_qa_question_async.py", line 99, in <module>
    asyncio.run(my_query())
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\XXXXXXX\Desktop\PaperQA\paper_qa_question_async.py", line 86, in my_query
    answer = await docs.aquery(
             ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 665, in aquery
    answer = await self.aget_evidence(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 560, in aget_evidence
    matches = await self.retrieve_texts(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 493, in retrieve_texts
    await self.texts_index.max_marginal_relevance_search(
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\llms.py", line 540, in max_marginal_relevance_search
    texts, scores = await self.similarity_search(query, fetch_k, embedding_model)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\llms.py", line 596, in similarity_search
    np_query = np.array((await embedding_model.embed_documents([query]))[0])
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\llms.py", line 74, in embed_documents
    response = await aembedding(
               ^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\utils.py", line 1573, in wrapper_async
    raise e
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\utils.py", line 1388, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\main.py", line 3304, in aembedding
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 2033, in exception_type
    raise e
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 358, in exception_type
    raise AuthenticationError(
litellm.exceptions.AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

I guess it's because it does not point to a local embedding model but builds a model that needs an API key.

Replacing it with a sparse model makes it work:

embedding_model = SparseEmbeddingModel(ndim=4096)

but it has poor accuracy...

Do you have a solution for pointing to a local embedding model and using it as an object?

@Dosu.
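(If the sparse model's accuracy is the limitation, paper-qa also exposes a HybridEmbeddingModel that combines a dense and a sparse model. A sketch, assuming the class is importable from the top-level package and takes a models list:)

from paperqa import HybridEmbeddingModel, LiteLLMEmbeddingModel, SparseEmbeddingModel

# Combine a dense (LiteLLM-backed) embedding with a sparse keyword embedding.
embedding_model = HybridEmbeddingModel(
    models=[
        LiteLLMEmbeddingModel(name="text-embedding-3-small"),
        SparseEmbeddingModel(ndim=256),
    ]
)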

Snikch63200 commented 1 month ago

When I set it up as follows:

local_embedding_config = dict(
    model_list=[
        dict(
            model_name="openai/my-embedding-model",
            litellm_params=dict(
                model="openai/my-embedding-model",
                api_base="http://localhost:8080/v1/",
                api_key="sk-no-key-required",
                # Add any other necessary parameters here
            ),
        )
    ]
)

embedding_model = LiteLLMEmbeddingModel(
    name="openai/my-embedding-model",
    config=local_embedding_config,
)

I get this error:


Request to litellm:
litellm.aembedding('openai/my-embedding-model', input=['[INST]Rédigez une synthèse des exemples connus de sépultures mérovingiennes [INST] Répondez en Français.'])

14:21:38 - LiteLLM:WARNING: utils.py:325 - `litellm.set_verbose` is deprecated. Please set `os.environ['LITELLM_LOG'] = 'DEBUG'` for debug logs.
ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None

POST Request Sent from LiteLLM:
curl -X POST \
https://api.openai.com/v1 \
-d '{'model': 'my-embedding-model', 'input': ['[INST]Rédigez une synthèse des exemples connus de sépultures mérovingiennes [INST] Répondez en Français.']}'

RAW RESPONSE:
Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Provider List: https://docs.litellm.ai/docs/providers

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\llms\OpenAI\openai.py", line 1110, in aembedding
    headers, response = await self.make_openai_embedding_request(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\llms\OpenAI\openai.py", line 1065, in make_openai_embedding_request
    raise e
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\llms\OpenAI\openai.py", line 1058, in make_openai_embedding_request
    raw_response = await openai_aclient.embeddings.with_raw_response.create(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\_legacy_response.py", line 370, in wrapped
    return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\resources\embeddings.py", line 237, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\_base_client.py", line 1821, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\_base_client.py", line 1515, in request
    return await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\openai\_base_client.py", line 1616, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\main.py", line 3295, in aembedding
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\llms\OpenAI\openai.py", line 1154, in aembedding
    raise OpenAIError(
litellm.llms.OpenAI.openai.OpenAIError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\XXXXXX\Desktop\PaperQA\paper_qa_question_sync.py", line 102, in <module>
    my_query()
  File "C:\Users\XXXXXX\Desktop\PaperQA\paper_qa_question_sync.py", line 89, in my_query
    answer = docs.query(
             ^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 624, in query
    return get_loop().run_until_complete(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 665, in aquery
    answer = await self.aget_evidence(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 560, in aget_evidence
    matches = await self.retrieve_texts(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\docs.py", line 493, in retrieve_texts
    await self.texts_index.max_marginal_relevance_search(
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\llms.py", line 540, in max_marginal_relevance_search
    texts, scores = await self.similarity_search(query, fetch_k, embedding_model)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\llms.py", line 596, in similarity_search
    np_query = np.array((await embedding_model.embed_documents([query]))[0])
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\paperqa\llms.py", line 74, in embed_documents
    response = await aembedding(
               ^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\utils.py", line 1573, in wrapper_async
    raise e
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\utils.py", line 1388, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\main.py", line 3304, in aembedding
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 2033, in exception_type
    raise e
  File "C:\ProgramData\anaconda3\envs\PaperQA_env\Lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 358, in exception_type
    raise AuthenticationError(
litellm.exceptions.AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

What is the problem with my LiteLLMEmbeddingModel setup? @Dosu.

dosubot[bot] commented 1 month ago

The issue you're encountering is due to the LiteLLMEmbeddingModel trying to connect to an external API instead of your local model. To ensure it points to your local embedding model, you need to correctly configure the api_base and ensure that the model name and parameters match your local setup.

Here's how you can set it up:

from paperqa import LiteLLMEmbeddingModel

local_embedding_config = dict(
    model_list=[
        dict(
            model_name="my-embedding-model",
            litellm_params=dict(
                model="my-embedding-model",
                api_base="http://localhost:8081/v1",  # Ensure this matches your local server's address and port
                api_key="sk-no-key-required",  # Use a placeholder if no key is needed
                # Add any other necessary parameters here
            ),
        )
    ]
)

embedding_model = LiteLLMEmbeddingModel(
    name="my-embedding-model",
    config=local_embedding_config,
)

Make sure that the api_base URL is correct and points to the port where your local embedding model is running (in your case, it should be http://localhost:8081/v1). Also, ensure that the model name in litellm_params matches the name used by your local server. This setup should help direct the requests to your local model instead of an external API [1].

To continue talking to Dosu, mention @dosu.
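One detail worth checking here: the curl line in the log shows the request still going to https://api.openai.com/v1, so the api_base from the model_list entry is apparently never applied on the embedding path. The paperqa/llms.py frame in the traceback calls aembedding directly, which suggests LiteLLMEmbeddingModel may forward a config["kwargs"] dict straight to litellm rather than using a router-style model_list. A sketch under that assumption:

from paperqa import LiteLLMEmbeddingModel

# Assumption: the embedding path calls litellm.aembedding(self.name, **config["kwargs"]),
# so api_base/api_key belong under a "kwargs" key, not under model_list.
embedding_model = LiteLLMEmbeddingModel(
    name="openai/my-embedding-model",  # "openai/" prefix routes to an OpenAI-compatible server
    config=dict(
        kwargs=dict(
            api_base="http://localhost:8080/v1",
            api_key="sk-no-key-required",
        )
    ),
)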

Snikch63200 commented 1 month ago

I launch my local embedding model as follows:

mxbai-embed-large-v1-f16 --server --nobrowser -cb -np 4 -a my-embedding-model --embedding --host 0.0.0.0 --port 8080 -ngl 999 -c 0

It looks like the LiteLLMEmbeddingModel still tries to connect to an external API. How is that possible? @Dosu.

dosubot[bot] commented 1 month ago

To ensure that LiteLLMEmbeddingModel connects to your locally hosted embedding model, you need to set the api_base parameter to the local server's address and port where your model is running. Based on your command for launching the model, configure it as follows:

from paperqa import LiteLLMEmbeddingModel

local_embedding_config = dict(
    model_list=[
        dict(
            model_name="my-embedding-model",
            litellm_params=dict(
                model="my-embedding-model",
                api_base="http://localhost:8080/v1",  # Ensure this matches your local server's address and port
                api_key="sk-no-key-required",  # Use a placeholder if no key is needed
                # Add any other necessary parameters here
            ),
        )
    ]
)

embedding_model = LiteLLMEmbeddingModel(
    name="my-embedding-model",
    config=local_embedding_config,
)

Make sure that the api_base URL is set to http://localhost:8080/v1, which matches the port where your local embedding model is running. This setup should direct the requests to your local model instead of an external API [1].

To continue talking to Dosu, mention @dosu.
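To isolate whether the 401 comes from paper-qa's config handling or from the server itself, the endpoint can also be exercised with litellm directly, bypassing paper-qa entirely. A minimal sketch, with the model alias and port taken from the llamafile command above:

import litellm

# The "openai/" prefix tells litellm to treat api_base as an OpenAI-compatible endpoint.
response = litellm.embedding(
    model="openai/my-embedding-model",
    input=["test sentence"],
    api_base="http://localhost:8080/v1",
    api_key="sk-no-key-required",
)
print(len(response.data[0]["embedding"]))  # embedding dimensionality

If this call also fails, the problem is on the server side rather than in the paper-qa configuration.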

medatasci commented 6 days ago

> To ensure that LiteLLMEmbeddingModel connects to your locally hosted embedding model, you need to set the api_base parameter to the local server's address and port where your model is running. [...] This setup should direct the requests to your local model instead of an external API. (quoting Dosu's reply above)

@dosu What is the right syntax to use this solution with an ask() function call? For example:

answer = ask(
    "Write a wikipedia style article about large language models",
    settings=Settings(
        llm="my_llm_model",
        llm_config=local_llm_config,
        summary_llm="my_llm_model",
        summary_llm_config=local_llm_config,
        # embedding_config=local_embedding_config,  # not using this config
        embedding="nvidia_nim/nvidia/nv-embedqa-e5-v5",
    ),
)
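
Following the pattern in the first Dosu reply above, the local embedding model would presumably be wired in the same way: name it in embedding and pass its configuration in embedding_config (model names here are placeholders):

answer = ask(
    "Write a wikipedia style article about large language models",
    settings=Settings(
        llm="my_llm_model",
        llm_config=local_llm_config,
        summary_llm="my_llm_model",
        summary_llm_config=local_llm_config,
        embedding="my-embedding-model",
        embedding_config=local_embedding_config,
    ),
)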