Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0
6.23k stars 586 forks

paperqa.Settings does not use local llm #582

Open loilisxka opened 1 week ago

loilisxka commented 1 week ago

I deployed Qwen2.5-14B-Instruct on my local server and started the LLM correctly using vLLM. But when I executed the sample code,

from paperqa import Settings, ask

local_llm_config = dict(
    model_list=[
        dict(
            model_name="Qwen2.5-14B-Instruct",
            litellm_params=dict(
                model="Qwen2.5-14B-Instruct",
                api_base="http://127.0.0.1:8000/v1",
                api_key="EMPTY",
                temperature=0.7,
                max_tokens=512,
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="Qwen2.5-14B-Instruct",
        llm_config=local_llm_config,
        summary_llm="Qwen2.5-14B-Instruct",
        summary_llm_config=local_llm_config,
    ),
)

I got the following error:

Traceback (most recent call last):
  File "/home/huzheyu/project/paper/test/qa.py", line 22, in <module>
    answer = ask(
             ^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/agents/__init__.py", line 102, in ask
    return get_loop().run_until_complete(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/agents/main.py", line 67, in agent_query
    response = await run_agent(docs, query, agent_type, **runner_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/agents/main.py", line 119, in run_agent
    elif tool_selector_or_none := query.settings.make_aviary_tool_selector(agent_type):
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/settings.py", line 719, in make_aviary_tool_selector
    acompletion=self.get_agent_llm().router.acompletion,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/llms.py", line 610, in router
    self._router = litellm.Router(
                   ^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router.py", line 352, in __init__
    self.set_model_list(model_list)
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router.py", line 4069, in set_model_list
    self._create_deployment(
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router.py", line 3994, in _create_deployment
    deployment = self._add_deployment(deployment=deployment)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router.py", line 4158, in _add_deployment
    set_client(
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router_utils/client_initalization_utils.py", line 441, in set_client
    _client = openai.AsyncOpenAI(  # type: ignore
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/openai/_client.py", line 319, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

It seems that the settings are being ignored, and the program still prioritizes the OpenAI API over the local LLM settings I passed. I want to know what is causing this.

dosubot[bot] commented 1 week ago

Hey, @loilisxka! I'm here to help you with any coding bugs or questions you have. Let's work together to solve this issue!

It looks like the issue is related to the OPENAI_API_KEY environment variable. Even when using a local LLM, the system might still require this variable to be set due to dependencies on libraries like langchain or litellm.

A similar issue was discussed in a closed thread where the solution was to set the OPENAI_API_KEY to a generic value. You can try setting it like this:

export OPENAI_API_KEY=foo

This should help bypass the requirement for an actual OpenAI API key and allow your local LLM configuration to be used [1].

To continue talking to Dosu, mention @dosu.

loilisxka commented 1 week ago

Hey @dosu, I skipped the API key check by setting the environment variable os.environ['OPENAI_API_KEY'] = 'foo', but then the program started repeatedly printing

LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

What does this mean? Here is the full error

Timeout: litellm.Timeout: APITimeoutError - Request timed out.                                                                                                                                                          
           error_str: Request timed out.                                                                                                                                                                                           
           Received Model Group=gpt-4o-2024-08-06                                                                                                                                                                                  
           Available Model Group Fallbacks=None LiteLLM Retried: 2 times, LiteLLM Max Retries: 3 

It seems that the program is still looking for an online LLM instead of the local one.

dosubot[bot] commented 1 week ago

The error message indicates that litellm is timing out while trying to connect to the specified model. This suggests that the configuration might still be pointing to an online model instead of your local setup.

Here are a few things to check:

  1. API Base URL: Ensure that the api_base in your litellm_params is correctly set to your local server's address (http://127.0.0.1:8000/v1).

  2. Model Name: Double-check that the model_name in your model_list matches exactly with what your local server expects.

  3. Network Issues: Verify that your local server is running and accessible at the specified api_base URL. You can test this by sending a simple request to the server to see if it responds.

  4. Verbose Logging: Enable verbose logging by setting litellm.set_verbose = True to get more detailed information about what might be going wrong; a quick sketch follows below.
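For example, a minimal sketch for points 3 and 4 (untested, and assuming the vLLM server from the original post is listening at http://127.0.0.1:8000/v1 - adjust to your own endpoint):

import litellm
import openai

litellm.set_verbose = True  # more detailed litellm logs for the failing call

# Quick connectivity check against the local OpenAI-compatible endpoint
client = openai.OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")
print([m.id for m in client.models.list().data])  # should list the locally served model(s)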

If everything seems correct and the issue persists, it might be worth checking the server logs for any errors or misconfigurations.

To continue talking to Dosu, mention @dosu.

loilisxka commented 1 week ago

I am pretty sure there is nothing wrong with the locally deployed LLM. It provides correct and timely answers when I access it using the OpenAI package, but paper-qa is not working properly.

chemical-accuracy commented 1 week ago

I have the same problem. I tried to run with ollama/llama3.2 as per the example in the docs. I am getting the same error message; it looks like the local model is not being used at all, even though it is specified in the settings as per the docs. My error message is:

           Received Model Group=gpt-4o-2024-08-06                                   
           Available Model Group Fallbacks=None LiteLLM Retried: 2 times, LiteLLM   
           Max Retries: 3
victorconka commented 6 days ago

Same problem here: I've been trying to run paperqa locally for the last month, and it keeps pulling the gpt-4o-2024-xx-xx model instead of the local LLM. I tried running llama3.1 and llama3.2, and I have no problem executing the example from Ollama to add documents to storage and ask questions. Paper-qa seems to connect to the Ollama server, and the GPU shows the corresponding process, but the error is the one above. Could someone who has a working example provide their virtual environment requirements list and IDE description, whether it is a notebook, a plain .py file, or whatever?

chemical-accuracy commented 6 days ago

@grg-ffb - maybe you can share your setup? I saw that you edited the README file to add ollama in this fork: https://github.com/grg-ffb/paper-qa-ollama

grg-ffb commented 6 days ago

@grg-ffb - maybe you can share your setup? I saw that you edited the README file to add ollama in this fork: https://github.com/grg-ffb/paper-qa-ollama

Hi @chemical-accuracy, my code was already merged into paperqa with PR #536 last week. To add to it: I am hosting llama3.2 and mxbai-embed-large on Ollama without any changes to the default port or anything else. For me, not every API worked either; sometimes I still get errors that OpenAI keys are missing or that rate limits are reached, so it seems that for some tasks paperqa still tries to use OpenAI. But as a starting point it should work.

grg-ffb commented 6 days ago

I deployed Qwen2.5-14B-Instruct on my local server and started the LLM correctly using vLLM. [...] The program still prioritizes looking for the OpenAI API instead of the local LLM settings I passed.

From my experience, you also need to add the embedding model. If you don't, paperqa will still try to access OpenAI for creating the embeddings, and you will get this error if the OpenAI API can't be reached.
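As a rough, untested sketch of what that could look like for the Qwen/vLLM config from the top of this thread (the embedding value is a placeholder - point it at whatever embedding model litellm can actually reach in your setup, for example one served by Ollama):

from paperqa import Settings, ask

local_llm_config = dict(
    model_list=[
        dict(
            model_name="Qwen2.5-14B-Instruct",
            litellm_params=dict(
                model="Qwen2.5-14B-Instruct",
                api_base="http://127.0.0.1:8000/v1",
                api_key="EMPTY",
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="Qwen2.5-14B-Instruct",
        llm_config=local_llm_config,
        summary_llm="Qwen2.5-14B-Instruct",
        summary_llm_config=local_llm_config,
        # Without an embedding setting, paperqa falls back to the default OpenAI embeddings
        embedding="ollama/mxbai-embed-large",  # placeholder embedding model
    ),
)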

victorconka commented 6 days ago

So I did a clean install: a conda environment with Python 3.11 and pip install paper-qa==5.2.0, just to see if it is any different from a conda install. Execution is in a VS Code Jupyter notebook:

import nest_asyncio
nest_asyncio.apply()
from paperqa import Settings, ask
import os

# set up openai key to stop openai lib from complaining 
api_key="sk-no-key-required"
os.environ['OPENAI_API_KEY'] = api_key

# question list
question_list = [
    "test q1?",
    "test q2?",
    "test q3?",
]

model = "ollama/llama3.2" # "ollama/llama3.1" 
embedding = "ollama/mxbai-embed-large" # "nomic-embed-text"

local_llm_config = {
    "model_list": [
        {
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_base": "http://localhost:11434",
            },
        }
    ]
}

answer = ask(
    question_list[0],
    settings=Settings(
        llm=model,
        llm_config=local_llm_config,
        summary_llm=model,
        summary_llm_config=local_llm_config,
        embedding=embedding,
    ),
)

The error that I get is:


           ╭───────────────────────────────── Traceback (most recent call last) ──────────────────────────────────╮
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/paperqa/agents/main.py:240 in              │
           │ run_aviary_agent                                                                                     │
           │                                                                                                      │
           │   237 │   │   │   │   │   )                                                                          │
           │   238 │   │   │   │   │   return env.state.answer, AgentStatus.TRUNCATED                             │
           │   239 │   │   │   │   agent_state.messages += obs                                                    │
           │ ❱ 240 │   │   │   │   for attempt in Retrying(                                                       │
           │   241 │   │   │   │   │   stop=stop_after_attempt(5),                                                │
           │   242 │   │   │   │   │   retry=retry_if_exception_type(MalformedMessageError),                      │
           │   243 │   │   │   │   │   before_sleep=before_sleep_log(logger, logging.WARNING),                    │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/tenacity/__init__.py:443 in __iter__       │
           │                                                                                                      │
           │   440 │   │                                                                                          │
           │   441 │   │   retry_state = RetryCallState(self, fn=None, args=(), kwargs={})                        │
           │   442 │   │   while True:                                                                            │
           │ ❱ 443 │   │   │   do = self.iter(retry_state=retry_state)                                            │
           │   444 │   │   │   if isinstance(do, DoAttempt):                                                      │
           │   445 │   │   │   │   yield AttemptManager(retry_state=retry_state)                                  │
           │   446 │   │   │   elif isinstance(do, DoSleep):                                                      │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/tenacity/__init__.py:376 in iter           │
           │                                                                                                      │
           │   373 │   │   self._begin_iter(retry_state)                                                          │
           │   374 │   │   result = None                                                                          │
           │   375 │   │   for action in self.iter_state.actions:                                                 │
           │ ❱ 376 │   │   │   result = action(retry_state)                                                       │
           │   377 │   │   return result                                                                          │
           │   378 │                                                                                              │
           │   379 │   def _begin_iter(self, retry_state: "RetryCallState") -> None:  # noqa                      │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/tenacity/__init__.py:398 in <lambda>       │
           │                                                                                                      │
           │   395 │                                                                                              │
           │   396 │   def _post_retry_check_actions(self, retry_state: "RetryCallState") -> None:                │
           │   397 │   │   if not (self.iter_state.is_explicit_retry or self.iter_state.retry_run_result):        │
           │ ❱ 398 │   │   │   self._add_action_func(lambda rs: rs.outcome.result())                              │
           │   399 │   │   │   return                                                                             │
           │   400 │   │                                                                                          │
           │   401 │   │   if self.after is not None:                                                             │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/concurrent/futures/_base.py:449 in result                │
           │                                                                                                      │
           │   446 │   │   │   │   if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:                         │
           │   447 │   │   │   │   │   raise CancelledError()                                                     │
           │   448 │   │   │   │   elif self._state == FINISHED:                                                  │
           │ ❱ 449 │   │   │   │   │   return self.__get_result()                                                 │
           │   450 │   │   │   │                                                                                  │
           │   451 │   │   │   │   self._condition.wait(timeout)                                                  │
           │   452                                                                                                │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/concurrent/futures/_base.py:401 in __get_result          │
           │                                                                                                      │
           │   398 │   def __get_result(self):                                                                    │
           │   399 │   │   if self._exception:                                                                    │
           │   400 │   │   │   try:                                                                               │
           │ ❱ 401 │   │   │   │   raise self._exception                                                          │
           │   402 │   │   │   finally:                                                                           │
           │   403 │   │   │   │   # Break a reference cycle with the exception in self._exception                │
           │   404 │   │   │   │   self = None                                                                    │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/paperqa/agents/main.py:247 in              │
           │ run_aviary_agent                                                                                     │
           │                                                                                                      │
           │   244 │   │   │   │   │   reraise=True,                                                              │
           │   245 │   │   │   │   ):                                                                             │
           │   246 │   │   │   │   │   with attempt:  # Retrying if ToolSelector fails to select a tool           │
           │ ❱ 247 │   │   │   │   │   │   action = await agent(agent_state.messages, tools)                      │
           │   248 │   │   │   │   agent_state.messages = [*agent_state.messages, action]                         │
           │   249 │   │   │   │   if on_agent_action_callback:                                                   │
           │   250 │   │   │   │   │   await on_agent_action_callback(action, agent_state)                        │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/aviary/tools/utils.py:175 in __call__      │
           │                                                                                                      │
           │   172 │   │   │   │   # in practice 'tool_calls' shows up too                                        │
           │   173 │   │   │   │   expected_finish_reason.add("stop")                                             │
           │   174 │   │                                                                                          │
           │ ❱ 175 │   │   model_response = await self._bound_acompletion(                                        │
           │   176 │   │   │   messages=MessagesAdapter.dump_python(                                              │
           │   177 │   │   │   │   messages, exclude_none=True, by_alias=True                                     │
           │   178 │   │   │   ),                                                                                 │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:740 in acompletion       │
           │                                                                                                      │
           │    737 │   │   │   │   │   original_exception=e,                                                     │
           │    738 │   │   │   │   )                                                                             │
           │    739 │   │   │   )                                                                                 │
           │ ❱  740 │   │   │   raise e                                                                           │
           │    741 │                                                                                             │
           │    742 │   async def _acompletion(                                                                   │
           │    743 │   │   self, model: str, messages: List[Dict[str, str]], **kwargs                            │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:728 in acompletion       │
           │                                                                                                      │
           │    725 │   │   │   if request_priority is not None and isinstance(request_priority, int):            │
           │    726 │   │   │   │   response = await self.schedule_acompletion(**kwargs)                          │
           │    727 │   │   │   else:                                                                             │
           │ ❱  728 │   │   │   │   response = await self.async_function_with_fallbacks(**kwargs)                 │
           │    729 │   │   │                                                                                     │
           │    730 │   │   │   return response                                                                   │
           │    731 │   │   except Exception as e:                                                                │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3044 in                  │
           │ async_function_with_fallbacks                                                                        │
           │                                                                                                      │
           │   3041 │   │   │   │   │   │   )                                                                     │
           │   3042 │   │   │   │   │   )                                                                         │
           │   3043 │   │   │                                                                                     │
           │ ❱ 3044 │   │   │   raise original_exception                                                          │
           │   3045 │                                                                                             │
           │   3046 │   async def async_function_with_retries(self, *args, **kwargs):                             │
           │   3047 │   │   verbose_router_logger.debug(                                                          │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:2898 in                  │
           │ async_function_with_fallbacks                                                                        │
           │                                                                                                      │
           │   2895 │   │   │   │   │   │   Context_Policy_Fallbacks={content_policy_fallbacks}",                 │
           │   2896 │   │   │   │   )                                                                             │
           │   2897 │   │   │                                                                                     │
           │ ❱ 2898 │   │   │   response = await self.async_function_with_retries(*args, **kwargs)                │
           │   2899 │   │   │   verbose_router_logger.debug(f"Async Response: {response}")                        │
           │   2900 │   │   │   return response                                                                   │
           │   2901 │   │   except Exception as e:                                                                │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3105 in                  │
           │ async_function_with_retries                                                                          │
           │                                                                                                      │
           │   3102 │   │   │   )                                                                                 │
           │   3103 │   │   │                                                                                     │
           │   3104 │   │   │   # raises an exception if this error should not be retries                         │
           │ ❱ 3105 │   │   │   self.should_retry_this_error(                                                     │
           │   3106 │   │   │   │   error=e,                                                                      │
           │   3107 │   │   │   │   healthy_deployments=_healthy_deployments,                                     │
           │   3108 │   │   │   │   all_deployments=_all_deployments,                                             │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3243 in                  │
           │ should_retry_this_error                                                                              │
           │                                                                                                      │
           │   3240 │   │   │   if (                                                                              │
           │   3241 │   │   │   │   _num_all_deployments <= 1                                                     │
           │   3242 │   │   │   ):  # if there is only 1 deployment for this model group then don't retry         │
           │ ❱ 3243 │   │   │   │   raise error  # then raise error                                               │
           │   3244 │   │                                                                                         │
           │   3245 │   │   # Do not retry if there are no healthy deployments                                    │
           │   3246 │   │   # just raise the error                                                                │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3088 in                  │
           │ async_function_with_retries                                                                          │
           │                                                                                                      │
           │   3085 │   │   │   │   │   message=f"This is a mock exception for model={model_group}, to            │
           │        trigger a rate limit error.",                                                                 │
           │   3086 │   │   │   │   )                                                                             │
           │   3087 │   │   │   # if the function call is successful, no exception will be raised and we'll       │
           │        break out of the loop                                                                         │
           │ ❱ 3088 │   │   │   response = await self.make_call(original_function, *args, **kwargs)               │
           │   3089 │   │   │                                                                                     │
           │   3090 │   │   │   return response                                                                   │
           │   3091 │   │   except Exception as e:                                                                │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3181 in make_call        │
           │                                                                                                      │
           │   3178 │   │   Handler for making a call to the .completion()/.embeddings() functions.               │
           │   3179 │   │   """                                                                                   │
           │   3180 │   │   model_group = kwargs.get("model")                                                     │
           │ ❱ 3181 │   │   response = await original_function(*args, **kwargs)                                   │
           │   3182 │   │   ## PROCESS RESPONSE HEADERS                                                           │
           │   3183 │   │   await self.set_response_headers(response=response, model_group=model_group)           │
           │   3184                                                                                               │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:879 in _acompletion      │
           │                                                                                                      │
           │    876 │   │   │   )                                                                                 │
           │    877 │   │   │   if model_name is not None:                                                        │
           │    878 │   │   │   │   self.fail_calls[model_name] += 1                                              │
           │ ❱  879 │   │   │   raise e                                                                           │
           │    880 │                                                                                             │
           │    881 │   async def abatch_completion(                                                              │
           │    882 │   │   self,                                                                                 │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:851 in _acompletion      │
           │                                                                                                      │
           │    848 │   │   │   │   await self.async_routing_strategy_pre_call_checks(                            │
           │    849 │   │   │   │   │   deployment=deployment, logging_obj=logging_obj                            │
           │    850 │   │   │   │   )                                                                             │
           │ ❱  851 │   │   │   │   response = await _response                                                    │
           │    852 │   │   │                                                                                     │
           │    853 │   │   │   ## CHECK CONTENT FILTER ERROR ##                                                  │
           │    854 │   │   │   if isinstance(response, ModelResponse):                                           │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/utils.py:1573 in wrapper_async     │
           │                                                                                                      │
           │   1570 │   │   │   │   │   else:                                                                     │
           │   1571 │   │   │   │   │   │   kwargs["model"] = context_window_fallback_dict[model]                 │
           │   1572 │   │   │   │   │   return await original_function(*args, **kwargs)                           │
           │ ❱ 1573 │   │   │   raise e                                                                           │
           │   1574 │                                                                                             │
           │   1575 │   is_coroutine = inspect.iscoroutinefunction(original_function)                             │
           │   1576                                                                                               │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/utils.py:1388 in wrapper_async     │
           │                                                                                                      │
           │   1385 │   │   │   │   │   │   │   ).start()                                                         │
           │   1386 │   │   │   │   │   │   │   return final_embedding_cached_response                            │
           │   1387 │   │   │   # MODEL CALL                                                                      │
           │ ❱ 1388 │   │   │   result = await original_function(*args, **kwargs)                                 │
           │   1389 │   │   │   end_time = datetime.datetime.now()                                                │
           │   1390 │   │   │   if "stream" in kwargs and kwargs["stream"] is True:                               │
           │   1391 │   │   │   │   if (                                                                          │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/main.py:457 in acompletion         │
           │                                                                                                      │
           │    454 │   │   return response                                                                       │
           │    455 │   except Exception as e:                                                                    │
           │    456 │   │   custom_llm_provider = custom_llm_provider or "openai"                                 │
           │ ❱  457 │   │   raise exception_type(                                                                 │
           │    458 │   │   │   model=model,                                                                      │
           │    459 │   │   │   custom_llm_provider=custom_llm_provider,                                          │
           │    460 │   │   │   original_exception=e,                                                             │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mappi │
           │ ng_utils.py:2033 in exception_type                                                                   │
           │                                                                                                      │
           │   2030 │   │   # don't let an error with mapping interrupt the user from receiving an error          │
           │        from the llm api calls                                                                        │
           │   2031 │   │   if exception_mapping_worked:                                                          │
           │   2032 │   │   │   setattr(e, "litellm_response_headers", litellm_response_headers)                  │
           │ ❱ 2033 │   │   │   raise e                                                                           │
           │   2034 │   │   else:                                                                                 │
           │   2035 │   │   │   for error_type in litellm.LITELLM_EXCEPTION_TYPES:                                │
           │   2036 │   │   │   │   if isinstance(e, error_type):                                                 │
           │                                                                                                      │
           │ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mappi │
           │ ng_utils.py:358 in exception_type                                                                    │
           │                                                                                                      │
           │    355 │   │   │   │   │   │   )                                                                     │
           │    356 │   │   │   │   │   elif original_exception.status_code == 401:                               │
           │    357 │   │   │   │   │   │   exception_mapping_worked = True                                       │
           │ ❱  358 │   │   │   │   │   │   raise AuthenticationError(                                            │
           │    359 │   │   │   │   │   │   │   message=f"AuthenticationError: {exception_provider} -             │
           │        {message}",                                                                                   │
           │    360 │   │   │   │   │   │   │   llm_provider=custom_llm_provider,                                 │
           │    361 │   │   │   │   │   │   │   model=model,                                                      │
           ╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯
           AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401
           - {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at   
           https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 
           'invalid_api_key'}}                                                                                     
           Received Model Group=gpt-4o-2024-08-06                                                                  
           Available Model Group Fallbacks=None 
loilisxka commented 6 days ago

Same problem here: I've been trying to run paperqa locally for the last month, and it keeps pulling the gpt-4o-2024-xx-xx model instead of the local LLM. I tried running llama3.1 and llama3.2, and I have no problem executing the example from Ollama to add documents to storage and ask questions. Paper-qa seems to connect to the Ollama server, and the GPU shows the corresponding process, but the error is the one above. Could someone who has a working example provide their virtual environment requirements list and IDE description, whether it is a notebook, a plain .py file, or whatever?

Could you please share your Settings configuration, and the corresponding Ollama environment configuration required?

grg-ffb commented 5 days ago

My running example is Python 3.12.3 with paperqa 5.0.10. @victorconka, your code works in my setup, except that I commented out import nest_asyncio and nest_asyncio.apply() since I do not have that available yet. But I noticed that, since no folder containing papers is specified, paperqa checks all available files (including .py files) and produces errors if a file is not a paper in .pdf format. It seems to be the same in your case - at least the error messages point in this direction.

victorconka commented 5 days ago

@grg-ffb Thank you so much for trying out my code and providing your setup. I have just set up the environment with python==3.12 (I got 3.12.7) and paperqa==5.0.10, and I've finally been able to get this working. I have a papers folder containing txt files (converted from PDF files). I did not specify the papers folder path anywhere in the code, so it picks up all the txt files but does not seem to read any .py files; nevertheless, you can specify paper_directory="./papers" within the Settings passed to ask (see the variant after the code below). I get document parsing errors on the first run, but it is not a problem and I still get an answer both in the console and in the "answer" object. By the way, I am running a remote Ubuntu 22.04 machine with an NVIDIA 3090, with Ollama installed as described in their wiki:

Ollama installation

curl -fsSL https://ollama.com/install.sh | sh  
ollama pull mxbai-embed-large  
ollama pull llama3.1  
ollama pull llama3.2  

Directory setup :

tree .
.
├── ollama-ask.ipynb
├── papers
│   ├── 1.- paper.txt
│   ├── 2.- paper.txt
│   ├── 3.- paper.txt
│   └── 4.- paper.txt
├── pq.ipynb
└── questions.py

pq.ipynb

# jupyter requires asyncio to execute paperqa
import nest_asyncio
nest_asyncio.apply()

from paperqa import Settings, ask

import os
# set up openai key to stop openai lib from complaining 
api_key="sk-no-key-required"
os.environ['OPENAI_API_KEY'] = api_key

# question list
question_list = [
    "test q1?",
    "test q2?",
    "test q3?",
]

from questions import *

model = "ollama/llama3.1" # "ollama/llama3.2" 
embedding = "ollama/mxbai-embed-large" # "nomic-embed-text"

local_llm_config = {
    "model_list": [
        {
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_base": "http://localhost:11434",
            },
        }
    ]
}

answer = ask(
    question_list[0],
    settings=Settings(
        llm=model,
        llm_config=local_llm_config,
        summary_llm=model,
        summary_llm_config=local_llm_config,
        embedding=embedding,
    ),
)
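If you'd rather not rely on the current working directory, the paper_directory mentioned above can go directly into the same Settings call. A small, untested variant of the snippet above (reusing the variables already defined there, and assuming the papers folder from the tree):

answer = ask(
    question_list[0],
    settings=Settings(
        llm=model,
        llm_config=local_llm_config,
        summary_llm=model,
        summary_llm_config=local_llm_config,
        embedding=embedding,
        paper_directory="./papers",  # explicit papers folder instead of the working directory
    ),
)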

P.S. I decided to give the latest paper-qa==5.2.1 version a try, and it still has the problem mentioned in this issue.

Snikch63200 commented 4 days ago

Hello,

This is a recurring problem related to the embedding model. Setting a local embedding model is problematic.

I have no solution for the ask function, but you can use the Docs().query method instead. See here: https://github.com/Future-House/paper-qa/issues/541.

Best regards.
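For anyone wanting to try that route, a rough, untested sketch of the Docs-based flow with a local Ollama setup (model names, the api_base, and the file path are placeholders; whether add/query accept a settings argument this way should be checked against your installed version - see #541 for details):

from paperqa import Docs, Settings

model = "ollama/llama3.2"
local_llm_config = {
    "model_list": [
        {
            "model_name": model,
            "litellm_params": {"model": model, "api_base": "http://localhost:11434"},
        }
    ]
}

settings = Settings(
    llm=model,
    llm_config=local_llm_config,
    summary_llm=model,
    summary_llm_config=local_llm_config,
    embedding="ollama/mxbai-embed-large",
)

docs = Docs()
docs.add("papers/example-paper.pdf", settings=settings)  # placeholder document path
answer = docs.query("What manufacturing challenges are unique to bispecific antibodies?", settings=settings)
print(answer.formatted_answer)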

qingqiyangqidanqi commented 4 days ago

I deployed Qwen2.5-14B-Instruct on my local server and started the LLM correctly using vLLM. [...] The program still prioritizes looking for the OpenAI API instead of the local LLM settings I passed.

Me too.

loilisxka commented 4 days ago

This is a recurring problem related to the embedding model. Setting a local embedding model is problematic.

I have no solution for the ask function, but you can use the Docs().query method instead. See here: #541.

Thank you for your answer. I will try adding the embedding model later. If this method works, I suggest you modify the example. Thank you again.

whitead commented 3 days ago

We've added a new feature to use the local sentence-transformers library, which may be an easier way than trying to get litellm configured correctly for using local embeddings:

https://github.com/Future-House/paper-qa?tab=readme-ov-file#local-embedding-models-sentence-transformers
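For example, a minimal sketch along the lines of that README section (assuming sentence-transformers is installed and that the "st-" prefixed model name below matches a SentenceTransformer model available to you):

from paperqa import Settings, ask

# The "st-" prefix selects a local sentence-transformers embedding model
# instead of a litellm-routed one (requires `pip install sentence-transformers`)
answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(embedding="st-multi-qa-MiniLM-L6-cos-v1"),
)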

afonsoguerra commented 1 day ago

@whitead This is awesome and the sentence transformers is indeed easier to setup for embeddings.

That said, in version 5.3.0 there are still plenty of errors, as reported above, for the main LLM model using either Ollama or llamafile as the source, since it still tries to connect to OpenAI regardless.

Script:

from paperqa import Settings, ask

# The easiest way to get set up is to download a llamafile and execute it with
# -cb -np 4 -a my-llm-model --embedding, which will enable continuous batching and embeddings.

local_llm_config = dict(
    model_list=[
        dict(
            model_name="openai/my-llm-model",
            litellm_params=dict(
                model="openai/my-llm-model",
                api_base="http://localhost:8080/v1",
                api_key="sk-no-key-required",
                temperature=0.1,
                frequency_penalty=1.5,
                max_tokens=1024,
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="my-llm-model",
        llm_config=local_llm_config,
        summary_llm="my-llm-model",
        summary_llm_config=local_llm_config,
        temperature=0.5,
        paper_directory="my_papers",
    ),
)

Error start:

[15:10:04] Agent <aviary.tools.utils.ToolSelector object at 0x10f5d1010> failed.

Error end:

       AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided:
       sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
       Received Model Group=gpt-4o-2024-08-06
       Available Model Group Fallbacks=None

[15:10:05] Answer:

prof-robert commented 1 day ago

Hi, I have the same problem. As far as I can tell, the

agent: AgentSettings = AgentSettings

field in the Settings object is still set to OpenAI. I cannot change these settings (see settings.py):

class AgentSettings(BaseModel):
    model_config = ConfigDict(extra="forbid")

    agent_llm: str = Field(
        default="gpt-4o-2024-08-06",
        description="Model to use for agent",
    )

Could this be the cause of the issue? Kind Regards, Robert

jamesbraza commented 1 day ago

source as it still tries to connect to openai regardless

@afonsoguerra and @prof-robert, if you want to use a local model for everything, you'll need to configure all of the places that take a model: settings.llm, settings.summary_llm, settings.agent.agent_llm, and settings.embedding.

If you configure all of these places, and still see API calls going to OpenAI, let us know 👌
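As an untested sketch of what configuring all of those places can look like in one Settings object (the agent_llm_config field on AgentSettings is an assumption here - check paperqa/settings.py in your installed version for the exact field names):

from paperqa import Settings, ask
from paperqa.settings import AgentSettings

model = "ollama/llama3.2"
local_llm_config = {
    "model_list": [
        {
            "model_name": model,
            "litellm_params": {"model": model, "api_base": "http://localhost:11434"},
        }
    ]
}

answer = ask(
    "What is RAG?",
    settings=Settings(
        llm=model,
        llm_config=local_llm_config,
        summary_llm=model,
        summary_llm_config=local_llm_config,
        embedding="ollama/mxbai-embed-large",
        # Otherwise the agent keeps its default gpt-4o model and calls OpenAI
        agent=AgentSettings(agent_llm=model, agent_llm_config=local_llm_config),
    ),
)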

bwnjnOEI commented 1 day ago

Code

from paperqa import Settings, ask
import os

os.environ['OPENAI_API_KEY'] = "ollama"

local_llm_config = dict(
    model_list=[
        dict(
            model_name='ollama/llama3.2',
            litellm_params=dict(
                model='ollama/llama3.2',
                api_base="http://100.*8*.**8.1**:11434",
            ),
        )
    ]
)

answer = ask(
    "What is RAG?",
    settings=Settings(
        llm='ollama/llama3.2',
        llm_config=local_llm_config,
        summary_llm='ollama/llama3.2',
        summary_llm_config=local_llm_config,
        embedding='ollama/mxbai-embed-large',
        paper_directory="my_papers"
    ),
)

Run result

➜ python -u "/home/bwnjnoei/Workspace/pqa_deploy/pqa_deploy.py"

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

[03:43:46] Agent <aviary.tools.utils.ToolSelector object at 0x740f48d363c0> failed.                                                                                          
           ╭────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/llms/OpenAI/openai.py:867 in acompletion                                          │
           │                                                                                                                                                                │
           │    864 │   │   │   │   │   },                                                                                                                                  │
           │    865 │   │   │   │   )                                                                                                                                       │
           │    866 │   │   │   │                                                                                                                                           │
           │ ❱  867 │   │   │   │   headers, response = await self.make_openai_chat_completion_request(                                                                     │
           │    868 │   │   │   │   │   openai_aclient=openai_aclient, data=data, timeout=timeout                                                                           │
           │    869 │   │   │   │   )                                                                                                                                       │
           │    870 │   │   │   │   stringified_response = response.model_dump()                                                                                            │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/llms/OpenAI/openai.py:580 in make_openai_chat_completion_request                  │
           │                                                                                                                                                                │
           │    577 │   │   │   response = raw_response.parse()                                                                                                             │
           │    578 │   │   │   return headers, response                                                                                                                    │
           │    579 │   │   except Exception as e:                                                                                                                          │
           │ ❱  580 │   │   │   raise e                                                                                                                                     │
           │    581 │                                                                                                                                                       │
           │    582 │   def make_sync_openai_chat_completion_request(                                                                                                       │
           │    583 │   │   self,                                                                                                                                           │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/llms/OpenAI/openai.py:568 in make_openai_chat_completion_request                  │
           │                                                                                                                                                                │
           │    565 │   │   """                                                                                                                                             │
           │    566 │   │   try:                                                                                                                                            │
           │    567 │   │   │   raw_response = (                                                                                                                            │
           │ ❱  568 │   │   │   │   await openai_aclient.chat.completions.with_raw_response.create(                                                                         │
           │    569 │   │   │   │   │   **data, timeout=timeout                                                                                                             │
           │    570 │   │   │   │   )                                                                                                                                       │
           │    571 │   │   │   )                                                                                                                                           │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/_legacy_response.py:370 in wrapped                                                 │
           │                                                                                                                                                                │
           │   367 │   │                                                                                                                                                    │
           │   368 │   │   kwargs["extra_headers"] = extra_headers                                                                                                          │
           │   369 │   │                                                                                                                                                    │
           │ ❱ 370 │   │   return cast(LegacyAPIResponse[R], await func(*args, **kwargs))                                                                                   │
           │   371 │                                                                                                                                                        │
           │   372 │   return wrapped                                                                                                                                       │
           │   373                                                                                                                                                          │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/resources/chat/completions.py:1412 in create                                       │
           │                                                                                                                                                                │
           │   1409 │   │   timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,                                                                                   │
           │   1410 │   ) -> ChatCompletion | AsyncStream[ChatCompletionChunk]:                                                                                             │
           │   1411 │   │   validate_response_format(response_format)                                                                                                       │
           │ ❱ 1412 │   │   return await self._post(                                                                                                                        │
           │   1413 │   │   │   "/chat/completions",                                                                                                                        │
           │   1414 │   │   │   body=await async_maybe_transform(                                                                                                           │
           │   1415 │   │   │   │   {                                                                                                                                       │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/_base_client.py:1821 in post                                                       │
           │                                                                                                                                                                │
           │   1818 │   │   opts = FinalRequestOptions.construct(                                                                                                           │
           │   1819 │   │   │   method="post", url=path, json_data=body, files=await                                                                                        │
           │        async_to_httpx_files(files), **options                                                                                                                  │
           │   1820 │   │   )                                                                                                                                               │
           │ ❱ 1821 │   │   return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)                                                                  │
           │   1822 │                                                                                                                                                       │
           │   1823 │   async def patch(                                                                                                                                    │
           │   1824 │   │   self,                                                                                                                                           │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/_base_client.py:1515 in request                                                    │
           │                                                                                                                                                                │
           │   1512 │   │   stream_cls: type[_AsyncStreamT] | None = None,                                                                                                  │
           │   1513 │   │   remaining_retries: Optional[int] = None,                                                                                                        │
           │   1514 │   ) -> ResponseT | _AsyncStreamT:                                                                                                                     │
           │ ❱ 1515 │   │   return await self._request(                                                                                                                     │
           │   1516 │   │   │   cast_to=cast_to,                                                                                                                            │
           │   1517 │   │   │   options=options,                                                                                                                            │
           │   1518 │   │   │   stream=stream,                                                                                                                              │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/_base_client.py:1616 in _request                                                   │
           │                                                                                                                                                                │
           │   1613 │   │   │   │   await err.response.aread()                                                                                                              │
           │   1614 │   │   │                                                                                                                                               │
           │   1615 │   │   │   log.debug("Re-raising status error")                                                                                                        │
           │ ❱ 1616 │   │   │   raise self._make_status_error_from_response(err.response) from None                                                                         │
           │   1617 │   │                                                                                                                                                   │
           │   1618 │   │   return await self._process_response(                                                                                                            │
           │   1619 │   │   │   cast_to=cast_to,                                                                                                                            │
           ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
           AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ollama. You can find your API key at                                    
           https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}                                       

           During handling of the above exception, another exception occurred:                                                                                               

           ╭────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/main.py:435 in acompletion                                                        │
           │                                                                                                                                                                │
           │    432 │   │   │   │   │   response = ModelResponse(**init_response)                                                                                           │
           │    433 │   │   │   │   response = init_response                                                                                                                │
           │    434 │   │   │   elif asyncio.iscoroutine(init_response):                                                                                                    │
           │ ❱  435 │   │   │   │   response = await init_response                                                                                                          │
           │    436 │   │   │   else:                                                                                                                                       │
           │    437 │   │   │   │   response = init_response  # type: ignore                                                                                                │
           │    438                                                                                                                                                         │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/llms/OpenAI/openai.py:898 in acompletion                                          │
           │                                                                                                                                                                │
           │    895 │   │   │   │   if error_headers is None and exception_response:                                                                                        │
           │    896 │   │   │   │   │   error_headers = getattr(exception_response, "headers", None)                                                                        │
           │    897 │   │   │   │                                                                                                                                           │
           │ ❱  898 │   │   │   │   raise OpenAIError(                                                                                                                      │
           │    899 │   │   │   │   │   status_code=status_code, message=str(e), headers=error_headers                                                                      │
           │    900 │   │   │   │   )                                                                                                                                       │
           │    901                                                                                                                                                         │
           ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
           OpenAIError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ollama. You can find your API key at                                            
           https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}                                       

           During handling of the above exception, another exception occurred:                                                                                               

           ╭────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/paperqa/agents/main.py:242 in run_aviary_agent                                            │
           │                                                                                                                                                                │
           │   239 │   │   │   │   │   )                                                                                                                                    │
           │   240 │   │   │   │   │   return env.state.answer, AgentStatus.TRUNCATED                                                                                       │
           │   241 │   │   │   │   agent_state.messages += obs                                                                                                              │
           │ ❱ 242 │   │   │   │   for attempt in Retrying(                                                                                                                 │
           │   243 │   │   │   │   │   stop=stop_after_attempt(5),                                                                                                          │
           │   244 │   │   │   │   │   retry=retry_if_exception_type(MalformedMessageError),                                                                                │
           │   245 │   │   │   │   │   before_sleep=before_sleep_log(logger, logging.WARNING),                                                                              │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/tenacity/__init__.py:443 in __iter__                                                      │
           │                                                                                                                                                                │
           │   440 │   │                                                                                                                                                    │
           │   441 │   │   retry_state = RetryCallState(self, fn=None, args=(), kwargs={})                                                                                  │
           │   442 │   │   while True:                                                                                                                                      │
           │ ❱ 443 │   │   │   do = self.iter(retry_state=retry_state)                                                                                                      │
           │   444 │   │   │   if isinstance(do, DoAttempt):                                                                                                                │
           │   445 │   │   │   │   yield AttemptManager(retry_state=retry_state)                                                                                            │
           │   446 │   │   │   elif isinstance(do, DoSleep):                                                                                                                │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/tenacity/__init__.py:376 in iter                                                          │
           │                                                                                                                                                                │
           │   373 │   │   self._begin_iter(retry_state)                                                                                                                    │
           │   374 │   │   result = None                                                                                                                                    │
           │   375 │   │   for action in self.iter_state.actions:                                                                                                           │
           │ ❱ 376 │   │   │   result = action(retry_state)                                                                                                                 │
           │   377 │   │   return result                                                                                                                                    │
           │   378 │                                                                                                                                                        │
           │   379 │   def _begin_iter(self, retry_state: "RetryCallState") -> None:  # noqa                                                                                │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/tenacity/__init__.py:398 in <lambda>                                                      │
           │                                                                                                                                                                │
           │   395 │                                                                                                                                                        │
           │   396 │   def _post_retry_check_actions(self, retry_state: "RetryCallState") -> None:                                                                          │
           │   397 │   │   if not (self.iter_state.is_explicit_retry or self.iter_state.retry_run_result):                                                                  │
           │ ❱ 398 │   │   │   self._add_action_func(lambda rs: rs.outcome.result())                                                                                        │
           │   399 │   │   │   return                                                                                                                                       │
           │   400 │   │                                                                                                                                                    │
           │   401 │   │   if self.after is not None:                                                                                                                       │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/concurrent/futures/_base.py:449 in result                                                               │
           │                                                                                                                                                                │
           │   446 │   │   │   │   if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:                                                                                   │
           │   447 │   │   │   │   │   raise CancelledError()                                                                                                               │
           │   448 │   │   │   │   elif self._state == FINISHED:                                                                                                            │
           │ ❱ 449 │   │   │   │   │   return self.__get_result()                                                                                                           │
           │   450 │   │   │   │                                                                                                                                            │
           │   451 │   │   │   │   self._condition.wait(timeout)                                                                                                            │
           │   452                                                                                                                                                          │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/concurrent/futures/_base.py:401 in __get_result                                                         │
           │                                                                                                                                                                │
           │   398 │   def __get_result(self):                                                                                                                              │
           │   399 │   │   if self._exception:                                                                                                                              │
           │   400 │   │   │   try:                                                                                                                                         │
           │ ❱ 401 │   │   │   │   raise self._exception                                                                                                                    │
           │   402 │   │   │   finally:                                                                                                                                     │
           │   403 │   │   │   │   # Break a reference cycle with the exception in self._exception                                                                          │
           │   404 │   │   │   │   self = None                                                                                                                              │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/paperqa/agents/main.py:249 in run_aviary_agent                                            │
           │                                                                                                                                                                │
           │   246 │   │   │   │   │   reraise=True,                                                                                                                        │
           │   247 │   │   │   │   ):                                                                                                                                       │
           │   248 │   │   │   │   │   with attempt:  # Retrying if ToolSelector fails to select a tool                                                                     │
           │ ❱ 249 │   │   │   │   │   │   action = await agent(agent_state.messages, tools)                                                                                │
           │   250 │   │   │   │   agent_state.messages = [*agent_state.messages, action]                                                                                   │
           │   251 │   │   │   │   if on_agent_action_callback:                                                                                                             │
           │   252 │   │   │   │   │   await on_agent_action_callback(action, agent_state)                                                                                  │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/aviary/tools/utils.py:175 in __call__                                                     │
           │                                                                                                                                                                │
           │   172 │   │   │   │   # in practice 'tool_calls' shows up too                                                                                                  │
           │   173 │   │   │   │   expected_finish_reason.add("stop")                                                                                                       │
           │   174 │   │                                                                                                                                                    │
           │ ❱ 175 │   │   model_response = await self._bound_acompletion(                                                                                                  │
           │   176 │   │   │   messages=MessagesAdapter.dump_python(                                                                                                        │
           │   177 │   │   │   │   messages, exclude_none=True, by_alias=True                                                                                               │
           │   178 │   │   │   ),                                                                                                                                           │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:740 in acompletion                                                      │
           │                                                                                                                                                                │
           │    737 │   │   │   │   │   original_exception=e,                                                                                                               │
           │    738 │   │   │   │   )                                                                                                                                       │
           │    739 │   │   │   )                                                                                                                                           │
           │ ❱  740 │   │   │   raise e                                                                                                                                     │
           │    741 │                                                                                                                                                       │
           │    742 │   async def _acompletion(                                                                                                                             │
           │    743 │   │   self, model: str, messages: List[Dict[str, str]], **kwargs                                                                                      │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:728 in acompletion                                                      │
           │                                                                                                                                                                │
           │    725 │   │   │   if request_priority is not None and isinstance(request_priority, int):                                                                      │
           │    726 │   │   │   │   response = await self.schedule_acompletion(**kwargs)                                                                                    │
           │    727 │   │   │   else:                                                                                                                                       │
           │ ❱  728 │   │   │   │   response = await self.async_function_with_fallbacks(**kwargs)                                                                           │
           │    729 │   │   │                                                                                                                                               │
           │    730 │   │   │   return response                                                                                                                             │
           │    731 │   │   except Exception as e:                                                                                                                          │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3044 in async_function_with_fallbacks                                   │
           │                                                                                                                                                                │
           │   3041 │   │   │   │   │   │   )                                                                                                                               │
           │   3042 │   │   │   │   │   )                                                                                                                                   │
           │   3043 │   │   │                                                                                                                                               │
           │ ❱ 3044 │   │   │   raise original_exception                                                                                                                    │
           │   3045 │                                                                                                                                                       │
           │   3046 │   async def async_function_with_retries(self, *args, **kwargs):                                                                                       │
           │   3047 │   │   verbose_router_logger.debug(                                                                                                                    │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:2898 in async_function_with_fallbacks                                   │
           │                                                                                                                                                                │
           │   2895 │   │   │   │   │   │   Context_Policy_Fallbacks={content_policy_fallbacks}",                                                                           │
           │   2896 │   │   │   │   )                                                                                                                                       │
           │   2897 │   │   │                                                                                                                                               │
           │ ❱ 2898 │   │   │   response = await self.async_function_with_retries(*args, **kwargs)                                                                          │
           │   2899 │   │   │   verbose_router_logger.debug(f"Async Response: {response}")                                                                                  │
           │   2900 │   │   │   return response                                                                                                                             │
           │   2901 │   │   except Exception as e:                                                                                                                          │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3105 in async_function_with_retries                                     │
           │                                                                                                                                                                │
           │   3102 │   │   │   )                                                                                                                                           │
           │   3103 │   │   │                                                                                                                                               │
           │   3104 │   │   │   # raises an exception if this error should not be retries                                                                                   │
           │ ❱ 3105 │   │   │   self.should_retry_this_error(                                                                                                               │
           │   3106 │   │   │   │   error=e,                                                                                                                                │
           │   3107 │   │   │   │   healthy_deployments=_healthy_deployments,                                                                                               │
           │   3108 │   │   │   │   all_deployments=_all_deployments,                                                                                                       │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3243 in should_retry_this_error                                         │
           │                                                                                                                                                                │
           │   3240 │   │   │   if (                                                                                                                                        │
           │   3241 │   │   │   │   _num_all_deployments <= 1                                                                                                               │
           │   3242 │   │   │   ):  # if there is only 1 deployment for this model group then don't retry                                                                   │
           │ ❱ 3243 │   │   │   │   raise error  # then raise error                                                                                                         │
           │   3244 │   │                                                                                                                                                   │
           │   3245 │   │   # Do not retry if there are no healthy deployments                                                                                              │
           │   3246 │   │   # just raise the error                                                                                                                          │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3088 in async_function_with_retries                                     │
           │                                                                                                                                                                │
           │   3085 │   │   │   │   │   message=f"This is a mock exception for model={model_group}, to                                                                      │
           │        trigger a rate limit error.",                                                                                                                           │
           │   3086 │   │   │   │   )                                                                                                                                       │
           │   3087 │   │   │   # if the function call is successful, no exception will be raised and we'll                                                                 │
           │        break out of the loop                                                                                                                                   │
           │ ❱ 3088 │   │   │   response = await self.make_call(original_function, *args, **kwargs)                                                                         │
           │   3089 │   │   │                                                                                                                                               │
           │   3090 │   │   │   return response                                                                                                                             │
           │   3091 │   │   except Exception as e:                                                                                                                          │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3181 in make_call                                                       │
           │                                                                                                                                                                │
           │   3178 │   │   Handler for making a call to the .completion()/.embeddings() functions.                                                                         │
           │   3179 │   │   """                                                                                                                                             │
           │   3180 │   │   model_group = kwargs.get("model")                                                                                                               │
           │ ❱ 3181 │   │   response = await original_function(*args, **kwargs)                                                                                             │
           │   3182 │   │   ## PROCESS RESPONSE HEADERS                                                                                                                     │
           │   3183 │   │   await self.set_response_headers(response=response, model_group=model_group)                                                                     │
           │   3184                                                                                                                                                         │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:879 in _acompletion                                                     │
           │                                                                                                                                                                │
           │    876 │   │   │   )                                                                                                                                           │
           │    877 │   │   │   if model_name is not None:                                                                                                                  │
           │    878 │   │   │   │   self.fail_calls[model_name] += 1                                                                                                        │
           │ ❱  879 │   │   │   raise e                                                                                                                                     │
           │    880 │                                                                                                                                                       │
           │    881 │   async def abatch_completion(                                                                                                                        │
           │    882 │   │   self,                                                                                                                                           │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:851 in _acompletion                                                     │
           │                                                                                                                                                                │
           │    848 │   │   │   │   await self.async_routing_strategy_pre_call_checks(                                                                                      │
           │    849 │   │   │   │   │   deployment=deployment, logging_obj=logging_obj                                                                                      │
           │    850 │   │   │   │   )                                                                                                                                       │
           │ ❱  851 │   │   │   │   response = await _response                                                                                                              │
           │    852 │   │   │                                                                                                                                               │
           │    853 │   │   │   ## CHECK CONTENT FILTER ERROR ##                                                                                                            │
           │    854 │   │   │   if isinstance(response, ModelResponse):                                                                                                     │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/utils.py:1573 in wrapper_async                                                    │
           │                                                                                                                                                                │
           │   1570 │   │   │   │   │   else:                                                                                                                               │
           │   1571 │   │   │   │   │   │   kwargs["model"] = context_window_fallback_dict[model]                                                                           │
           │   1572 │   │   │   │   │   return await original_function(*args, **kwargs)                                                                                     │
           │ ❱ 1573 │   │   │   raise e                                                                                                                                     │
           │   1574 │                                                                                                                                                       │
           │   1575 │   is_coroutine = inspect.iscoroutinefunction(original_function)                                                                                       │
           │   1576                                                                                                                                                         │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/utils.py:1388 in wrapper_async                                                    │
           │                                                                                                                                                                │
           │   1385 │   │   │   │   │   │   │   ).start()                                                                                                                   │
           │   1386 │   │   │   │   │   │   │   return final_embedding_cached_response                                                                                      │
           │   1387 │   │   │   # MODEL CALL                                                                                                                                │
           │ ❱ 1388 │   │   │   result = await original_function(*args, **kwargs)                                                                                           │
           │   1389 │   │   │   end_time = datetime.datetime.now()                                                                                                          │
           │   1390 │   │   │   if "stream" in kwargs and kwargs["stream"] is True:                                                                                         │
           │   1391 │   │   │   │   if (                                                                                                                                    │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/main.py:457 in acompletion                                                        │
           │                                                                                                                                                                │
           │    454 │   │   return response                                                                                                                                 │
           │    455 │   except Exception as e:                                                                                                                              │
           │    456 │   │   custom_llm_provider = custom_llm_provider or "openai"                                                                                           │
           │ ❱  457 │   │   raise exception_type(                                                                                                                           │
           │    458 │   │   │   model=model,                                                                                                                                │
           │    459 │   │   │   custom_llm_provider=custom_llm_provider,                                                                                                    │
           │    460 │   │   │   original_exception=e,                                                                                                                       │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py:2033 in exception_type              │
           │                                                                                                                                                                │
           │   2030 │   │   # don't let an error with mapping interrupt the user from receiving an error                                                                    │
           │        from the llm api calls                                                                                                                                  │
           │   2031 │   │   if exception_mapping_worked:                                                                                                                    │
           │   2032 │   │   │   setattr(e, "litellm_response_headers", litellm_response_headers)                                                                            │
           │ ❱ 2033 │   │   │   raise e                                                                                                                                     │
           │   2034 │   │   else:                                                                                                                                           │
           │   2035 │   │   │   for error_type in litellm.LITELLM_EXCEPTION_TYPES:                                                                                          │
           │   2036 │   │   │   │   if isinstance(e, error_type):                                                                                                           │
           │                                                                                                                                                                │
           │ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py:358 in exception_type               │
           │                                                                                                                                                                │
           │    355 │   │   │   │   │   │   )                                                                                                                               │
           │    356 │   │   │   │   │   elif original_exception.status_code == 401:                                                                                         │
           │    357 │   │   │   │   │   │   exception_mapping_worked = True                                                                                                 │
           │ ❱  358 │   │   │   │   │   │   raise AuthenticationError(                                                                                                      │
           │    359 │   │   │   │   │   │   │   message=f"AuthenticationError: {exception_provider} -                                                                       │
           │        {message}",                                                                                                                                             │
           │    360 │   │   │   │   │   │   │   llm_provider=custom_llm_provider,                                                                                           │
           │    361 │   │   │   │   │   │   │   model=model,                                                                                                                │
           ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
           AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided:     
           ollama. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}  
           Received Model Group=gpt-4o-2024-08-06                                                                                                                            
           Available Model Group Fallbacks=None                                                                                                                              
[03:43:47] Answer:                                                                        
jamesbraza commented 1 day ago

Hi @bwnjnOEI you are quite close, what you need to do is also configure settings.agent.agent_llm. This is the LLM powering the agent that selects tools.

from paperqa import Settings, ask, AgentSettings

answer = ask(
    ...,
    settings=Settings(
        ...,
        agent=AgentSettings(
            agent_llm='ollama/llama3.2', agent_llm_config=local_llm_config
        )
    ),
)
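
As a side note on why the earlier traceback mentions gpt-4o-2024-08-06: when agent_llm is left unconfigured, the agent falls back to its default model, which you can inspect directly. A minimal sketch, assuming only that Settings is importable as in the snippets above:

from paperqa import Settings

# Print the default model used by the tool-selecting agent when
# agent_llm is not overridden; in the traceback above it resolved
# to gpt-4o-2024-08-06, which is why an OpenAI key was requested.
print(Settings().agent.agent_llm)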
bwnjnOEI commented 1 day ago

Hi @bwnjnOEI you are quite close, what you need to do is also configure settings.agent.agent_llm. This is the LLM powering the agent that selects tools.

from paperqa import Settings, ask, AgentSettings

answer = ask(
    ...,
    settings=Settings(
        ...,
        agent=AgentSettings(
            agent_llm='ollama/llama3.2', agent_llm_config=local_llm_config
        )
    ),
)

I'm very grateful for your guidance. I modified the code according to your hint, and it worked.

Updated code

from paperqa import Settings, ask
from paperqa.settings import AgentSettings
import os

os.environ['OPENAI_API_KEY'] = "ollama"

local_llm_config = dict(
    model_list=[
        dict(
            model_name='ollama/llama3.2',
            litellm_params=dict(
                model='ollama/llama3.2',
                api_base="http://100.*8*.**8.1**:11434",
            ),
        )
    ]
)

answer = ask(
    "What is RAG?",
    settings=Settings(
        llm='ollama/llama3.2',
        llm_config=local_llm_config,
        summary_llm='ollama/llama3.2',
        summary_llm_config=local_llm_config,
        embedding='ollama/mxbai-embed-large',
        agent=AgentSettings(
            agent_llm='ollama/llama3.2', 
            agent_llm_config=local_llm_config
        ),
        paper_directory="my_papers"
    ),
)

Run result

❯ python -u "/home/bwnjnoei/Workspace/pqa_deploy/pqa_deploy.py"
[04:09:47] Could not find cost for model ollama/llama3.2.                                                                                                          
           Starting paper search for 'RAG'.                                                                                                                        
           paper_search for query 'RAG' and offset 0 returned 1 papers.                                                                                            
           Status: Paper Count=1 | Relevant Papers=0 | Current Evidence=0 | Current Cost=$0.0000                                                                   
[04:09:48] Could not find cost for model ollama/llama3.2.                                                                                                          
           gather_evidence starting for question 'What is RAG?'.                                                                                                   
[04:09:53] Status: Paper Count=1 | Relevant Papers=1 | Current Evidence=9 | Current Cost=$0.0000                                                                   
[04:09:54] Generating answer for 'What is RAG?'.                                                                                                                   
[04:09:56] Status: Paper Count=1 | Relevant Papers=1 | Current Evidence=9 | Current Cost=$0.0000                                                                   
           Answer: RAG stands for Retrieval-Augmented Text Generation, a research area that improves the quality of language models by grounding their generation  
           process on a set of retrieved materials such as relevant documents. This approach has gained increasing interest recently and aims to combine retrieval 
           and generation in a more integrated way.                                                                                                                

           The term RAG is also used interchangeably with Retrieval-Augmented Generation (RAG), which involves using a retrieval model to retrieve relevant        
           documents or snippets from a large corpus, and then using a generation model to generate text based on these retrieved snippets. This technique is used 
           in natural language processing (NLP) tasks to improve performance.                                                                                      

           In the context of phrase segmentation, RAG stands for Phrase Segmentation Algorithm (RAG), which uses a forward maximum matching strategy to identify   
           phrases. The algorithm can be optimized using pre-trained DPR models and FAISS vector search toolkits, as mentioned in "COPY IS ALL YOU NEED." for      
           efficient phrase segmentation in large document sets.                                                                                                   

           The COG model, proposed in the same paper, is a text generation model that outperforms strong Transformer baselines in terms of generation quality. It  
           can be fine-tuned for different domains without requiring additional training. The model's performance is evaluated on various tasks, including domain  
           adaptation and enlarged phrase index settings.                                                                                                          

           Sources:                                                                                                                                                
           - Lan2023 pages 8-9: RAG stands for Retrieval-Augmented Text Generation.                                                                                
           - Lan2023 pages 11-12: RAG also refers to Retrieval-Augmented Generation (RAG).                                                                         
           - Lan2023 pages 13-15: RAG stands for Phrase Segmentation Algorithm (RAG) in the context of phrase segmentation.   
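
If a 401 like the one earlier in this thread still appears even though every model name points at a local deployment, it usually means one of the configured roles (most often agent.agent_llm) is still routed to an OpenAI model. A quick way to rule out connectivity problems with the local endpoint itself is to call it directly through litellm before involving paper-qa. This is only a sketch, assuming Ollama is serving llama3.2 on localhost:11434 (adjust api_base for your host):

import litellm

# Minimal connectivity check against the local Ollama endpoint.
# If this call succeeds but paper-qa still raises a 401 against OpenAI,
# one of llm / summary_llm / agent.agent_llm is still pointing at a
# non-local model.
response = litellm.completion(
    model="ollama/llama3.2",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    api_base="http://localhost:11434",
)
print(response.choices[0].message.content)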
loilisxka commented 1 day ago

Thank you very much for sharing; this is useful for me.

prof-robert commented 1 day ago

Yes, thanks, it works for me now too! By the way, if you use the command-line version, this is how I got it to work with Ollama:

pqa --llm "ollama/llama3.2" \
    --summary_llm "ollama/llama3.2" \
    --agent.agent_llm "ollama/llama3.2" \
    --embedding "ollama/mxbai-embed-large" \
    --llm_config '{"model_list": [{"model_name": "ollama/llama3.2", "litellm_params": {"model": "ollama/llama3.2", "api_base": "http://localhost:11434"}}]}' \
    --summary_llm_config '{"model_list": [{"model_name": "ollama/llama3.2", "litellm_params": {"model": "ollama/llama3.2", "api_base": "http://localhost:11434"}}]}' \
    --agent.agent_llm_config '{"model_list": [{"model_name": "ollama/llama3.2", "litellm_params": {"model": "ollama/llama3.2", "api_base": "http://localhost:11434"}}]}' \
    --prompts.use_json False ask "Your question here"
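
Since the same litellm config dict is repeated for llm_config, summary_llm_config, and agent_llm_config, it can be convenient to build it once in Python. The helper below is only a sketch (make_local_config is not part of paper-qa, and localhost:11434 is an assumed default); it wraps the exact structure used in the examples above.

from paperqa import Settings, ask
from paperqa.settings import AgentSettings


def make_local_config(model: str, api_base: str = "http://localhost:11434") -> dict:
    # Hypothetical convenience helper: build the litellm model_list config
    # used throughout this thread for a single local model.
    return {
        "model_list": [
            {
                "model_name": model,
                "litellm_params": {"model": model, "api_base": api_base},
            }
        ]
    }


local_llm_config = make_local_config("ollama/llama3.2")

answer = ask(
    "Your question here",
    settings=Settings(
        llm="ollama/llama3.2",
        llm_config=local_llm_config,
        summary_llm="ollama/llama3.2",
        summary_llm_config=local_llm_config,
        embedding="ollama/mxbai-embed-large",
        agent=AgentSettings(
            agent_llm="ollama/llama3.2",
            agent_llm_config=local_llm_config,
        ),
        paper_directory="my_papers",
    ),
)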
afonsoguerra commented 1 day ago

Thanks @jamesbraza for the prompt reply. I thought it was something along those lines (because I could see my local LLM doing something for both embedding and generation), but I couldn't find in the documentation what I was missing. It's working now! Thanks also to @prof-robert for the command-line version; I meant to investigate that at some point and this just saves me time :)