Open loilisxka opened 1 week ago
Hey, @loilisxka! I'm here to help you with any coding bugs or questions you have. Let's work together to solve this issue!
It looks like the issue is related to the OPENAI_API_KEY environment variable. Even when using a local LLM, the system might still require this variable to be set due to dependencies on libraries like langchain or litellm.
A similar issue was discussed in a closed thread where the solution was to set the OPENAI_API_KEY to a generic value. You can try setting it like this:
export OPENAI_API_KEY=foo
This should help bypass the requirement for an actual OpenAI API key and allow your local LLM configuration to be used [1].
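The same placeholder can also be set from Python before paperqa is imported, which is what later replies in this thread do. A minimal sketch, assuming the placeholder only needs to be a non-empty string; the helper name `ensure_placeholder_key` is made up for illustration:

```python
import os


def ensure_placeholder_key(value: str = "foo") -> str:
    """Set OPENAI_API_KEY to a placeholder unless a key is already present."""
    # setdefault leaves an existing (possibly real) key untouched.
    return os.environ.setdefault("OPENAI_API_KEY", value)


ensure_placeholder_key()
print("OPENAI_API_KEY" in os.environ)
```

Note that this only satisfies the client-side check for a key. It does not by itself redirect any request to the local model.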
To continue talking to Dosu, mention @dosu.
Hey, @dosu I skipped the API check by adding the environment variable os.environ['OPENAI_API_KEY'] = 'foo', but then the program started repeatedly printing
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
What does this mean? Here is the full error
Timeout: litellm.Timeout: APITimeoutError - Request timed out.
error_str: Request timed out.
Received Model Group=gpt-4o-2024-08-06
Available Model Group Fallbacks=None LiteLLM Retried: 2 times, LiteLLM Max Retries: 3
It seems that the program is still looking for an online LLM instead of the local one.
The error message indicates that litellm is timing out while trying to connect to the specified model. This suggests that the configuration might still be pointing to an online model instead of your local setup.
Here are a few things to check:
API Base URL: Ensure that the api_base in your litellm_params is correctly set to your local server's address (http://127.0.0.1:8000/v1).
Model Name: Double-check that the model_name in your model_list matches exactly with what your local server expects.
Network Issues: Verify that your local server is running and accessible at the specified api_base URL. You can test this by sending a simple request to the server to see if it responds.
Verbose Logging: Enable verbose logging by setting litellm.set_verbose = True to get more detailed information about what might be going wrong.
If everything seems correct and the issue persists, it might be worth checking the server logs for any errors or misconfigurations.
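The "simple request" check can be sketched with the standard library alone. This assumes the local server is OpenAI-compatible and exposes a model listing at `<api_base>/models` (an Ollama server queried on its native port uses different paths); the helper names `models_url` and `server_responds` are hypothetical:

```python
import urllib.error
import urllib.request


def models_url(api_base: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible server."""
    return api_base.rstrip("/") + "/models"


def server_responds(api_base: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at api_base, False otherwise."""
    try:
        with urllib.request.urlopen(models_url(api_base), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status such as 401 or 404.
        return True
    except OSError:
        # Connection refused, DNS failure, or timeout (URLError is an OSError).
        return False


if __name__ == "__main__":
    print(server_responds("http://127.0.0.1:8000/v1"))
```

A `True` result only shows that the server is reachable. It says nothing about whether paperqa is actually sending its requests there.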
I am pretty sure that there is nothing wrong with the locally deployed LLM. It gives correct and timely answers when I access it with the openai package, but paper-qa does not work properly.
I have the same problem. I tried to run with ollama/llama3.2 as per example in the docs. I am getting the same error message, looks like the local model is not being used at all - even though it is specified in the settings as per docs. My error message is:
Received Model Group=gpt-4o-2024-08-06
Available Model Group Fallbacks=None LiteLLM Retried: 2 times, LiteLLM Max Retries: 3
Same problem here. I have been trying to run paperqa locally for the last month and it is pulling the gpt-4o-2024-xx-xx model instead of the local LLM. I tried running llama3.1 and llama3.2, and I have no problem executing the example from ollama to add documents to storage and ask questions. Paper-qa seems to connect to the ollama server and the GPU has the corresponding process, but the error is the one above. Could someone who has a working example provide their virtual environment requirement list and a description of how they run it, whether that is a notebook, a plain .py file, or something else?
@grg-ffb - maybe you can share your setup? I saw that you edited the README file to add ollama in this fork: https://github.com/grg-ffb/paper-qa-ollama
Hi @chemical-accuracy, my code was already merged into paperqa with PR #536 last week. To add to it: I am hosting llama3.2 and mxbai-embed-large on ollama without any changes to the default port or anything else. For me, not every API call worked either. Sometimes I still get the error that OpenAI keys are missing or rate limits are reached, so it seems that paperqa still tries to use OpenAI for some tasks. But as a starting point it should work.
I deployed Qwen2.5-14B-Instruct on my local server and started llm correctly using vllm. But when I executed the sample code,
from paperqa import Settings, ask

local_llm_config = dict(
    model_list=[
        dict(
            model_name="Qwen2.5-14B-Instruct",
            litellm_params=dict(
                model="Qwen2.5-14B-Instruct",
                api_base="http://127.0.0.1:8000/v1",
                api_key="EMPTY",
                temperature=0.7,
                max_tokens=512,
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="Qwen2.5-14B-Instruct",
        llm_config=local_llm_config,
        summary_llm="Qwen2.5-14B-Instruct",
        summary_llm_config=local_llm_config,
    ),
)
I got the following error:
Traceback (most recent call last):
  File "/home/huzheyu/project/paper/test/qa.py", line 22, in <module>
    answer = ask(
             ^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/agents/__init__.py", line 102, in ask
    return get_loop().run_until_complete(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/agents/main.py", line 67, in agent_query
    response = await run_agent(docs, query, agent_type, **runner_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/agents/main.py", line 119, in run_agent
    elif tool_selector_or_none := query.settings.make_aviary_tool_selector(agent_type):
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/settings.py", line 719, in make_aviary_tool_selector
    acompletion=self.get_agent_llm().router.acompletion,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/paperqa/llms.py", line 610, in router
    self._router = litellm.Router(
                   ^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router.py", line 352, in __init__
    self.set_model_list(model_list)
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router.py", line 4069, in set_model_list
    self._create_deployment(
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router.py", line 3994, in _create_deployment
    deployment = self._add_deployment(deployment=deployment)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router.py", line 4158, in _add_deployment
    set_client(
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/litellm/router_utils/client_initalization_utils.py", line 441, in set_client
    _client = openai.AsyncOpenAI(  # type: ignore
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huzheyu/miniconda3/envs/paper/lib/python3.12/site-packages/openai/_client.py", line 319, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
It seems that the settings are invalid, and the program still prioritizes looking for openai_api instead of the local llm settings I passed. I want to know what caused this.
From my experience you also need to add the embedding model. If you don't, paperqa will still try to access openai for creating the embeddings and you will get this error if the openai api can't be reached.
So I did a clean install: a conda environment with python 3.11 and pip install paper-qa==5.2.0, just to see if it is any different from conda install. Execution in vscode jupyter:
import nest_asyncio
nest_asyncio.apply()
from paperqa import Settings, ask
import os

# set up openai key to stop openai lib from complaining
api_key = "sk-no-key-required"
os.environ['OPENAI_API_KEY'] = api_key

# question list
question_list = [
    "test q1?",
    "test q2?",
    "test q3?",
]

model = "ollama/llama3.2"  # "ollama/llama3.1"
embedding = "ollama/mxbai-embed-large"  # "nomic-embed-text"

local_llm_config = {
    "model_list": [
        {
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_base": "http://localhost:11434",
            },
        }
    ]
}

answer = ask(
    question_list[0],
    settings=Settings(
        llm=model,
        llm_config=local_llm_config,
        summary_llm=model,
        summary_llm_config=local_llm_config,
        embedding=embedding,
    ),
)
Error that I get is:
╭───────────────────────────────── Traceback (most recent call last) ──────────────────────────────────╮
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/paperqa/agents/main.py:240 in │
│ run_aviary_agent │
│ │
│ 237 │ │ │ │ │ ) │
│ 238 │ │ │ │ │ return env.state.answer, AgentStatus.TRUNCATED │
│ 239 │ │ │ │ agent_state.messages += obs │
│ ❱ 240 │ │ │ │ for attempt in Retrying( │
│ 241 │ │ │ │ │ stop=stop_after_attempt(5), │
│ 242 │ │ │ │ │ retry=retry_if_exception_type(MalformedMessageError), │
│ 243 │ │ │ │ │ before_sleep=before_sleep_log(logger, logging.WARNING), │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/tenacity/__init__.py:443 in __iter__ │
│ │
│ 440 │ │ │
│ 441 │ │ retry_state = RetryCallState(self, fn=None, args=(), kwargs={}) │
│ 442 │ │ while True: │
│ ❱ 443 │ │ │ do = self.iter(retry_state=retry_state) │
│ 444 │ │ │ if isinstance(do, DoAttempt): │
│ 445 │ │ │ │ yield AttemptManager(retry_state=retry_state) │
│ 446 │ │ │ elif isinstance(do, DoSleep): │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/tenacity/__init__.py:376 in iter │
│ │
│ 373 │ │ self._begin_iter(retry_state) │
│ 374 │ │ result = None │
│ 375 │ │ for action in self.iter_state.actions: │
│ ❱ 376 │ │ │ result = action(retry_state) │
│ 377 │ │ return result │
│ 378 │ │
│ 379 │ def _begin_iter(self, retry_state: "RetryCallState") -> None: # noqa │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/tenacity/__init__.py:398 in <lambda> │
│ │
│ 395 │ │
│ 396 │ def _post_retry_check_actions(self, retry_state: "RetryCallState") -> None: │
│ 397 │ │ if not (self.iter_state.is_explicit_retry or self.iter_state.retry_run_result): │
│ ❱ 398 │ │ │ self._add_action_func(lambda rs: rs.outcome.result()) │
│ 399 │ │ │ return │
│ 400 │ │ │
│ 401 │ │ if self.after is not None: │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/concurrent/futures/_base.py:449 in result │
│ │
│ 446 │ │ │ │ if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]: │
│ 447 │ │ │ │ │ raise CancelledError() │
│ 448 │ │ │ │ elif self._state == FINISHED: │
│ ❱ 449 │ │ │ │ │ return self.__get_result() │
│ 450 │ │ │ │ │
│ 451 │ │ │ │ self._condition.wait(timeout) │
│ 452 │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/concurrent/futures/_base.py:401 in __get_result │
│ │
│ 398 │ def __get_result(self): │
│ 399 │ │ if self._exception: │
│ 400 │ │ │ try: │
│ ❱ 401 │ │ │ │ raise self._exception │
│ 402 │ │ │ finally: │
│ 403 │ │ │ │ # Break a reference cycle with the exception in self._exception │
│ 404 │ │ │ │ self = None │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/paperqa/agents/main.py:247 in │
│ run_aviary_agent │
│ │
│ 244 │ │ │ │ │ reraise=True, │
│ 245 │ │ │ │ ): │
│ 246 │ │ │ │ │ with attempt: # Retrying if ToolSelector fails to select a tool │
│ ❱ 247 │ │ │ │ │ │ action = await agent(agent_state.messages, tools) │
│ 248 │ │ │ │ agent_state.messages = [*agent_state.messages, action] │
│ 249 │ │ │ │ if on_agent_action_callback: │
│ 250 │ │ │ │ │ await on_agent_action_callback(action, agent_state) │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/aviary/tools/utils.py:175 in __call__ │
│ │
│ 172 │ │ │ │ # in practice 'tool_calls' shows up too │
│ 173 │ │ │ │ expected_finish_reason.add("stop") │
│ 174 │ │ │
│ ❱ 175 │ │ model_response = await self._bound_acompletion( │
│ 176 │ │ │ messages=MessagesAdapter.dump_python( │
│ 177 │ │ │ │ messages, exclude_none=True, by_alias=True │
│ 178 │ │ │ ), │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:740 in acompletion │
│ │
│ 737 │ │ │ │ │ original_exception=e, │
│ 738 │ │ │ │ ) │
│ 739 │ │ │ ) │
│ ❱ 740 │ │ │ raise e │
│ 741 │ │
│ 742 │ async def _acompletion( │
│ 743 │ │ self, model: str, messages: List[Dict[str, str]], **kwargs │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:728 in acompletion │
│ │
│ 725 │ │ │ if request_priority is not None and isinstance(request_priority, int): │
│ 726 │ │ │ │ response = await self.schedule_acompletion(**kwargs) │
│ 727 │ │ │ else: │
│ ❱ 728 │ │ │ │ response = await self.async_function_with_fallbacks(**kwargs) │
│ 729 │ │ │ │
│ 730 │ │ │ return response │
│ 731 │ │ except Exception as e: │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3044 in │
│ async_function_with_fallbacks │
│ │
│ 3041 │ │ │ │ │ │ ) │
│ 3042 │ │ │ │ │ ) │
│ 3043 │ │ │ │
│ ❱ 3044 │ │ │ raise original_exception │
│ 3045 │ │
│ 3046 │ async def async_function_with_retries(self, *args, **kwargs): │
│ 3047 │ │ verbose_router_logger.debug( │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:2898 in │
│ async_function_with_fallbacks │
│ │
│ 2895 │ │ │ │ │ │ Context_Policy_Fallbacks={content_policy_fallbacks}", │
│ 2896 │ │ │ │ ) │
│ 2897 │ │ │ │
│ ❱ 2898 │ │ │ response = await self.async_function_with_retries(*args, **kwargs) │
│ 2899 │ │ │ verbose_router_logger.debug(f"Async Response: {response}") │
│ 2900 │ │ │ return response │
│ 2901 │ │ except Exception as e: │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3105 in │
│ async_function_with_retries │
│ │
│ 3102 │ │ │ ) │
│ 3103 │ │ │ │
│ 3104 │ │ │ # raises an exception if this error should not be retries │
│ ❱ 3105 │ │ │ self.should_retry_this_error( │
│ 3106 │ │ │ │ error=e, │
│ 3107 │ │ │ │ healthy_deployments=_healthy_deployments, │
│ 3108 │ │ │ │ all_deployments=_all_deployments, │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3243 in │
│ should_retry_this_error │
│ │
│ 3240 │ │ │ if ( │
│ 3241 │ │ │ │ _num_all_deployments <= 1 │
│ 3242 │ │ │ ): # if there is only 1 deployment for this model group then don't retry │
│ ❱ 3243 │ │ │ │ raise error # then raise error │
│ 3244 │ │ │
│ 3245 │ │ # Do not retry if there are no healthy deployments │
│ 3246 │ │ # just raise the error │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3088 in │
│ async_function_with_retries │
│ │
│ 3085 │ │ │ │ │ message=f"This is a mock exception for model={model_group}, to │
│ trigger a rate limit error.", │
│ 3086 │ │ │ │ ) │
│ 3087 │ │ │ # if the function call is successful, no exception will be raised and we'll │
│ break out of the loop │
│ ❱ 3088 │ │ │ response = await self.make_call(original_function, *args, **kwargs) │
│ 3089 │ │ │ │
│ 3090 │ │ │ return response │
│ 3091 │ │ except Exception as e: │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:3181 in make_call │
│ │
│ 3178 │ │ Handler for making a call to the .completion()/.embeddings() functions. │
│ 3179 │ │ """ │
│ 3180 │ │ model_group = kwargs.get("model") │
│ ❱ 3181 │ │ response = await original_function(*args, **kwargs) │
│ 3182 │ │ ## PROCESS RESPONSE HEADERS │
│ 3183 │ │ await self.set_response_headers(response=response, model_group=model_group) │
│ 3184 │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:879 in _acompletion │
│ │
│ 876 │ │ │ ) │
│ 877 │ │ │ if model_name is not None: │
│ 878 │ │ │ │ self.fail_calls[model_name] += 1 │
│ ❱ 879 │ │ │ raise e │
│ 880 │ │
│ 881 │ async def abatch_completion( │
│ 882 │ │ self, │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/router.py:851 in _acompletion │
│ │
│ 848 │ │ │ │ await self.async_routing_strategy_pre_call_checks( │
│ 849 │ │ │ │ │ deployment=deployment, logging_obj=logging_obj │
│ 850 │ │ │ │ ) │
│ ❱ 851 │ │ │ │ response = await _response │
│ 852 │ │ │ │
│ 853 │ │ │ ## CHECK CONTENT FILTER ERROR ## │
│ 854 │ │ │ if isinstance(response, ModelResponse): │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/utils.py:1573 in wrapper_async │
│ │
│ 1570 │ │ │ │ │ else: │
│ 1571 │ │ │ │ │ │ kwargs["model"] = context_window_fallback_dict[model] │
│ 1572 │ │ │ │ │ return await original_function(*args, **kwargs) │
│ ❱ 1573 │ │ │ raise e │
│ 1574 │ │
│ 1575 │ is_coroutine = inspect.iscoroutinefunction(original_function) │
│ 1576 │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/utils.py:1388 in wrapper_async │
│ │
│ 1385 │ │ │ │ │ │ │ ).start() │
│ 1386 │ │ │ │ │ │ │ return final_embedding_cached_response │
│ 1387 │ │ │ # MODEL CALL │
│ ❱ 1388 │ │ │ result = await original_function(*args, **kwargs) │
│ 1389 │ │ │ end_time = datetime.datetime.now() │
│ 1390 │ │ │ if "stream" in kwargs and kwargs["stream"] is True: │
│ 1391 │ │ │ │ if ( │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/main.py:457 in acompletion │
│ │
│ 454 │ │ return response │
│ 455 │ except Exception as e: │
│ 456 │ │ custom_llm_provider = custom_llm_provider or "openai" │
│ ❱ 457 │ │ raise exception_type( │
│ 458 │ │ │ model=model, │
│ 459 │ │ │ custom_llm_provider=custom_llm_provider, │
│ 460 │ │ │ original_exception=e, │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mappi │
│ ng_utils.py:2033 in exception_type │
│ │
│ 2030 │ │ # don't let an error with mapping interrupt the user from receiving an error │
│ from the llm api calls │
│ 2031 │ │ if exception_mapping_worked: │
│ 2032 │ │ │ setattr(e, "litellm_response_headers", litellm_response_headers) │
│ ❱ 2033 │ │ │ raise e │
│ 2034 │ │ else: │
│ 2035 │ │ │ for error_type in litellm.LITELLM_EXCEPTION_TYPES: │
│ 2036 │ │ │ │ if isinstance(e, error_type): │
│ │
│ /home/user/miniforge3/envs/pq/lib/python3.11/site-packages/litellm/litellm_core_utils/exception_mappi │
│ ng_utils.py:358 in exception_type │
│ │
│ 355 │ │ │ │ │ │ ) │
│ 356 │ │ │ │ │ elif original_exception.status_code == 401: │
│ 357 │ │ │ │ │ │ exception_mapping_worked = True │
│ ❱ 358 │ │ │ │ │ │ raise AuthenticationError( │
│ 359 │ │ │ │ │ │ │ message=f"AuthenticationError: {exception_provider} - │
│ {message}", │
│ 360 │ │ │ │ │ │ │ llm_provider=custom_llm_provider, │
│ 361 │ │ │ │ │ │ │ model=model, │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯
AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401
- {'error': {'message': 'Incorrect API key provided: sk-no-ke******ired. You can find your API key at
https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code':
'invalid_api_key'}}
Received Model Group=gpt-4o-2024-08-06
Available Model Group Fallbacks=None
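The last lines of this error name the model group that the LiteLLM router was asked for. Here it is gpt-4o-2024-08-06 even though the script configures ollama/llama3.2, which suggests that some component is still using a default model. A small sketch for pulling that name out of a captured error message; the helper `requested_model_group` is hypothetical, and the pattern is based only on the message format shown in this thread:

```python
import re
from typing import Optional

MODEL_GROUP_RE = re.compile(r"Received Model Group=(\S+)")


def requested_model_group(error_text: str) -> Optional[str]:
    """Return the model group named in a LiteLLM router error, if any."""
    match = MODEL_GROUP_RE.search(error_text)
    return match.group(1) if match else None


error_text = (
    "Received Model Group=gpt-4o-2024-08-06\n"
    "Available Model Group Fallbacks=None"
)
print(requested_model_group(error_text))  # gpt-4o-2024-08-06
```

If the printed name is not the local model you configured, at least one model setting is probably still at its default value.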
Could you please share your Settings configuration, along with the ollama environment configuration it requires?
My running example uses python 3.12.3 and paperqa 5.0.10. @victorconka your code works in my setup, except that I commented out import nest_asyncio and nest_asyncio.apply() because nest_asyncio is not yet available in my setup. However, I noticed that when no folder containing papers is specified, paperqa checks all available files (including .py files) and produces errors for any file that is not a paper in .pdf format. The same seems to be happening in your case; at least the error messages point in this direction.
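To see in advance which files in the working directory are not papers, a small check like the following can help. This is a sketch only: the helper `split_by_suffix` and the set of accepted suffixes are assumptions made for illustration, not paperqa's own file filter.

```python
from pathlib import Path

# Assumed set of paper-like suffixes; adjust to match your own collection.
PAPER_SUFFIXES = {".pdf", ".txt"}


def split_by_suffix(directory: str) -> tuple[list[str], list[str]]:
    """Return (paper-like, other) file names found directly in directory."""
    papers, others = [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        target = papers if path.suffix.lower() in PAPER_SUFFIXES else others
        target.append(path.name)
    return papers, others


papers, others = split_by_suffix(".")
print("papers:", papers)
print("others:", others)
```

Anything listed under "others" is a candidate for the parsing errors described above. Moving the papers into their own folder avoids the question entirely.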
@grg-ffb Thank you so much for trying out my code and sharing your setup. I have just set up an environment with python==3.12 (I got 3.12.7) and paperqa==5.0.10, and I have finally been able to get this working. I have a papers folder containing txt files (converted from pdf files). I did not specify the papers folder path anywhere in the code, so it picks up all the txt files but does not seem to read any .py files. Nevertheless, you can specify paper_directory="./papers" in the Settings passed to ask. I get document parsing errors on the first run, but that is not a problem and I still get an answer both in the console and in the answer object. By the way, I am running a remote Ubuntu 22.04 machine with an NVIDIA 3090 and ollama installed as described in their wiki:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull mxbai-embed-large
ollama pull llama3.1
ollama pull llama3.2
tree .
.
├── ollama-ask.ipynb
├── papers
│ ├── 1.- paper.txt
│ ├── 2.- paper.txt
│ ├── 3.- paper.txt
│ └── 4.- paper.txt
├── pq.ipynb
└── questions.py
# jupyter requires nest_asyncio to execute paperqa
import nest_asyncio
nest_asyncio.apply()
from paperqa import Settings, ask
import os

# set up openai key to stop openai lib from complaining
api_key = "sk-no-key-required"
os.environ['OPENAI_API_KEY'] = api_key

# question list
question_list = [
    "test q1?",
    "test q2?",
    "test q3?",
]
from questions import *

model = "ollama/llama3.1"  # "ollama/llama3.2"
embedding = "ollama/mxbai-embed-large"  # "nomic-embed-text"

local_llm_config = {
    "model_list": [
        {
            "model_name": model,
            "litellm_params": {
                "model": model,
                "api_base": "http://localhost:11434",
            },
        }
    ]
}

answer = ask(
    question_list[0],
    settings=Settings(
        llm=model,
        llm_config=local_llm_config,
        summary_llm=model,
        summary_llm_config=local_llm_config,
        embedding=embedding,
    ),
)
P.S. I decided to try the latest version, paper-qa==5.2.1, and it has the problem described in this issue.
Hello,
This issue is a recurring problem related to the embedding model. Setting a local embedding model is problematic.
I have no solution for the ask function, but use the Doc().query method instead. See here: https://github.com/Future-House/paper-qa/issues/541.
Best regards.
Me too.
Thank you for your answer. I will try to add the Embedding model later. If this method works, I suggest you modify the example. Thank you again.
We've added a new feature to use the local sentence transformers library, which may be easier than trying to get litellm configured correctly for local embeddings:
@whitead This is awesome and the sentence transformers is indeed easier to setup for embeddings.
That said, in version 5.3.0 there are still plenty of errors as reported above for the main LLM model using either Ollama or llamafile as source as it still tries to connect to openai regardless.
Script:
from paperqa import Settings, ask
local_llm_config = dict(
    model_list=[
        dict(
            model_name=f"openai/my-llm-model",
            litellm_params=dict(
                model=f"openai/my-llm-model",
                api_base="http://localhost:8080/v1",
                api_key="sk-no-key-required",
                temperature=0.1,
                frequency_penalty=1.5,
                max_tokens=1024,
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="my-llm-model",
        llm_config=local_llm_config,
        summary_llm="my-llm-model",
        summary_llm_config=local_llm_config,
        temperature=0.5,
        paper_directory="my_papers",
    ),
)
Error start:
[15:10:04] Agent <aviary.tools.utils.ToolSelector object at 0x10f5d1010> failed.
Error end:
AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided:
sk-no-ke******ired. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Received Model Group=gpt-4o-2024-08-06
Available Model Group Fallbacks=None
[15:10:05] Answer:
Hi, I have the same problem. As far as I can tell, the agent: AgentSettings = AgentSettings field in the Settings object is still set to OpenAI. I cannot change these settings (see settings.py):
class AgentSettings(BaseModel):
    model_config = ConfigDict(extra="forbid")

    agent_llm: str = Field(
        default="gpt-4o-2024-08-06",
        description="Model to use for agent",
    )
Could this be the cause of the issue? Kind Regards, Robert
@afonsoguerra and @prof-robert, if you want to use a local model for everything, you'll need to configure all places with models: settings.llm, settings.summary_llm, settings.agent.agent_llm, settings.embedding.
If you configure all of these places, and still see API calls going to OpenAI, let us know 👌
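As a rough illustration of configuring all four places at once, the sketch below builds the keyword arguments as a plain dictionary so it runs without paperqa installed. The helper `local_settings_kwargs` is hypothetical. The `agent_llm_config` key, and the idea that `Settings` accepts a plain dict for `agent`, are assumptions that should be checked against the settings.py of the installed version, since AgentSettings forbids unknown fields. The model names and api_base are taken from examples earlier in this thread.

```python
from typing import Any


def local_settings_kwargs(
    model: str, embedding: str, api_base: str
) -> dict[str, Any]:
    """Build keyword arguments that point every model slot at one local model."""
    # One LiteLLM router config, reused for each LLM slot.
    llm_config = {
        "model_list": [
            {
                "model_name": model,
                "litellm_params": {"model": model, "api_base": api_base},
            }
        ]
    }
    return {
        "llm": model,
        "llm_config": llm_config,
        "summary_llm": model,
        "summary_llm_config": llm_config,
        "embedding": embedding,
        # Assumption: AgentSettings also accepts agent_llm_config.
        "agent": {"agent_llm": model, "agent_llm_config": llm_config},
    }


kwargs = local_settings_kwargs(
    model="ollama/llama3.2",
    embedding="ollama/mxbai-embed-large",
    api_base="http://localhost:11434",
)
print(sorted(kwargs))
# With paperqa installed, the intended use would be:
#   from paperqa import Settings, ask
#   answer = ask("What is RAG?", settings=Settings(**kwargs))
```

Building the config once and reusing it keeps the four settings from drifting apart, which is the failure mode several reports above seem to share.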
from paperqa import Settings, ask
import os

os.environ['OPENAI_API_KEY'] = "ollama"

local_llm_config = dict(
    model_list=[
        dict(
            model_name='ollama/llama3.2',
            litellm_params=dict(
                model='ollama/llama3.2',
                api_base="http://100.*8*.**8.1**:11434",
            ),
        )
    ]
)

answer = ask(
    "What is RAG?",
    settings=Settings(
        llm='ollama/llama3.2',
        llm_config=local_llm_config,
        summary_llm='ollama/llama3.2',
        summary_llm_config=local_llm_config,
        embedding='ollama/mxbai-embed-large',
        paper_directory="my_papers",
    ),
)
➜ python -u "/home/bwnjnoei/Workspace/pqa_deploy/pqa_deploy.py"
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
[03:43:46] Agent <aviary.tools.utils.ToolSelector object at 0x740f48d363c0> failed.
╭────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/llms/OpenAI/openai.py:867 in acompletion │
│ │
│ 864 │ │ │ │ │ }, │
│ 865 │ │ │ │ ) │
│ 866 │ │ │ │ │
│ ❱ 867 │ │ │ │ headers, response = await self.make_openai_chat_completion_request( │
│ 868 │ │ │ │ │ openai_aclient=openai_aclient, data=data, timeout=timeout │
│ 869 │ │ │ │ ) │
│ 870 │ │ │ │ stringified_response = response.model_dump() │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/llms/OpenAI/openai.py:580 in make_openai_chat_completion_request │
│ │
│ 577 │ │ │ response = raw_response.parse() │
│ 578 │ │ │ return headers, response │
│ 579 │ │ except Exception as e: │
│ ❱ 580 │ │ │ raise e │
│ 581 │ │
│ 582 │ def make_sync_openai_chat_completion_request( │
│ 583 │ │ self, │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/llms/OpenAI/openai.py:568 in make_openai_chat_completion_request │
│ │
│ 565 │ │ """ │
│ 566 │ │ try: │
│ 567 │ │ │ raw_response = ( │
│ ❱ 568 │ │ │ │ await openai_aclient.chat.completions.with_raw_response.create( │
│ 569 │ │ │ │ │ **data, timeout=timeout │
│ 570 │ │ │ │ ) │
│ 571 │ │ │ ) │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/_legacy_response.py:370 in wrapped │
│ │
│ 367 │ │ │
│ 368 │ │ kwargs["extra_headers"] = extra_headers │
│ 369 │ │ │
│ ❱ 370 │ │ return cast(LegacyAPIResponse[R], await func(*args, **kwargs)) │
│ 371 │ │
│ 372 │ return wrapped │
│ 373 │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/resources/chat/completions.py:1412 in create │
│ │
│ 1409 │ │ timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN, │
│ 1410 │ ) -> ChatCompletion | AsyncStream[ChatCompletionChunk]: │
│ 1411 │ │ validate_response_format(response_format) │
│ ❱ 1412 │ │ return await self._post( │
│ 1413 │ │ │ "/chat/completions", │
│ 1414 │ │ │ body=await async_maybe_transform( │
│ 1415 │ │ │ │ { │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/_base_client.py:1821 in post │
│ │
│ 1818 │ │ opts = FinalRequestOptions.construct( │
│ 1819 │ │ │ method="post", url=path, json_data=body, files=await │
│ async_to_httpx_files(files), **options │
│ 1820 │ │ ) │
│ ❱ 1821 │ │ return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls) │
│ 1822 │ │
│ 1823 │ async def patch( │
│ 1824 │ │ self, │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/_base_client.py:1515 in request │
│ │
│ 1512 │ │ stream_cls: type[_AsyncStreamT] | None = None, │
│ 1513 │ │ remaining_retries: Optional[int] = None, │
│ 1514 │ ) -> ResponseT | _AsyncStreamT: │
│ ❱ 1515 │ │ return await self._request( │
│ 1516 │ │ │ cast_to=cast_to, │
│ 1517 │ │ │ options=options, │
│ 1518 │ │ │ stream=stream, │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/openai/_base_client.py:1616 in _request │
│ │
│ 1613 │ │ │ │ await err.response.aread() │
│ 1614 │ │ │ │
│ 1615 │ │ │ log.debug("Re-raising status error") │
│ ❱ 1616 │ │ │ raise self._make_status_error_from_response(err.response) from None │
│ 1617 │ │ │
│ 1618 │ │ return await self._process_response( │
│ 1619 │ │ │ cast_to=cast_to, │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ollama. You can find your API key at
https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
During handling of the above exception, another exception occurred:
╭────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/main.py:435 in acompletion │
│ │
│ 432 │ │ │ │ │ response = ModelResponse(**init_response) │
│ 433 │ │ │ │ response = init_response │
│ 434 │ │ │ elif asyncio.iscoroutine(init_response): │
│ ❱ 435 │ │ │ │ response = await init_response │
│ 436 │ │ │ else: │
│ 437 │ │ │ │ response = init_response # type: ignore │
│ 438 │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/llms/OpenAI/openai.py:898 in acompletion │
│ │
│ 895 │ │ │ │ if error_headers is None and exception_response: │
│ 896 │ │ │ │ │ error_headers = getattr(exception_response, "headers", None) │
│ 897 │ │ │ │ │
│ ❱ 898 │ │ │ │ raise OpenAIError( │
│ 899 │ │ │ │ │ status_code=status_code, message=str(e), headers=error_headers │
│ 900 │ │ │ │ ) │
│ 901 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
OpenAIError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ollama. You can find your API key at
https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
During handling of the above exception, another exception occurred:
╭────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/paperqa/agents/main.py:242 in run_aviary_agent │
│ │
│ 239 │ │ │ │ │ ) │
│ 240 │ │ │ │ │ return env.state.answer, AgentStatus.TRUNCATED │
│ 241 │ │ │ │ agent_state.messages += obs │
│ ❱ 242 │ │ │ │ for attempt in Retrying( │
│ 243 │ │ │ │ │ stop=stop_after_attempt(5), │
│ 244 │ │ │ │ │ retry=retry_if_exception_type(MalformedMessageError), │
│ 245 │ │ │ │ │ before_sleep=before_sleep_log(logger, logging.WARNING), │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/tenacity/__init__.py:443 in __iter__ │
│ │
│ 440 │ │ │
│ 441 │ │ retry_state = RetryCallState(self, fn=None, args=(), kwargs={}) │
│ 442 │ │ while True: │
│ ❱ 443 │ │ │ do = self.iter(retry_state=retry_state) │
│ 444 │ │ │ if isinstance(do, DoAttempt): │
│ 445 │ │ │ │ yield AttemptManager(retry_state=retry_state) │
│ 446 │ │ │ elif isinstance(do, DoSleep): │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/tenacity/__init__.py:376 in iter │
│ │
│ 373 │ │ self._begin_iter(retry_state) │
│ 374 │ │ result = None │
│ 375 │ │ for action in self.iter_state.actions: │
│ ❱ 376 │ │ │ result = action(retry_state) │
│ 377 │ │ return result │
│ 378 │ │
│ 379 │ def _begin_iter(self, retry_state: "RetryCallState") -> None: # noqa │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/tenacity/__init__.py:398 in <lambda> │
│ │
│ 395 │ │
│ 396 │ def _post_retry_check_actions(self, retry_state: "RetryCallState") -> None: │
│ 397 │ │ if not (self.iter_state.is_explicit_retry or self.iter_state.retry_run_result): │
│ ❱ 398 │ │ │ self._add_action_func(lambda rs: rs.outcome.result()) │
│ 399 │ │ │ return │
│ 400 │ │ │
│ 401 │ │ if self.after is not None: │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/concurrent/futures/_base.py:449 in result │
│ │
│ 446 │ │ │ │ if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]: │
│ 447 │ │ │ │ │ raise CancelledError() │
│ 448 │ │ │ │ elif self._state == FINISHED: │
│ ❱ 449 │ │ │ │ │ return self.__get_result() │
│ 450 │ │ │ │ │
│ 451 │ │ │ │ self._condition.wait(timeout) │
│ 452 │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/concurrent/futures/_base.py:401 in __get_result │
│ │
│ 398 │ def __get_result(self): │
│ 399 │ │ if self._exception: │
│ 400 │ │ │ try: │
│ ❱ 401 │ │ │ │ raise self._exception │
│ 402 │ │ │ finally: │
│ 403 │ │ │ │ # Break a reference cycle with the exception in self._exception │
│ 404 │ │ │ │ self = None │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/paperqa/agents/main.py:249 in run_aviary_agent │
│ │
│ 246 │ │ │ │ │ reraise=True, │
│ 247 │ │ │ │ ): │
│ 248 │ │ │ │ │ with attempt: # Retrying if ToolSelector fails to select a tool │
│ ❱ 249 │ │ │ │ │ │ action = await agent(agent_state.messages, tools) │
│ 250 │ │ │ │ agent_state.messages = [*agent_state.messages, action] │
│ 251 │ │ │ │ if on_agent_action_callback: │
│ 252 │ │ │ │ │ await on_agent_action_callback(action, agent_state) │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/aviary/tools/utils.py:175 in __call__ │
│ │
│ 172 │ │ │ │ # in practice 'tool_calls' shows up too │
│ 173 │ │ │ │ expected_finish_reason.add("stop") │
│ 174 │ │ │
│ ❱ 175 │ │ model_response = await self._bound_acompletion( │
│ 176 │ │ │ messages=MessagesAdapter.dump_python( │
│ 177 │ │ │ │ messages, exclude_none=True, by_alias=True │
│ 178 │ │ │ ), │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:740 in acompletion │
│ │
│ 737 │ │ │ │ │ original_exception=e, │
│ 738 │ │ │ │ ) │
│ 739 │ │ │ ) │
│ ❱ 740 │ │ │ raise e │
│ 741 │ │
│ 742 │ async def _acompletion( │
│ 743 │ │ self, model: str, messages: List[Dict[str, str]], **kwargs │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:728 in acompletion │
│ │
│ 725 │ │ │ if request_priority is not None and isinstance(request_priority, int): │
│ 726 │ │ │ │ response = await self.schedule_acompletion(**kwargs) │
│ 727 │ │ │ else: │
│ ❱ 728 │ │ │ │ response = await self.async_function_with_fallbacks(**kwargs) │
│ 729 │ │ │ │
│ 730 │ │ │ return response │
│ 731 │ │ except Exception as e: │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3044 in async_function_with_fallbacks │
│ │
│ 3041 │ │ │ │ │ │ ) │
│ 3042 │ │ │ │ │ ) │
│ 3043 │ │ │ │
│ ❱ 3044 │ │ │ raise original_exception │
│ 3045 │ │
│ 3046 │ async def async_function_with_retries(self, *args, **kwargs): │
│ 3047 │ │ verbose_router_logger.debug( │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:2898 in async_function_with_fallbacks │
│ │
│ 2895 │ │ │ │ │ │ Context_Policy_Fallbacks={content_policy_fallbacks}", │
│ 2896 │ │ │ │ ) │
│ 2897 │ │ │ │
│ ❱ 2898 │ │ │ response = await self.async_function_with_retries(*args, **kwargs) │
│ 2899 │ │ │ verbose_router_logger.debug(f"Async Response: {response}") │
│ 2900 │ │ │ return response │
│ 2901 │ │ except Exception as e: │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3105 in async_function_with_retries │
│ │
│ 3102 │ │ │ ) │
│ 3103 │ │ │ │
│ 3104 │ │ │ # raises an exception if this error should not be retries │
│ ❱ 3105 │ │ │ self.should_retry_this_error( │
│ 3106 │ │ │ │ error=e, │
│ 3107 │ │ │ │ healthy_deployments=_healthy_deployments, │
│ 3108 │ │ │ │ all_deployments=_all_deployments, │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3243 in should_retry_this_error │
│ │
│ 3240 │ │ │ if ( │
│ 3241 │ │ │ │ _num_all_deployments <= 1 │
│ 3242 │ │ │ ): # if there is only 1 deployment for this model group then don't retry │
│ ❱ 3243 │ │ │ │ raise error # then raise error │
│ 3244 │ │ │
│ 3245 │ │ # Do not retry if there are no healthy deployments │
│ 3246 │ │ # just raise the error │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3088 in async_function_with_retries │
│ │
│ 3085 │ │ │ │ │ message=f"This is a mock exception for model={model_group}, to │
│ trigger a rate limit error.", │
│ 3086 │ │ │ │ ) │
│ 3087 │ │ │ # if the function call is successful, no exception will be raised and we'll │
│ break out of the loop │
│ ❱ 3088 │ │ │ response = await self.make_call(original_function, *args, **kwargs) │
│ 3089 │ │ │ │
│ 3090 │ │ │ return response │
│ 3091 │ │ except Exception as e: │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:3181 in make_call │
│ │
│ 3178 │ │ Handler for making a call to the .completion()/.embeddings() functions. │
│ 3179 │ │ """ │
│ 3180 │ │ model_group = kwargs.get("model") │
│ ❱ 3181 │ │ response = await original_function(*args, **kwargs) │
│ 3182 │ │ ## PROCESS RESPONSE HEADERS │
│ 3183 │ │ await self.set_response_headers(response=response, model_group=model_group) │
│ 3184 │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:879 in _acompletion │
│ │
│ 876 │ │ │ ) │
│ 877 │ │ │ if model_name is not None: │
│ 878 │ │ │ │ self.fail_calls[model_name] += 1 │
│ ❱ 879 │ │ │ raise e │
│ 880 │ │
│ 881 │ async def abatch_completion( │
│ 882 │ │ self, │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/router.py:851 in _acompletion │
│ │
│ 848 │ │ │ │ await self.async_routing_strategy_pre_call_checks( │
│ 849 │ │ │ │ │ deployment=deployment, logging_obj=logging_obj │
│ 850 │ │ │ │ ) │
│ ❱ 851 │ │ │ │ response = await _response │
│ 852 │ │ │ │
│ 853 │ │ │ ## CHECK CONTENT FILTER ERROR ## │
│ 854 │ │ │ if isinstance(response, ModelResponse): │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/utils.py:1573 in wrapper_async │
│ │
│ 1570 │ │ │ │ │ else: │
│ 1571 │ │ │ │ │ │ kwargs["model"] = context_window_fallback_dict[model] │
│ 1572 │ │ │ │ │ return await original_function(*args, **kwargs) │
│ ❱ 1573 │ │ │ raise e │
│ 1574 │ │
│ 1575 │ is_coroutine = inspect.iscoroutinefunction(original_function) │
│ 1576 │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/utils.py:1388 in wrapper_async │
│ │
│ 1385 │ │ │ │ │ │ │ ).start() │
│ 1386 │ │ │ │ │ │ │ return final_embedding_cached_response │
│ 1387 │ │ │ # MODEL CALL │
│ ❱ 1388 │ │ │ result = await original_function(*args, **kwargs) │
│ 1389 │ │ │ end_time = datetime.datetime.now() │
│ 1390 │ │ │ if "stream" in kwargs and kwargs["stream"] is True: │
│ 1391 │ │ │ │ if ( │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/main.py:457 in acompletion │
│ │
│ 454 │ │ return response │
│ 455 │ except Exception as e: │
│ 456 │ │ custom_llm_provider = custom_llm_provider or "openai" │
│ ❱ 457 │ │ raise exception_type( │
│ 458 │ │ │ model=model, │
│ 459 │ │ │ custom_llm_provider=custom_llm_provider, │
│ 460 │ │ │ original_exception=e, │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py:2033 in exception_type │
│ │
│ 2030 │ │ # don't let an error with mapping interrupt the user from receiving an error │
│ from the llm api calls │
│ 2031 │ │ if exception_mapping_worked: │
│ 2032 │ │ │ setattr(e, "litellm_response_headers", litellm_response_headers) │
│ ❱ 2033 │ │ │ raise e │
│ 2034 │ │ else: │
│ 2035 │ │ │ for error_type in litellm.LITELLM_EXCEPTION_TYPES: │
│ 2036 │ │ │ │ if isinstance(e, error_type): │
│ │
│ /home/bwnjnoei/miniforge3/envs/paper-qa/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py:358 in exception_type │
│ │
│ 355 │ │ │ │ │ │ ) │
│ 356 │ │ │ │ │ elif original_exception.status_code == 401: │
│ 357 │ │ │ │ │ │ exception_mapping_worked = True │
│ ❱ 358 │ │ │ │ │ │ raise AuthenticationError( │
│ 359 │ │ │ │ │ │ │ message=f"AuthenticationError: {exception_provider} - │
│ {message}", │
│ 360 │ │ │ │ │ │ │ llm_provider=custom_llm_provider, │
│ 361 │ │ │ │ │ │ │ model=model, │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AuthenticationError: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided:
ollama. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Received Model Group=gpt-4o-2024-08-06
Available Model Group Fallbacks=None
[03:43:47] Answer:
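The last lines of the traceback are the quickest diagnostic: the 401 names the key "ollama" and the request was routed to model group gpt-4o-2024-08-06, so at least one role was still using a default OpenAI model. Below is a minimal sketch for pulling that model group out of captured error text. The function name and the expected model are illustrative assumptions, and the regex assumes the "Received Model Group=" wording shown in the log above.

```python
import re
from typing import Optional


def routed_model_group(error_text: str) -> Optional[str]:
    """Return the model group named in a LiteLLM router error, if present."""
    match = re.search(r"Received Model Group=(\S+)", error_text)
    return match.group(1) if match else None


# Two lines copied from the end of the traceback above.
sample = (
    "Received Model Group=gpt-4o-2024-08-06\n"
    "Available Model Group Fallbacks=None"
)
expected_local_model = "ollama/llama3.2"  # the model we intended to use

group = routed_model_group(sample)
if group != expected_local_model:
    print(f"Request went to {group!r}, not {expected_local_model!r}")
```

If the reported group is not the local model, some role (here, the agent LLM) has not been pointed at the local configuration.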
Hi @bwnjnOEI, you are quite close. You also need to configure settings.agent.agent_llm. This is the LLM powering the agent that selects tools.
from paperqa import Settings, ask
from paperqa.settings import AgentSettings
answer = ask(
    ...,
    settings=Settings(
        ...,
        agent=AgentSettings(
            agent_llm='ollama/llama3.2', agent_llm_config=local_llm_config
        )
    ),
)
I'm very grateful for your guidance. I modified the code according to your hint, and it worked.
from paperqa import Settings, ask
from paperqa.settings import AgentSettings
import os
os.environ['OPENAI_API_KEY'] = "ollama"
local_llm_config = dict(
    model_list=[
        dict(
            model_name='ollama/llama3.2',
            litellm_params=dict(
                model='ollama/llama3.2',
                api_base="http://100.*8*.**8.1**:11434",
            ),
        )
    ]
)
answer = ask(
    "What is RAG?",
    settings=Settings(
        llm='ollama/llama3.2',
        llm_config=local_llm_config,
        summary_llm='ollama/llama3.2',
        summary_llm_config=local_llm_config,
        embedding='ollama/mxbai-embed-large',
        agent=AgentSettings(
            agent_llm='ollama/llama3.2',
            agent_llm_config=local_llm_config
        ),
        paper_directory="my_papers"
    ),
)
❯ python -u "/home/bwnjnoei/Workspace/pqa_deploy/pqa_deploy.py"
[04:09:47] Could not find cost for model ollama/llama3.2.
Starting paper search for 'RAG'.
paper_search for query 'RAG' and offset 0 returned 1 papers.
Status: Paper Count=1 | Relevant Papers=0 | Current Evidence=0 | Current Cost=$0.0000
[04:09:48] Could not find cost for model ollama/llama3.2.
gather_evidence starting for question 'What is RAG?'.
[04:09:53] Status: Paper Count=1 | Relevant Papers=1 | Current Evidence=9 | Current Cost=$0.0000
[04:09:54] Generating answer for 'What is RAG?'.
[04:09:56] Status: Paper Count=1 | Relevant Papers=1 | Current Evidence=9 | Current Cost=$0.0000
Answer: RAG stands for Retrieval-Augmented Text Generation, a research area that improves the quality of language models by grounding their generation
process on a set of retrieved materials such as relevant documents. This approach has gained increasing interest recently and aims to combine retrieval
and generation in a more integrated way.
The term RAG is also used interchangeably with Retrieval-Augmented Generation (RAG), which involves using a retrieval model to retrieve relevant
documents or snippets from a large corpus, and then using a generation model to generate text based on these retrieved snippets. This technique is used
in natural language processing (NLP) tasks to improve performance.
In the context of phrase segmentation, RAG stands for Phrase Segmentation Algorithm (RAG), which uses a forward maximum matching strategy to identify
phrases. The algorithm can be optimized using pre-trained DPR models and FAISS vector search toolkits, as mentioned in "COPY IS ALL YOU NEED." for
efficient phrase segmentation in large document sets.
The COG model, proposed in the same paper, is a text generation model that outperforms strong Transformer baselines in terms of generation quality. It
can be fine-tuned for different domains without requiring additional training. The model's performance is evaluated on various tasks, including domain
adaptation and enlarged phrase index settings.
Sources:
- Lan2023 pages 8-9: RAG stands for Retrieval-Augmented Text Generation.
- Lan2023 pages 11-12: RAG also refers to Retrieval-Augmented Generation (RAG).
- Lan2023 pages 13-15: RAG stands for Phrase Segmentation Algorithm (RAG) in the context of phrase segmentation.
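The working script above passes the same model name and config to three separate roles (llm, summary_llm, and agent.agent_llm), and missing any one of them is what caused the earlier 401. One way to make that harder to get wrong is to build all of the arguments from a single place. This is only a sketch: the helper name local_settings_kwargs is hypothetical, the localhost URL is a placeholder for your own Ollama address, and the commented-out usage assumes Settings and AgentSettings accept these keywords as they do in the script above.

```python
def local_settings_kwargs(model: str, api_base: str, embedding: str) -> dict:
    """Build keyword arguments that point every LLM role at one local model."""
    llm_config = {
        "model_list": [
            {
                "model_name": model,
                "litellm_params": {"model": model, "api_base": api_base},
            }
        ]
    }
    return {
        "llm": model,
        "llm_config": llm_config,
        "summary_llm": model,
        "summary_llm_config": llm_config,
        "embedding": embedding,
        # The agent role is nested; unpack this dict into AgentSettings(...).
        "agent": {"agent_llm": model, "agent_llm_config": llm_config},
    }


kwargs = local_settings_kwargs(
    model="ollama/llama3.2",
    api_base="http://localhost:11434",  # placeholder address
    embedding="ollama/mxbai-embed-large",
)

# With paper-qa installed, this could be used roughly as:
# agent = AgentSettings(**kwargs.pop("agent"))
# settings = Settings(**kwargs, agent=agent, paper_directory="my_papers")
print(sorted(kwargs))
```

Because every role is derived from the same model and api_base arguments, switching to a different local model means changing one call rather than three settings.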
Thank you very much for sharing, this is useful for me.
Yes, thanks, it works for me now too! By the way, if you use the command-line version, this is how I got it to work with Ollama:
pqa --llm "ollama/llama3.2" \
--summary_llm "ollama/llama3.2" \
--agent.agent_llm "ollama/llama3.2" \
--embedding "ollama/mxbai-embed-large" \
--llm_config '{"model_list": [{"model_name": "ollama/llama3.2", "litellm_params": {"model": "ollama/llama3.2", "api_base": "http://localhost:11434"}}]}' \
--summary_llm_config '{"model_list": [{"model_name": "ollama/llama3.2", "litellm_params": {"model": "ollama/llama3.2", "api_base": "http://localhost:11434"}}]}' \
--agent.agent_llm_config '{"model_list": [{"model_name": "ollama/llama3.2", "litellm_params": {"model": "ollama/llama3.2", "api_base": "http://localhost:11434"}}]}' \
--prompts.use_json False ask "Your question here"
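The command above repeats the same JSON config three times, which is easy to get out of sync when editing by hand. A small sketch that generates an equivalent command line from one config follows. It uses only the flags that appear in the command above; the function name pqa_command is hypothetical, and the script only builds and prints the command, it does not run pqa.

```python
import json
import shlex


def pqa_command(model: str, api_base: str, embedding: str, question: str) -> str:
    """Return a shell-quoted pqa command that uses one local model for all roles."""
    config = json.dumps(
        {
            "model_list": [
                {
                    "model_name": model,
                    "litellm_params": {"model": model, "api_base": api_base},
                }
            ]
        }
    )
    args = [
        "pqa",
        "--llm", model,
        "--summary_llm", model,
        "--agent.agent_llm", model,
        "--embedding", embedding,
        "--llm_config", config,
        "--summary_llm_config", config,
        "--agent.agent_llm_config", config,
        "--prompts.use_json", "False",
        "ask", question,
    ]
    # shlex.join quotes each argument so the JSON survives the shell intact.
    return shlex.join(args)


command = pqa_command(
    model="ollama/llama3.2",
    api_base="http://localhost:11434",
    embedding="ollama/mxbai-embed-large",
    question="Your question here",
)
print(command)
```

The printed string can be pasted into a POSIX shell; quoting is handled by shlex, so no manual escaping of the JSON is needed.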
Thanks @jamesbraza for the prompt reply, I thought it was something along those lines (because I could see my local llm doing something for both embedding and generation), but I couldn't find the details in the documentation of what I was missing! It's working now! Thanks also to @prof-robert for the command line version. I meant to investigate that at some point and this just saves me time :)
I deployed Qwen2.5-14B-Instruct on my local server and started the LLM correctly using vLLM. But when I executed the sample code, I got the following error:
It seems that my settings are being ignored: the program still tries the OpenAI API first instead of using the local LLM settings I passed. I want to know what causes this.