BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Bug]: LiteLLM[proxy] returns no model available errors #2487

Closed: kmyczkowska-hypatos closed this issue 3 months ago

kmyczkowska-hypatos commented 3 months ago

What happened?

The problem happened while performing load tests (1100 prompts) on the same model, with 0 retries. LiteLLM[proxy] returned the following HTTP status codes: 200: 864 occurrences, 429: 219 occurrences, 500: 17 occurrences.

While 200 and 429 are expected, the 500 isn't. The error message doesn't explain why no models are available. If it's because of quota, why not return 429? It would be good to have information on whether the error is recoverable. The error is repeatable.
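For context, the load test was roughly equivalent to the sketch below. This is a hypothetical harness, not the exact one used: the endpoint and model name are taken from the log output, while the concurrency level and timeout are assumptions.

from collections import Counter
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://0.0.0.0:4000/chat/completions"
PAYLOAD = {
    "model": "gpt-35-turbo-16k-qt",
    "messages": [{"role": "user", "content": "What is the latin name for the fox?"}],
}

def call_proxy(_):
    # Only the status code matters for the tally; the response body is ignored.
    return requests.post(URL, json=PAYLOAD, timeout=600).status_code

# Fire 1100 identical prompts at the proxy and count the returned status codes.
with ThreadPoolExecutor(max_workers=50) as pool:
    codes = Counter(pool.map(call_proxy, range(1100)))

print(codes)  # observed roughly: 200 x 864, 429 x 219, 500 x 17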

Relevant log output

Example response body returned with HTTP 500:

{'error': {'message': 'No models available\n\nTraceback (most recent call last):\n  File "/usr/local/lib/python3.9/site-packages/litellm/proxy/proxy_server.py", line 2821, in chat_completion\n    responses = await asyncio.gather(\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 395, in acompletion\n    raise e\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 391, in acompletion\n    response = await self.async_function_with_fallbacks(**kwargs)\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 1176, in async_function_with_fallbacks\n    raise original_exception\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 1099, in async_function_with_fallbacks\n    response = await self.async_function_with_retries(*args, **kwargs)\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 1286, in async_function_with_retries\n    raise original_exception\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 1193, in async_function_with_retries\n    response = await original_function(*args, **kwargs)\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 478, in _acompletion\n    raise e\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 403, in _acompletion\n    deployment = self.get_available_deployment(\n  File "/usr/local/lib/python3.9/site-packages/litellm/router.py", line 2172, in get_available_deployment\n    raise ValueError("No models available")\nValueError: No models available\n', 'type': 'None', 'param': 'None', 'code': 500}}

Server log:
INFO:     127.0.0.1:53773 - "POST /chat/completions HTTP/1.1" 500 Internal Server Error
LiteLLM Proxy: Inside Proxy Logging Pre-call hook!
Inside Max Parallel Request Pre-Call Hook
Inside Max Budget Limiter Pre-Call Hook
get cache: cache key: None_user_api_key_user_id; local_only: False
in_memory_result: None
get cache: cache result: None
Inside Cache Control Check Pre-Call Hook
LiteLLM Proxy: final data being sent to completion call: {'model': 'gpt-35-turbo-16k-qt', 'messages': [{'role': 'user', 'content': 'What is the latin name for the fox?'}], 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'python-requests/2.31.0', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'connection': 'keep-alive', 'content-type': 'application/json', 'content-length': '114'}, 'body': {'model': 'gpt-35-turbo-16k-qt', 'messages': [{'role': 'user', 'content': 'What is the latin name for the fox?'}]}}, 'metadata': {'user_api_key': None, 'user_api_key_alias': None, 'user_api_key_user_id': None, 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'python-requests/2.31.0', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'connection': 'keep-alive', 'content-type': 'application/json', 'content-length': '114'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions'}, 'request_timeout': 600}
get cache: cache key: 11-13:cooldown_models; local_only: False
in_memory_result: ['2caf400e-3a7a-4f22-9d30-f0ce7ae4edac']
get cache: cache result: ['2caf400e-3a7a-4f22-9d30-f0ce7ae4edac']
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/litellm/proxy/proxy_server.py", line 2825, in chat_completion
    responses = await asyncio.gather(
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 399, in acompletion
    raise e
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 395, in acompletion
    response = await self.async_function_with_fallbacks(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 1217, in async_function_with_fallbacks
    raise original_exception
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 1140, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 1327, in async_function_with_retries
    raise original_exception
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 1234, in async_function_with_retries
    response = await original_function(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 482, in _acompletion
    raise e
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 407, in _acompletion
    deployment = self.get_available_deployment(
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 2207, in get_available_deployment
    raise ValueError("No models available")
ValueError: No models available
INFO:     127.0.0.1:53774 - "POST /chat/completions HTTP/1.1" 500 Internal Server Error
LiteLLM Proxy: Inside Proxy Logging Pre-call hook!
Inside Max Parallel Request Pre-Call Hook
Inside Max Budget Limiter Pre-Call Hook
get cache: cache key: None_user_api_key_user_id; local_only: False
in_memory_result: None
get cache: cache result: None
Inside Cache Control Check Pre-Call Hook
LiteLLM Proxy: final data being sent to completion call: {'model': 'gpt-35-turbo-16k-qt', 'messages': [{'role': 'user', 'content': 'What is the latin name for the fox?'}], 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'python-requests/2.31.0', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'connection': 'keep-alive', 'content-type': 'application/json', 'content-length': '114'}, 'body': {'model': 'gpt-35-turbo-16k-qt', 'messages': [{'role': 'user', 'content': 'What is the latin name for the fox?'}]}}, 'metadata': {'user_api_key': None, 'user_api_key_alias': None, 'user_api_key_user_id': None, 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'user-agent': 'python-requests/2.31.0', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'connection': 'keep-alive', 'content-type': 'application/json', 'content-length': '114'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions'}, 'request_timeout': 600}
get cache: cache key: 11-13:cooldown_models; local_only: False
in_memory_result: ['2caf400e-3a7a-4f22-9d30-f0ce7ae4edac']
get cache: cache result: ['2caf400e-3a7a-4f22-9d30-f0ce7ae4edac']
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/litellm/proxy/proxy_server.py", line 2825, in chat_completion
    responses = await asyncio.gather(
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 399, in acompletion
    raise e
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 395, in acompletion
    response = await self.async_function_with_fallbacks(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 1217, in async_function_with_fallbacks
    raise original_exception
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 1140, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 1327, in async_function_with_retries
    raise original_exception
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 1234, in async_function_with_retries
    response = await original_function(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 482, in _acompletion
    raise e
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 407, in _acompletion
    deployment = self.get_available_deployment(
  File "/usr/local/lib/python3.10/site-packages/litellm/router.py", line 2207, in get_available_deployment
    raise ValueError("No models available")
ValueError: No models available
INFO:     127.0.0.1:53775 - "POST /chat/completions HTTP/1.1" 500 Internal Server Error
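
The cooldown_models cache entries above hint at the cause: the only configured deployment ('2caf400e-3a7a-4f22-9d30-f0ce7ae4edac') is in cooldown, so get_available_deployment has nothing left to route to and raises ValueError("No models available"), which the proxy then surfaces as a 500. The snippet below is a simplified sketch of that filtering step, not LiteLLM's actual router code, just to illustrate why an exhausted deployment list ends up as an unmapped ValueError rather than a 429:

def get_available_deployment(deployments, cooldown_ids):
    # Drop every deployment that is currently cooling down after rate-limit errors.
    healthy = [d for d in deployments if d["id"] not in cooldown_ids]
    if not healthy:
        # With a single deployment in cooldown this always triggers, and the
        # ValueError bubbles up through the proxy as HTTP 500.
        raise ValueError("No models available")
    return healthy[0]

deployments = [{"id": "2caf400e-3a7a-4f22-9d30-f0ce7ae4edac", "model": "gpt-35-turbo-16k-qt"}]
cooldown_ids = {"2caf400e-3a7a-4f22-9d30-f0ce7ae4edac"}  # taken from the cooldown_models cache above
get_available_deployment(deployments, cooldown_ids)  # raises ValueError: No models available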


ishaan-jaff commented 3 months ago

Good point - will work on this now

ishaan-jaff commented 3 months ago

PR here @kmyczkowska-hypatos https://github.com/BerriAI/litellm/pull/2493

ishaan-jaff commented 3 months ago

@kmyczkowska-hypatos can we set up a direct Slack Connect with your team to address your issues faster?

What's the best email to send a Slack Connect invite to?

Here's my LinkedIn if you want to DM me there: https://www.linkedin.com/in/reffajnaahsi/