Open bgeneto opened 1 month ago
Has anyone confirmed it? It's a core function, at least for self-hosted Ollama models where failures tend to be more frequent!
I’m facing a similar issue while using the API. Here’s the call I’m making:
```shell
curl --location 'http://0.0.0.0:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: XXXXX' \
--data '{
  "stream": true,
  "model": "gemma2:9b",
  "messages": [
    {
      "role": "user",
      "content": "How can I get goto folder option while upload box in mac os?"
    }
  ],
  "fallbacks": ["gpt-4o-mini"]
}'
```
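For reference, each `data:` line of the SSE stream that this endpoint returns carries one OpenAI-format chunk. The helper below is a minimal sketch (standard library only, not LiteLLM's own client code) of how such a line would be parsed; the sample chunk is illustrative, not taken from actual proxy output:

```python
import json

def extract_delta(sse_line: str):
    """Pull the content delta out of one 'data: {...}' SSE line.

    Returns None for non-data lines, the terminal 'data: [DONE]'
    sentinel, or chunks with an empty delta.
    """
    if not sse_line.startswith("data: "):
        return None
    payload = sse_line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    # Each streamed chunk carries a partial message in choices[0].delta
    return chunk["choices"][0]["delta"].get("content")

# Illustrative chunk in the OpenAI streaming format:
line = 'data: {"choices": [{"delta": {"content": "Hello"}}]}'
print(extract_delta(line))  # -> Hello
```

The `curl: (18)` error above means the connection was closed while such chunks were still expected, i.e. the stream died mid-response instead of handing over to the fallback model.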
Response:
```
curl: (18) transfer closed with outstanding read data remaining
```
Expected Behavior: The request should gracefully fall back to the gpt-4o-mini model when the primary model fails.
What happened?
When using litellm to interact with Ollama models and fallbacks are configured, the fallback mechanism does not function correctly when the stream=True option is used.
Steps to Reproduce
Run the `litellm` proxy with one Ollama model (or several, load-balanced) as the primary model and a fallback model (e.g., another Ollama model or an OpenAI model). Relevant `config.yaml`:
```yaml
router_settings:
  num_retries: 0
  retry_after: 0
  allowed_fails: 1
  cooldown_time: 300
  fallbacks:

litellm_settings:
  json_logs: true
```
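For completeness, a fuller `config.yaml` that would exercise this code path might look like the following. The model names, `api_base`, and the `fallbacks` mapping are placeholders reconstructed from the curl example, not copied from the original report:

```yaml
model_list:
  - model_name: gemma2:9b
    litellm_params:
      model: ollama/gemma2:9b
      api_base: http://localhost:11434   # placeholder Ollama endpoint
  - model_name: gpt-4o-mini
    litellm_params:
      model: gpt-4o-mini

router_settings:
  num_retries: 0
  retry_after: 0
  allowed_fails: 1
  cooldown_time: 300
  fallbacks:
    - gemma2:9b: ["gpt-4o-mini"]

litellm_settings:
  json_logs: true
```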
Triggering the fallback also raises the TypeError exception shown in PR #6281.
Expected behavior
When a request triggers the fallback logic, even with `stream=True`, the fallback model should be invoked seamlessly and the response streamed from it.

Environment:
`litellm` version: 1.49.6 (from 2024-10-17)

Notes:
The fallback works correctly when `stream=False` is used.
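Until the proxy handles this server-side, one client-side workaround is to catch the mid-stream failure and retry against the fallback model yourself. This is a generic sketch, not LiteLLM API code: `stream_completion(model)` is a hypothetical generator standing in for an actual HTTP streaming call to the proxy:

```python
def stream_with_fallback(models, stream_completion):
    """Try each model in order; on a mid-stream failure, restart on the next.

    Note: chunks already yielded from a failed stream are not rolled back,
    so the caller may see a partial prefix from the dead primary model.
    """
    last_error = None
    for model in models:
        try:
            for chunk in stream_completion(model):
                yield chunk
            return  # stream finished cleanly
        except ConnectionError as exc:  # e.g. "transfer closed" mid-stream
            last_error = exc
    raise last_error or RuntimeError("no models configured")

# Demo with fake backends: the primary dies mid-stream, fallback succeeds.
def fake(model):
    if model == "gemma2:9b":
        yield "par"
        raise ConnectionError("transfer closed with outstanding read data")
    yield from ["Hello", " from ", model]

print("".join(stream_with_fallback(["gemma2:9b", "gpt-4o-mini"], fake)))
```

This is only a stopgap on the client: the duplicated partial prefix is exactly why the fallback belongs in the proxy, which can suppress the broken stream before any chunks reach the caller.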
Relevant log output