@netapy can you try switching to the Router and let me know if this persists - https://docs.litellm.ai/docs/routing#fallbacks
Thanks for your answer.
import os

from litellm import Router

model_list = [
    {
        "model_name": "azure/gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-35-turbo",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "together_ai/Qwen/Qwen1.5-14B-Chat",
        "litellm_params": {
            "model": "together_ai/Qwen/Qwen1.5-14B-Chat",
            "api_key": os.getenv("TOGETHERAI_API_KEY"),
        },
    },
]

router = Router(
    model_list=model_list,
    fallbacks=["azure/gpt-3.5-turbo"],
    set_verbose=True,
    num_retries=0,
    timeout=3,
)

# base_prompt is defined elsewhere in my application code
response = await router.acompletion(
    model="together_ai/Qwen/Qwen1.5-14B-Chat",
    fallbacks=["azure/gpt-35-turbo"],
    temperature=0.24,
    messages=[{"role": "user", "content": base_prompt.strip()}],
    stream=False,
    timeout=3,
    max_retries=0,
)
I still get:
17:36:07 - LiteLLM Router:INFO: litellm.acompletion(model=together_ai/Qwen/Qwen1.5-14B-Chat) Exception Request timed out.
(Also, btw, the timeout is not 3 seconds as requested; it takes up to 10 seconds before the timeout error is raised.)
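As a workaround sketch for the timeout part (an assumption on my side, not LiteLLM's own API): wrapping the call in asyncio.wait_for enforces a hard client-side ceiling, independent of whatever the router does with its internal timeout.

import asyncio

# Workaround sketch: hard 3-second ceiling on the client side,
# regardless of how the router handles its internal timeout.
try:
    response = await asyncio.wait_for(
        router.acompletion(
            model="together_ai/Qwen/Qwen1.5-14B-Chat",
            temperature=0.24,
            messages=[{"role": "user", "content": base_prompt.strip()}],
        ),
        timeout=3,
    )
except asyncio.TimeoutError:
    print("hard client-side timeout after 3 seconds")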
Thanks for raising this @netapy, I'll test this on my end and report back with a fix for both acompletion + Router.
Any progress on this issue? :)
Hey @netapy, just tested your code. There's a mistake there:

fallbacks=["azure/gpt-35-turbo"]

Fallbacks have to map a primary model group to a list of fallback model groups, so you need to set them like this:

fallbacks=[{"together_ai_model": ["azure-model"]}]
I just tested this code, and can confirm it works:
import os

from litellm import Router

model_list = [
    {
        "model_name": "azure-model",
        "litellm_params": {
            "model": "azure/chatgpt-v-2",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "together_ai_model",
        "litellm_params": {
            "model": "together_ai/Qwen/Qwen1.5-14B-Chat",
            "api_key": "bad-key",  # deliberately invalid, to trigger the fallback
        },
    },
]

router = Router(
    model_list=model_list,
    # map the failing model group to its fallback model group
    fallbacks=[{"together_ai_model": ["azure-model"]}],
    set_verbose=True,
    num_retries=0,
    timeout=3,
)

response = await router.acompletion(
    model="together_ai_model",
    temperature=0.24,
    messages=[{"role": "user", "content": "Hey how's it going?"}],
    stream=False,
    timeout=3,
    max_retries=0,
)
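For reference, the snippet above uses top-level await, which works in a notebook; a minimal sketch for running it as a plain script, assuming only the standard library:

import asyncio

async def main():
    # same router as above; a script needs an explicit event loop
    response = await router.acompletion(
        model="together_ai_model",
        messages=[{"role": "user", "content": "Hey how's it going?"}],
    )
    print(response)

asyncio.run(main())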
Gotcha, thanks @krrishdholakia.
It means, however, that the documentation is either incorrect or there's a bug: https://litellm.vercel.app/docs/completion/reliable_completions#switch-models

That approach doesn't work, but it would actually be more practical, since it avoids the Router entirely. Each completion request should be able to define its own fallbacks as simple model-name strings, without a Router.
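For context, this is roughly the usage that linked page describes (a sketch of the documented form, not something confirmed to work; whether it actually works is exactly what this issue is about):

from litellm import completion

# Sketch of the documented usage: per-request fallbacks as plain
# model-name strings, no Router involved. Per this thread, this path
# doesn't currently behave as documented.
response = completion(
    model="together_ai/Qwen/Qwen1.5-14B-Chat",
    messages=[{"role": "user", "content": "Hey how's it going?"}],
    fallbacks=["azure/gpt-35-turbo"],
)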
What happened?
Currently TogetherAI's API is down on Qwen1.5-14B.
This works and falls back to gpt35
This doesn't work and stops at the timeout error.