BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Fallbacks don't work with `acompletion` #1959

Closed. netapy closed this 5 months ago.

netapy commented 7 months ago

What happened?

Currently TogetherAI's API is down on Qwen1.5-14B.

This works and falls back to gpt-3.5:

response = completion(
    model="together_ai/Qwen/Qwen1.5-14B-Chat",
    fallbacks=["azure/gpt-35-turbo", "together_ai/openchat/openchat-3.5-1210"],
    temperature=0.24,
    messages=[{'role': 'user', 'content': base_prompt.strip()}],
    stream=False,
    timeout=3,
    max_retries=0
)

This doesn't work; it stops at the timeout error:

response = await acompletion(
    model="together_ai/Qwen/Qwen1.5-14B-Chat",
    fallbacks=["azure/gpt-35-turbo", "together_ai/openchat/openchat-3.5-1210"],
    temperature=0.24,
    messages=[{'role': 'user', 'content': base_prompt.strip()}],
    stream=False,
    timeout=3,
    max_retries=0
)

Relevant log output

No response

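(For readers hitting the same problem before a fix lands: a manual fallback loop around acompletion is a possible interim workaround. The sketch below is not part of the original report; it only reuses the model names and parameters shown above, and catches all exceptions for brevity.)

from litellm import acompletion

async def completion_with_manual_fallbacks(messages):
    # Try each model in order; move to the next one on any failure.
    models = [
        "together_ai/Qwen/Qwen1.5-14B-Chat",
        "azure/gpt-35-turbo",
        "together_ai/openchat/openchat-3.5-1210",
    ]
    last_error = None
    for model in models:
        try:
            return await acompletion(
                model=model,
                messages=messages,
                temperature=0.24,
                stream=False,
                timeout=3,
                max_retries=0,
            )
        except Exception as e:  # fall through to the next model
            last_error = e
    raise last_error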

krrishdholakia commented 7 months ago

@netapy can you try switching to the Router and let me know if this persists - https://docs.litellm.ai/docs/routing#fallbacks

netapy commented 7 months ago

> @netapy can you try switching to the Router and let me know if this persists - https://docs.litellm.ai/docs/routing#fallbacks

Thanks for your answer.

model_list = [
    {
        "model_name": "azure/gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-35-turbo",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE")
        },
    },
    {
        "model_name": "together_ai/Qwen/Qwen1.5-14B-Chat",
        "litellm_params": {
            "model": "together_ai/Qwen/Qwen1.5-14B-Chat",
            "api_key": os.getenv("TOGETHERAI_API_KEY"),
        },
    }
]

router = Router(model_list=model_list,
                fallbacks=["azure/gpt-3.5-turbo"],
                set_verbose=True,
                num_retries=0,
                timeout=3)

response = await router.acompletion(
    model="together_ai/Qwen/Qwen1.5-14B-Chat",
    fallbacks=["azure/gpt-35-turbo"],
    temperature=0.24,
    messages=[{'role': 'user', 'content': base_prompt.strip()}],
    stream=False,
    timeout=3,
    max_retries=0
)

I still get 17:36:07 - LiteLLM Router:INFO: litellm.acompletion(model=together_ai/Qwen/Qwen1.5-14B-Chat) Exception Request timed out.

(Also, the timeout is not the 3 seconds requested; it takes up to 10 seconds before the timeout error is raised.)
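(One way to verify that discrepancy, as a sketch rather than anything from this thread: time the failing call against the requested 3-second timeout, using the router object defined above.)

import time

start = time.monotonic()
try:
    response = await router.acompletion(
        model="together_ai/Qwen/Qwen1.5-14B-Chat",
        messages=[{'role': 'user', 'content': base_prompt.strip()}],
        timeout=3,
    )
except Exception as e:
    # Prints the elapsed time before the timeout error was raised.
    print(f"failed after {time.monotonic() - start:.1f}s: {e}")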

krrishdholakia commented 7 months ago

Thanks for raising this @netapy, I'll test this on my end and revert back with a fix for both acompletion + Router.

netapy commented 5 months ago

Any progress on this issue? :)

krrishdholakia commented 5 months ago

Hey @netapy, just tested your code. There's a mistake there:

fallbacks=["azure/gpt-35-turbo"]

you need to set fallbacks like this:

fallbacks=[{"together_ai_model": ["azure-model"]}]

I just tested this code, and can confirm it works:

Screenshot: (image taken 2024-04-04 showing the call succeeding; omitted)

Code

import os
from litellm import Router

model_list = [
    {
        "model_name": "azure-model",
        "litellm_params": {
            "model": "azure/chatgpt-v-2",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE")
        },
    },
    {
        "model_name": "together_ai_model",
        "litellm_params": {
            "model": "together_ai/Qwen/Qwen1.5-14B-Chat",
            "api_key": "bad-key",  # intentionally invalid so the call fails and the fallback triggers
        },
    }
]

# fallbacks maps a failing model_name to the list of model_names to fall back to
router = Router(model_list=model_list,
                fallbacks=[{"together_ai_model": ["azure-model"]}],
                set_verbose=True,
                num_retries=0,
                timeout=3)

response = await router.acompletion(
    model="together_ai_model",
    temperature=0.24,
    messages=[{'role': 'user', 'content': "Hey how's it going?"}],
    stream=False,
    timeout=3,
    max_retries=0
)

netapy commented 5 months ago

Gotcha thanks @krrishdholakia

However, that means the documentation is incorrect or there's a bug: https://litellm.vercel.app/docs/completion/reliable_completions#switch-models

That approach doesn't work, but it would actually be more practical not to have to use a Router; each completion request should be able to define its own fallbacks as simple strings.
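(For completeness, mapping the original two fallbacks onto the Router format krrishdholakia showed would look roughly like the sketch below. The model_name aliases are illustrative, not from the thread.)

import os
from litellm import Router

model_list = [
    {
        "model_name": "qwen-primary",
        "litellm_params": {
            "model": "together_ai/Qwen/Qwen1.5-14B-Chat",
            "api_key": os.getenv("TOGETHERAI_API_KEY"),
        },
    },
    {
        "model_name": "azure-gpt-35",
        "litellm_params": {
            "model": "azure/gpt-35-turbo",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "openchat-fallback",
        "litellm_params": {
            "model": "together_ai/openchat/openchat-3.5-1210",
            "api_key": os.getenv("TOGETHERAI_API_KEY"),
        },
    },
]

# Each fallbacks entry maps a failing model_name to an ordered list of fallback model_names.
router = Router(model_list=model_list,
                fallbacks=[{"qwen-primary": ["azure-gpt-35", "openchat-fallback"]}],
                num_retries=0,
                timeout=3)

response = await router.acompletion(
    model="qwen-primary",
    temperature=0.24,
    messages=[{'role': 'user', 'content': base_prompt.strip()}],
)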