BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Support retry policies when calling completion() / text_completion() without requiring Router #6623

Open dbczumar opened 2 weeks ago

dbczumar commented 2 weeks ago

The Feature

Support retry policies when calling completion() / text_completion() without requiring Router. Example:

import litellm
from litellm import RetryPolicy

num_retries = 3  # example value

retry_policy = RetryPolicy(
    TimeoutErrorRetries=num_retries,
    RateLimitErrorRetries=num_retries,
    InternalServerErrorRetries=num_retries,
    # We don't retry on errors that are unlikely to be transient
    # (e.g. bad request, invalid auth credentials)
    BadRequestErrorRetries=0,
    AuthenticationErrorRetries=0,
    ContentPolicyViolationErrorRetries=0,
)

litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Example content"}],
    retry_strategy="exponential_backoff_retry",
    retry_policy=retry_policy,
)

Motivation, pitch

The DSPy library (https://github.com/stanfordnlp/dspy) depends on LiteLLM for issuing LLM calls. When these calls fail due to transient network errors or rate limiting, we want to retry with exponential backoff. However, when these calls fail due to user error (e.g. bad API keys, malformed requests), we want to fail fast.

DSPy users configure LLM keys and parameters using constructor arguments to the dspy.LM class (and optionally by setting environment variables like `OPENAI_API_KEY`), for example:

import os

import dspy

llm = dspy.LM(model="openai/gpt-4o-mini", api_key="<my key>", model_type="chat")
llm("Who invented deep learning?")

# Env var alternative
os.environ["OPENAI_API_KEY"] = "<my_key>"
llm = dspy.LM(model="openai/gpt-4o-mini", model_type="chat")
llm("Who invented deep learning?")

DSPy currently wraps litellm.completion() and litellm.text_completion() to implement this interface. See https://github.com/stanfordnlp/dspy/blob/8bc3439052eb80ba4e5ba340c348a6e3b2c94d7c/dspy/clients/lm.py#L78-L87 / https://github.com/stanfordnlp/dspy/blob/8bc3439052eb80ba4e5ba340c348a6e3b2c94d7c/dspy/clients/lm.py#L166-L216. Currently, these interfaces don't support specifying a retry policy.
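
For context, the wrapper is conceptually just a thin pass-through to litellm.completion(), so there is no natural place to attach a retry policy today. A minimal sketch (simplified for illustration; not the actual DSPy code, and the function name here is ours):

import litellm

def litellm_completion(request: dict):
    # `request` carries model, messages, api_key, etc. collected by dspy.LM's
    # constructor; everything is forwarded straight to litellm.completion(),
    # so there is currently nowhere to hand over a RetryPolicy.
    return litellm.completion(**request)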

We've attempted to work around this by constructing a Router internally, but Router construction requires us to fetch the API key and base URL and pass them in a model_list (due to OpenAI / Azure OpenAI client initialization - https://github.com/BerriAI/litellm/blob/45ff74ae81e331412370cd7436816559fd6298da/litellm/router.py#L3999-L4001), which is difficult when those values are stored in environment variables.
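
For illustration, here is a rough sketch of that workaround (the configuration values are our assumptions): retry policies do work this way, but only if we resolve the API key ourselves and bake it into model_list at construction time.

import os

from litellm import Router, RetryPolicy

router = Router(
    model_list=[
        {
            "model_name": "openai/gpt-4o-mini",
            "litellm_params": {
                "model": "openai/gpt-4o-mini",
                # Must be resolved explicitly, even when the user only sets
                # OPENAI_API_KEY in the environment.
                "api_key": os.environ.get("OPENAI_API_KEY"),
            },
        }
    ],
    retry_policy=RetryPolicy(
        TimeoutErrorRetries=3,
        RateLimitErrorRetries=3,
        InternalServerErrorRetries=3,
        BadRequestErrorRetries=0,
        AuthenticationErrorRetries=0,
    ),
)

response = router.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Who invented deep learning?"}],
)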

dbczumar commented 2 weeks ago

Hi @krrishdholakia, can you advise regarding how to support configuring retry policies without Router? We're happy to contribute the change if it's fairly straightforward :D (though any bandwidth you have on your side to support it would be massively appreciated).

cc @okhat

krrishdholakia commented 2 weeks ago

@ishaan-jaff do we still need openai/azure client init on router?

iirc you implemented some sort of client caching logic on the .completion call already, right?

krrishdholakia commented 2 weeks ago

i wonder how hard it would be to just move the async_function_with_retries outside the router, and use that inside the wrapper_async / wrapper functions
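
(A very rough sketch of that direction, with placeholder names and signatures rather than LiteLLM's actual internals: a standalone helper that reads per-error retry counts from the RetryPolicy and retries with exponential backoff, which the completion() wrappers could then call.)

import asyncio

from litellm import RetryPolicy
from litellm.exceptions import InternalServerError, RateLimitError, Timeout

def _retries_for(exc: Exception, policy: RetryPolicy) -> int:
    # Map the exception type to the per-error retry count from the policy;
    # anything not listed (bad request, auth, content policy) fails fast.
    if isinstance(exc, Timeout):
        return policy.TimeoutErrorRetries or 0
    if isinstance(exc, RateLimitError):
        return policy.RateLimitErrorRetries or 0
    if isinstance(exc, InternalServerError):
        return policy.InternalServerErrorRetries or 0
    return 0

async def async_function_with_retries(func, *args, retry_policy: RetryPolicy, **kwargs):
    attempt = 0
    while True:
        try:
            return await func(*args, **kwargs)
        except Exception as exc:
            if attempt >= _retries_for(exc, retry_policy):
                raise
            await asyncio.sleep(2 ** attempt)  # exponential backoff
            attempt += 1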

ishaan-jaff commented 2 weeks ago

do we still need openai/azure client init on router?

Nope, we don't. It's probably better to have this at the completion level.

dbczumar commented 1 week ago

Hi @krrishdholakia @ishaan-jaff, thank you so much for the ideation here! What are the best next steps for getting this implemented?