BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: litellm.completion retries fail silently without tenacity (not router/proxy) #5690

Open F1bos opened 1 month ago

F1bos commented 1 month ago

What happened?

When litellm encounters an error that should trigger retries (e.g., the model returns an invalid response, or there is a network issue), the retry mechanism fails silently if the tenacity library is not installed. The raised error only reflects the initial failure and mentions neither the missing dependency nor the skipped retry attempts.

To Reproduce

  1. Run any litellm code that includes retries (e.g., num_retries > 0) without tenacity installed.
  2. Induce an error that would normally trigger a retry (e.g., by providing an invalid prompt or simulating a network issue).
  3. Observe that the code fails with the initial error only, with no mention of the missing tenacity dependency or the skipped retries.

Expected behavior

The error message should explicitly state that retries failed due to the missing tenacity library, allowing users to easily identify and resolve the issue. Ideally, it should also log the failed retry attempts.

Code Snippet for Validation

import asyncio
import json
import os

import litellm

litellm.enable_json_schema_validation = True
litellm.set_verbose = True # see the raw request made by litellm

os.environ['GEMINI_API_KEY'] = ""

response_schema = {
    "type": "array",
    "items": {
        "type": "string",
    },
}

response_format = {
    "type": "json_object",
    "response_schema": response_schema,
    "enforce_validation": True
}

safety_settings = [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_NONE",
    },
]

async def main():
    messages = [
        {
            "role": "user",
            "content": "send me an invalid json"
        }
    ]

    response = await litellm.acompletion(
        model="gemini/gemini-1.5-flash",
        response_format=response_format,
        safety_settings=safety_settings,
        messages=messages,
        num_retries=1,  # exercise the retry path, which depends on tenacity
    )

    message = response.choices[0].message
    print('Response: ', response, ' | ', json.loads(message.content))

asyncio.run(main())

Additional context

The relevant code responsible for retries is in the wrapper_async method: https://github.com/BerriAI/litellm/blob/cd8d7ca9156a5fc2510db1ef0d43956d3239eccf/litellm/utils.py#L1582

However, the dependency on tenacity is only checked within acompletion_with_retries:

https://github.com/BerriAI/litellm/blob/cd8d7ca9156a5fc2510db1ef0d43956d3239eccf/litellm/main.py#L2838-L2848

This leads to a silent failure of retries if tenacity is not installed.
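
For illustration, here is a much-simplified sketch of that pattern (hypothetical names, not the actual litellm source): the retry helper only discovers the missing dependency when it tries to import tenacity, and the outer wrapper re-raises the original error, so the ImportError never reaches the caller.

async def retry_helper_sketch(call, **kwargs):
    # Hypothetical stand-in for acompletion_with_retries: the optional
    # dependency is only imported here, at retry time.
    try:
        import tenacity  # noqa: F401
    except ImportError:
        raise ImportError("tenacity is required for retries; run `pip install tenacity`")
    # ... a tenacity-based retry loop would go here ...


async def wrapper_sketch(call, **kwargs):
    # Hypothetical stand-in for wrapper_async.
    try:
        return await call(**kwargs)
    except Exception as original_error:
        if kwargs.get("num_retries"):
            try:
                return await retry_helper_sketch(call, **kwargs)
            except Exception:
                pass  # the retry-path failure (including the ImportError) is dropped
        raise original_error  # only the initial error ever reaches the caller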

Possible Solution

The error handling within wrapper_async should be improved to catch the missing tenacity dependency and provide a more informative error message, including details about the failed retry attempts.
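
A minimal sketch of that direction, reusing the hypothetical retry_helper_sketch from the illustration above (again, illustrative names, not a patch against the real wrapper_async): chain the retry-path failure onto the original error so a missing tenacity install shows up in the traceback.

async def wrapper_sketch_improved(call, **kwargs):
    try:
        return await call(**kwargs)
    except Exception as original_error:
        if kwargs.get("num_retries"):
            try:
                return await retry_helper_sketch(call, **kwargs)
            except ImportError as retry_error:
                # Surface the missing optional dependency instead of hiding it,
                # while keeping the original failure attached as the cause.
                raise RuntimeError(
                    "num_retries was set, but retries could not run: "
                    f"{retry_error}"
                ) from original_error
        raise original_error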

Workaround

Install the tenacity library using pip install tenacity.
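
Until something like the above lands, a plain import probe in user code (nothing litellm-specific) makes the missing dependency explicit up front:

# Fail fast with a clear message before relying on num_retries.
try:
    import tenacity  # noqa: F401  (availability check only)
except ImportError as err:
    raise RuntimeError(
        "num_retries > 0 relies on the optional 'tenacity' package; "
        "install it with `pip install tenacity`"
    ) from err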

Relevant log output

Request to litellm:
litellm.acompletion(model='gemini/gemini-1.5-flash', response_format={'type': 'json_object', 'response_schema': {'type': 'array', 'items': {'type': 'string'}}, 'enforce_validation': True}, safety_settings=[{'category': 'HARM_CATEGORY_HARASSMENT', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'BLOCK_NONE'}], messages=[{'role': 'user', 'content': 'send me an invalid json'}], num_retries=1)

23:56:53 - LiteLLM:WARNING: utils.py:361 - `litellm.set_verbose` is deprecated. Please set `os.environ['LITELLM_LOG'] = 'DEBUG'` for debug logs.
ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
Final returned optional params: {'response_mime_type': 'application/json', 'response_schema': {'type': 'array', 'items': {'type': 'string'}}, 'safety_settings': [{'category': 'HARM_CATEGORY_HARASSMENT', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'BLOCK_NONE'}]}

POST Request Sent from LiteLLM:
curl -X POST \
https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=[REDACTED] \
-H 'Content-Type: *****' \
-d '{'contents': [{'role': 'user', 'parts': [{'text': 'send me an invalid json'}, {'text': "Use this JSON schema: \n     \n    {'type': 'array', 'items': {'type': 'string'}}\n    "}]}], 'safetySettings': [{'category': 'HARM_CATEGORY_HARASSMENT', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'threshold': 'BLOCK_NONE'}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'BLOCK_NONE'}], 'generationConfig': {'response_mime_type': 'application/json'}}'

RAW RESPONSE:
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "[\"a\", \"b\", 1, \"d\"]\n"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 37,
    "candidatesTokenCount": 12,
    "totalTokenCount": 49
  }
}

raw model_response: {
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "[\"a\", \"b\", 1, \"d\"]\n"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 37,
    "candidatesTokenCount": 12,
    "totalTokenCount": 49
  }
}

Looking up model=gemini/gemini-1.5-flash in model_cost_map
Traceback (most recent call last):
  File "/home/user/projects/test/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/json_validation_rule.py", line 24, in validate_schema
    validate(response_dict, schema=schema)
  File "/home/user/projects/test/.venv/lib/python3.12/site-packages/jsonschema/validators.py", line 1332, in validate
    raise error
jsonschema.exceptions.ValidationError: 1 is not of type 'string'

Failed validating 'type' in schema['items']:
    {'type': 'string'}

On instance[2]:
    1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/projects/test/issue_showcase.py", line 63, in <module>
    asyncio.run(main())
  File "/home/user/miniconda3/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/user/projects/test/issue_showcase.py", line 52, in main
    response = await litellm.acompletion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/projects/test/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1595, in wrapper_async
    raise e
  File "/home/user/projects/test/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1456, in wrapper_async
    post_call_processing(
  File "/home/user/projects/test/.venv/lib/python3.12/site-packages/litellm/utils.py", line 735, in post_call_processing
    raise e
  File "/home/user/projects/test/.venv/lib/python3.12/site-packages/litellm/utils.py", line 727, in post_call_processing
    litellm.litellm_core_utils.json_validation_rule.validate_schema(
  File "/home/user/projects/test/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/json_validation_rule.py", line 26, in validate_schema
    raise JSONSchemaValidationError(
litellm.exceptions.JSONSchemaValidationError: litellm.APIError: litellm.JSONSchemaValidationError: model=, returned an invalid response=["a", "b", 1, "d"]
, for schema={"type": "array", "items": {"type": "string"}}.
Access raw response with `e.raw_response


krrishdholakia commented 3 weeks ago

This leads to a silent failure of retries if tenacity is not installed.

Hmm, how does a silent failure happen here? It looks like an exception is being raised if the import fails.