BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Claude3 proxy not working #3453

Closed: rdaigle007 closed this issue 6 months ago

rdaigle007 commented 6 months ago

What happened?

A bug happened!

Using LiteLLM as a proxy, with Claude 3 as the selected LLM.

The config.yaml entry is:

```
Russells-MBP-2:litellm russ$ cat claude3-sonnet.yaml
model_list:

litellm_settings:
  set_verbose: True
```
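The `model_list` entries did not make it into the paste above. For reference, a typical LiteLLM proxy config for this setup might look like the following; the model alias and env-var reference are illustrative, not from the original report:

```yaml
model_list:
  - model_name: claude-3-sonnet             # alias clients send; illustrative
    litellm_params:
      model: claude-3-sonnet-20240229       # the model seen in the logs below
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  set_verbose: True
```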

The first couple of Claude 3 messages work fine, but at some point Claude 3 returns a response with an empty content array, and LiteLLM doesn't handle this.
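To make the failure mode concrete, here is a minimal repro sketch assuming a locally running proxy (the base URL, placeholder key, and model alias are assumptions, not from the original report):

```python
# Repro sketch against a locally running LiteLLM proxy.
# Assumed: proxy at http://localhost:4000 serving the "claude-3-sonnet" alias;
# the client-side key is a placeholder since the proxy holds the Anthropic key.
import openai

client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-placeholder")

try:
    resp = client.chat.completions.create(
        model="claude-3-sonnet",
        messages=[{"role": "user", "content": "set up an email account ..."}],
        max_tokens=4000,
    )
    print(resp.choices[0].message.content)
except openai.APIError as err:
    # When Anthropic returns "content": [], the proxy surfaces
    # "AnthropicException - No content in response" (see the logs below).
    print("proxy error:", err)
```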

The last request/response is in logs below.

Relevant log output

```
ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
Final returned optional params: {'max_tokens': 4000}
self.optional_params: {'max_tokens': 4000}

POST Request Sent from LiteLLM:
curl -X POST \
https://api.anthropic.com/v1/messages \
-H 'accept: application/json' -H 'anthropic-version: 2023-06-01' -H 'content-type: application/json' -H 'x-api-key: sk-ant-api03-BA2yORwsAp3QLgJuCmUbUxxM4GpRUbfhd_bXTqj2tBBe-lJq7WqV2h58W55zaEDWf0O499crYmd********************' \
-d '{'model': 'claude-3-sonnet-20240229', 'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'set up an email account and ensure it can receive emails sent to it. Any bot-detection or captcha mechanisms can wait for human input in the browser, and then continue with the setup. Choose usernames/passwords and other data you fill in forms be chosen to increase changes of being accepted by email provider. Please code defensively to detect and recover from issues (for example, what if the username chosen is already in use)'}, {'type': 'text', 'text': "Here's a proposed plan for setting up an email account and ensuring it can receive emails, with steps for the engineer and scientist:\n\nScientist:\n1. Identify a reputable email service provider (e.g., Gmail, Outlook, Yahoo, etc.) that meets the requirements.\n2. Determine the necessary information and credentials needed for account creation (e.g., username, password, recovery email, etc.).\n3. Establish guidelines for choosing usernames, passwords, and other data to increase the chances of being accepted by the email provider.\n4. Consider potential issues that may arise during the setup process and develop strategies for detection and recovery (e.g., username already taken, captcha challenges, etc.).\n\nEngineer:\n1. Implement a script or program to automate the email account creation process.\n2. Include functionality to handle bot-detection or captcha mechanisms, pausing the process and prompting for human input in the browser when necessary.\n3. Incorporate defensive coding practices to detect and recover from issues, such as:\n   a. Checking if the chosen username is already taken and generating a new one if necessary.\n   b. Implementing retry mechanisms for failed account creation attempts.\n   c. Handling and logging errors or exceptions that may occur during the process.\n4. Implement functionality to send a test email to the newly created account and verify its successful receipt.\n5. Provide a user-friendly interface or command-line options for the scientist to input the necessary information and preferences.\n\nScientist:\n6. Review and approve the plan, providing feedback or additional requirements if needed.\n7. Test the implemented solution, including edge cases and potential issues identified earlier.\n8. Iterate on the plan and implementation until the email account setup and verification process is successful and meets all requirements.\n\nBy following this plan, the engineer can develop a robust and defensive solution for setting up an email account and verifying its functionality, while the scientist provides domain knowledge, guidelines, and testing to ensure the solution meets the desired requirements."}]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'The proposed plan looks good overall. Here are some additional suggestions and considerations:\n\nScientist:\n\n1. When identifying the email provider, also consider factors like privacy, security, and any specific requirements or restrictions for the use case.\n2. Provide guidelines or rules for generating strong and unique passwords, and consider using a password manager or secure storage for the credentials.\n3. Determine if any specific email features (e.g., folder organization, filters, alias support) are required and factor them into the provider selection process.\n\nEngineer:\n\n1. 
Implement functionality to check the availability of the chosen username before attempting account creation, and have a strategy for generating alternative usernames if needed (e.g., appending numbers or special characters).\n2. Implement functionality to handle different types of captchas or bot-detection mechanisms (e.g., image recognition, audio challenges, reCAPTCHA) by pausing and prompting for human input as needed.\n3. Consider using headless browsers or automation tools like Selenium for browser-based interactions during account setup and email verification.\n4. Implement proper error handling and logging mechanisms to capture and report any issues encountered during the process.\n5. Add support for configuring the solution through command-line arguments, configuration files, or environment variables for easy customization and automation.\n6. Include unit tests and integration tests to ensure the robustness and reliability of the solution.\n7. Consider adding verifiable information, such as source URLs or references, to the plan and code comments for transparency and accountability.\n\nBoth:\n\n1. Discuss and agree on the criteria for successful email account setup and verification (e.g., receiving a test email within a specified time frame, checking for common setup issues).\n2. Establish a communication channel for the scientist to provide feedback or report any issues encountered during testing.\n3. Plan for future maintenance and updates, as email providers may change their interfaces, policies, or security mechanisms over time.\n\nBy incorporating these additional considerations, the plan and implementation can be more robust, secure, and maintainable while meeting the specific requirements of the use case.'}]}], 'max_tokens': 4000, 'system': 'Critic. Double check plan, claims, code from other agents and provide feedback. \n    Check whether the plan includes adding verifiable info such as source URL. \n    '}'

_is_function_call: False
RAW RESPONSE:
<coroutine object AnthropicChatCompletion.acompletion_function at 0x110746260>

RAW RESPONSE:
{"id":"msg_01NDGKY7SWjXeJnHUvKdvXNy","type":"message","role":"assistant","model":"claude-3-sonnet-20240229","stop_sequence":null,"usage":{"input_tokens":1009,"output_tokens":3},"content":[],"stop_reason":"end_turn"}

raw model_response: {"id":"msg_01NDGKY7SWjXeJnHUvKdvXNy","type":"message","role":"assistant","model":"claude-3-sonnet-20240229","stop_sequence":null,"usage":{"input_tokens":1009,"output_tokens":3},"content":[],"stop_reason":"end_turn"}
status_code: 200
Logging Details: logger_fn - None | callable(logger_fn) - False
Logging Details LiteLLM-Failure Call
get cache: cache key: 21-39:cooldown_models; local_only: False
get cache: cache result: None
set cache: key: 21-39:cooldown_models; value: ['f00a9d21612cc0c52b58395bee6af46438ba462a3f759b87e322c847268d4632']
InMemoryCache: set_cache
Inside Max Parallel Request Failure Hook
user_api_key: NotRequired
get cache: cache key: NotRequired::2024-05-04-14-39::request_count; local_only: False
get cache: cache result: {'current_requests': 0, 'current_tpm': 5380, 'current_rpm': 6}
updated_value in failure call: {'current_requests': 0, 'current_tpm': 5380, 'current_rpm': 6}
set cache: key: NotRequired::2024-05-04-14-39::request_count; value: {'current_requests': 0, 'current_tpm': 5380, 'current_rpm': 6}
InMemoryCache: set_cache
async get cache: cache key: 21-39:cooldown_models; local_only: False
in_memory_result: ['f00a9d21612cc0c52b58395bee6af46438ba462a3f759b87e322c847268d4632']
get cache: cache result: ['f00a9d21612cc0c52b58395bee6af46438ba462a3f759b87e322c847268d4632']
async get cache: cache key: 21-39:cooldown_models; local_only: False
in_memory_result: ['f00a9d21612cc0c52b58395bee6af46438ba462a3f759b87e322c847268d4632']
get cache: cache result: ['f00a9d21612cc0c52b58395bee6af46438ba462a3f759b87e322c847268d4632']
Traceback (most recent call last):
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/main.py", line 319, in acompletion
    response = await init_response
               ^^^^^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/llms/anthropic.py", line 306, in acompletion_function
    return self.process_response(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/llms/anthropic.py", line 143, in process_response
    raise AnthropicError(
litellm.llms.anthropic.AnthropicError: No content in response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3818, in chat_completion
    responses = await asyncio.gather(
                ^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/router.py", line 455, in acompletion
    raise e
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/router.py", line 451, in acompletion
    response = await self.async_function_with_fallbacks(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/router.py", line 1425, in async_function_with_fallbacks
    raise original_exception
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/router.py", line 1343, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/router.py", line 1527, in async_function_with_retries
    raise original_exception
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/router.py", line 1444, in async_function_with_retries
    response = await original_function(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/router.py", line 572, in _acompletion
    raise e
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/router.py", line 556, in _acompletion
    response = await _response
               ^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/utils.py", line 3609, in wrapper_async
    raise e
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/utils.py", line 3441, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/main.py", line 340, in acompletion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/utils.py", line 8965, in exception_type
    raise e
  File "/Volumes/ExtremePro/litellm/env-litellm/lib/python3.11/site-packages/litellm/utils.py", line 7896, in exception_type
    raise APIError(
litellm.exceptions.APIError: AnthropicException - No content in response. Handle with `litellm.APIError`.
INFO:     127.0.0.1:64036 - "POST /chat/completions HTTP/1.1" 200 OK
```
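For context on the failure path: per the traceback, `process_response` in `litellm/llms/anthropic.py` raises when the returned content array is empty, even though Anthropic answered with HTTP 200. A hypothetical sketch of that guard (the real source may differ):

```python
# Hypothetical sketch of the empty-content guard that raised above;
# the actual code in litellm/llms/anthropic.py may differ.
class AnthropicError(Exception):
    def __init__(self, status_code: int, message: str):
        self.status_code = status_code
        super().__init__(message)

def check_content(completion_response: dict, status_code: int) -> None:
    # Anthropic returned {"content": [], "stop_reason": "end_turn"} with
    # HTTP 200; an empty content array trips this check.
    if len(completion_response.get("content", [])) == 0:
        raise AnthropicError(status_code=status_code, message="No content in response")
```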


ishaan-jaff commented 6 months ago

@rdaigle007 it looks like this is because the Anthropic API returned an empty response. What would you like LiteLLM to do here?

(IIRC, the OpenAI Python SDK raises an exception in this scenario too.)

rdaigle007 commented 6 months ago

I'd assume it should propagate the empty response back to the client of the proxy.
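Concretely (an illustrative sketch, not an actual proxy response), the proxy could map Anthropic's empty content array to an OpenAI-format completion with an empty message instead of raising. The token counts below are taken from the logs above; the `id` is a placeholder:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "claude-3-sonnet-20240229",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 1009, "completion_tokens": 3, "total_tokens": 1012 }
}
```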

For example, I'm using an AutoGen multi-agent application. Such Python applications are quite high-level (i.e., not calling OpenAI directly): you define a number of agents, then kick them off to solve a problem. Such an exception from LiteLLM causes the entire agent chain to break and exit. It's not like I can just catch exceptions at a specific OpenAI message-send/response and retry.

There will be more and more of these types of high-level apps that will break with the way LiteLLM is handling this.

And Claude 3 is returning this empty message response quite a bit (and with an HTTP 200 response). Hence it would be super useful for LiteLLM to handle this common occurrence from Claude 3 in order to claim compatibility.

I realize this is an edge case and a weird behaviour from Claude. However, I've been a protocol developer for decades, and an important tenet of protocol development is to be very forgiving of the responses you receive, in order to make applications more reliable.

Food for thought. Feel free to close this if you disagree. I will unfortunately need to start looking at other proxy-type agents, as Claude is one of numerous models I need to support.

rdaigle007 commented 6 months ago

Regardless, your quick response time was pretty cool, respected, and appreciated.

krrishdholakia commented 6 months ago

Removed the if-check on our end based on your feedback.

Agreed - litellm shouldn't be causing this issue for your agents.

Fix will be live in a release later today.
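A minimal sketch of what removing that check amounts to (hypothetical, not the actual patch): treat an empty Anthropic `content` array as an empty assistant message rather than an error.

```python
# Hypothetical sketch of the fix direction, not the actual LiteLLM patch:
# join Anthropic text blocks into one string, so an empty "content" array
# becomes an empty assistant message instead of raising.
def anthropic_content_to_text(completion_response: dict) -> str:
    blocks = completion_response.get("content") or []
    return "".join(b.get("text", "") for b in blocks if b.get("type") == "text")

# The empty response from the logs above now yields "" rather than an error:
assert anthropic_content_to_text({"content": [], "stop_reason": "end_turn"}) == ""
```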