BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Multiple async streaming requests to Anthropic (and possibly other llms) cause an exception #3881

Closed: vonstring closed this issue 5 months ago

vonstring commented 5 months ago

What happened?

When using the AsyncOpenAI client with an Anthropic model on litellm, any new streaming completion request kills previous in-flight requests, causing an httpx.ReadError exception (see log output).

Minimal test case:

from openai import AsyncOpenAI
import os
import sys
import asyncio

client = AsyncOpenAI(
    api_key=os.environ.get("LITELLM_API_KEY", "foo"), 
    base_url=os.environ.get("LITELLM_BASE_URL", "http://localhost:4000")
)

MODEL = "claude-3-haiku"

async def send_request():
    try:
        response = await client.chat.completions.create(
            model=MODEL,
            messages=[
                {
                    "role": "user",
                    "content": "write 10 words"
                },
            ],
            stream=True,
        )
        async for chunk in response:
            pass
    except Exception as e:
        print(e)
        sys.exit(1)

async def test():
    tasks = [send_request() for _ in range(2)]
    await asyncio.gather(*tasks)
    print("PASS")

if __name__ == "__main__":
    asyncio.run(test())

The issue seems to be with two lines in anthropic.py (shown in the diff below), each of which creates a new AsyncHTTPHandler and overwrites the reference to any previous handler. The previous handler's httpx client is then closed prematurely inside AsyncHTTPHandler, since nothing references that handler any more.
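
To illustrate the hazard in isolation, here is a toy sketch (not litellm internals) of two concurrent calls sharing one instance: each call overwrites the same attribute, so the first request's handler is replaced while its stream is still in flight:

import asyncio

class Handler:
    def __init__(self, name: str):
        self.name = name

class FakeCompletion:
    async def acompletion_stream(self, name: str) -> None:
        # Mirrors the pattern above: a fresh handler is assigned to an
        # instance attribute on every call.
        self.handler = Handler(name)
        mine = self.handler
        await asyncio.sleep(0)  # yield so the other task can run
        # After yielding, self.handler may already belong to the other request.
        print(f"{name}: self.handler is still mine -> {self.handler is mine}")

async def main():
    c = FakeCompletion()
    await asyncio.gather(
        c.acompletion_stream("request-1"),
        c.acompletion_stream("request-2"),
    )

asyncio.run(main())

Running this prints False for request-1: its handler has been swapped out from under it, which is the same loss of the in-flight handler that surfaces as httpx.ReadError in the proxy.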

This quickfix diff causes the handler to be reused, making the above test case pass:

diff --git a/litellm/llms/anthropic.py b/litellm/llms/anthropic.py
index 1ca048523..402e6b99b 100644
--- a/litellm/llms/anthropic.py
+++ b/litellm/llms/anthropic.py
@@ -379,7 +379,7 @@ class AnthropicChatCompletion(BaseLLM):
         logger_fn=None,
         headers={},
     ):
-        self.async_handler = AsyncHTTPHandler(
+        self.async_handler = getattr(self, 'async_handler', None) or AsyncHTTPHandler(
             timeout=httpx.Timeout(timeout=600.0, connect=5.0)
         )
         data["stream"] = True
@@ -421,7 +421,7 @@ class AnthropicChatCompletion(BaseLLM):
         logger_fn=None,
         headers={},
     ) -> Union[ModelResponse, CustomStreamWrapper]:
-        self.async_handler = AsyncHTTPHandler(
+        self.async_handler = getattr(self, 'async_handler', None) or AsyncHTTPHandler(
             timeout=httpx.Timeout(timeout=600.0, connect=5.0)
         )
         response = await self.async_handler.post(

A quick search suggests this could be an issue in several other LLM classes as well; a generic sketch of the reuse pattern follows.
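
For reference, a minimal sketch of that reuse pattern with plain httpx (assumed placeholder URL and names, not litellm's AsyncHTTPHandler): a single AsyncClient can safely serve concurrent streaming requests and is only closed once all of them have finished.

import asyncio
import httpx

# One shared client, created once and reused by every request.
client = httpx.AsyncClient(timeout=httpx.Timeout(600.0, connect=5.0))

async def stream_one(url: str) -> int:
    # Each task streams over the same shared client; no task tears the
    # connection pool down while another is mid-stream.
    async with client.stream("GET", url) as response:
        total = 0
        async for chunk in response.aiter_bytes():
            total += len(chunk)
        return total

async def main():
    urls = ["https://example.com"] * 2  # placeholder URLs
    sizes = await asyncio.gather(*(stream_one(u) for u in urls))
    print(sizes)
    await client.aclose()  # close once, after all streams finish

asyncio.run(main())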

Relevant log output

Traceback (most recent call last):
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 254, in __aiter__
    async for part in self._httpcore_stream:
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 367, in __aiter__
    raise exc from None
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 363, in __aiter__
    async for part in self._stream:
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpcore/_async/http11.py", line 349, in __aiter__
    raise exc
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpcore/_async/http11.py", line 341, in __aiter__
    async for chunk in self._connection._receive_response_body(**kwargs):
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpcore/_async/http11.py", line 210, in _receive_response_body
    event = await self._receive_event(timeout=timeout)
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpcore/_async/http11.py", line 224, in _receive_event
    data = await self._network_stream.read(
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 32, in read
    with map_exceptions(exc_map):
  File "/opt/homebrew/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/krhah771/code/litellm/litellm/proxy/proxy_server.py", line 3489, in async_data_generator
    async for chunk in response:
  File "/Users/krhah771/code/litellm/litellm/utils.py", line 11894, in __anext__
    raise e
  File "/Users/krhah771/code/litellm/litellm/utils.py", line 11778, in __anext__
    async for chunk in self.completion_stream:
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_models.py", line 963, in aiter_lines
    async for text in self.aiter_text():
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_models.py", line 950, in aiter_text
    async for byte_content in self.aiter_bytes():
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_models.py", line 929, in aiter_bytes
    async for raw_bytes in self.aiter_raw():
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_models.py", line 987, in aiter_raw
    async for raw_stream_bytes in self.stream:
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_client.py", line 149, in __aiter__
    async for chunk in self._stream:
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 253, in __aiter__
    with map_httpcore_exceptions():
  File "/opt/homebrew/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/Users/krhah771/code/litellm/venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ReadError

Twitter / LinkedIn details

https://x.com/vonstrenginho

krrishdholakia commented 5 months ago

Able to repro - picking this up now. This is pretty serious. Thanks for flagging this @vonstring

krrishdholakia commented 5 months ago

hmm still seeing this even after removing the self. reference

krrishdholakia commented 5 months ago

@vonstring i have a fix out - targeting anthropic for now. Once you can confirm it works for you too, i'll roll it out to other LLMs

vonstring commented 5 months ago

Yes, it works now! Thanks for addressing this so quickly.

krrishdholakia commented 5 months ago

Great - i'll roll this out to the other providers too. thank you for flagging this @vonstring

Curious - how're you using litellm today?