BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: `acompletion` does not return AsyncIterable for streamed tool_call #2644

Closed: jackmpcollins closed this issue 7 months ago

jackmpcollins commented 7 months ago

What happened?

I modified the Anthropic Claude function-calling example from the README to use `acompletion` with `stream=True`, and the returned object is not an AsyncIterable. This works correctly for OpenAI models.

from litellm import acompletion

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]

response = await acompletion(
    model="openai/gpt-3.5-turbo",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    stream=True,
)
async for chunk in response:  # This works as expected
    print(chunk)

response = await acompletion(
    model="anthropic/claude-3-opus-20240229",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    stream=True,
)
# This raises error
# TypeError: 'async for' requires an object with __aiter__ method, got generator
async for chunk in response:
    print(chunk)

# This works! But it probably shouldn't, because it is blocking in an async context (?)
for chunk in response:
    print(chunk)
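
As a temporary workaround (just a sketch on my end, not part of litellm, and assuming the Anthropic path currently returns a plain synchronous generator), the generator can be wrapped so it is still consumable with `async for` without blocking the event loop:

import asyncio

async def aiter_sync_generator(gen):
    # Hypothetical helper: pull each chunk from the sync generator in a worker
    # thread so the event loop is not blocked while waiting on the next chunk.
    loop = asyncio.get_running_loop()
    sentinel = object()
    while True:
        chunk = await loop.run_in_executor(None, next, gen, sentinel)
        if chunk is sentinel:
            break
        yield chunk

async for chunk in aiter_sync_generator(response):
    print(chunk)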

Streaming a plain string response from `acompletion` works correctly:

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, who are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(
        model="anthropic/claude-3-haiku-20240307",
        messages=messages,
        stream=True,
    )
    return response

response = await test_get_response()

async for chunk in response:
    print(chunk)

Related issue for supporting streamed tool_calls with Anthropic models: https://github.com/BerriAI/litellm/issues/2435. This blocks magentic issue https://github.com/jackmpcollins/magentic/issues/153.

Relevant log output

No response


krrishdholakia commented 7 months ago

picking this up now @jackmpcollins

krrishdholakia commented 7 months ago

able to repro - seems like we were missing a test here. working on a fix
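
For what it's worth, a regression test along these lines might cover it (a sketch only; the test name, pytest-asyncio marker, and model are my assumptions, and the actual test added may differ):

import collections.abc
import pytest
from litellm import acompletion

@pytest.mark.asyncio
async def test_anthropic_streamed_tool_call_returns_async_iterable():
    # Hypothetical regression test: the streamed tool_call response
    # must be consumable with `async for`.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }]
    messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
    response = await acompletion(
        model="anthropic/claude-3-opus-20240229",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        stream=True,
    )
    assert isinstance(response, collections.abc.AsyncIterable)
    async for chunk in response:
        assert chunk is not None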

krrishdholakia commented 7 months ago

should be live soon in v1.33.5.

krrishdholakia commented 7 months ago

feel free to reopen this if it persists @jackmpcollins