BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Streaming broken in new support for OpenAI assistants? #3669

Closed: ben-bl closed this issue 4 months ago

ben-bl commented 5 months ago

What happened?

With #2842 assistant support was just added by @krrishdholakia 🙏

I just can't get streaming to work with the current integration, and I'm not sure whether that's due to the missing documentation, a mistake on my part, or a bug.

Below is the code I used to try to receive streamed output. It is built on the custom callbacks example from the official documentation, extended with the example code from the pull request that added the OpenAI assistant support (https://github.com/BerriAI/litellm/pull/3455#issue-2279255048).

I do get the final, complete, unstreamed answer from the LLM, but none of the handler calls are logged.

import litellm
from litellm import (
    MessageData,
    add_message,
    create_thread,
    get_assistants,
    get_messages,
    run_thread,
)
from litellm.integrations.custom_logger import CustomLogger

class MyCustomHandler(CustomLogger):
    def log_pre_api_call(self, model, messages, kwargs):
        print("Pre-API Call")

    def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
        print("Post-API Call")

    def log_stream_event(self, kwargs, response_obj, start_time, end_time):
        print("On Stream")

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("On Success")

    def log_failure_event(self, kwargs, response_obj, start_time, end_time):
        print("On Failure")

    #### ASYNC #### - for acompletion/aembeddings

    async def async_log_stream_event(self, kwargs, response_obj, start_time, end_time):
        print("On Async Streaming")

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("On Async Success")

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        print("On Async Failure")

customHandler = MyCustomHandler()

litellm.callbacks = [customHandler]

assistants = get_assistants(custom_llm_provider="openai")

# get the first assistant
assistant_id = assistants.data[0].id

new_thread = create_thread(
    custom_llm_provider="openai",
)

thread_id = new_thread.id

# add message to thread
message: MessageData = {"role": "user", "content": "Who are you?"}  # type: ignore

added_message = add_message(
    thread_id=new_thread.id, custom_llm_provider="openai", **message
)

run = run_thread(
    custom_llm_provider="openai", thread_id=thread_id, assistant_id=assistant_id, stream=True
)

# print the final, complete response
print(get_messages(custom_llm_provider="openai", thread_id=thread_id, assistant_id=assistant_id))
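
For comparison, here is roughly the streaming behavior I expected, sketched directly against the official openai SDK's Assistants event stream (illustrative only; assumes openai>=1.14 and reuses the thread_id/assistant_id from above):

from openai import OpenAI

client = OpenAI()

# Passing stream=True to runs.create returns an iterator of server-sent events.
stream = client.beta.threads.runs.create(
    thread_id=thread_id,
    assistant_id=assistant_id,
    stream=True,
)

for event in stream:
    # thread.message.delta events carry partial message text; this is the
    # per-chunk feedback I expected litellm to surface via log_stream_event.
    if event.event == "thread.message.delta":
        for block in event.data.delta.content:
            if block.type == "text":
                print(block.text.value, end="", flush=True)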

Relevant log output

No response

Twitter / LinkedIn details

No response

krrishdholakia commented 4 months ago

Hey @ben-bl, thanks for this issue. Planning on adding it this week.

How're you using litellm today?