Agenta-AI / agenta

The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
http://www.agenta.ai

[AGE-277] Enable streaming mode for the SDK #318

Open mmabrouk opened 11 months ago

mmabrouk commented 11 months ago

Is your feature request related to a problem? Please describe.
Using LLMs without streaming is slow.

Describe the solution you'd like
Add a feature to stream outputs to the SDK.

Tasks

AGE-277

mmabrouk commented 2 months ago

Here is a simple example of implementing streaming with FastAPI and React: https://medium.com/@hxu296/serving-openai-stream-with-fastapi-and-consuming-with-react-js-part-1-8d482eb89702
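
For reference, a minimal sketch of the pattern that article describes (the `fake_llm_stream` generator below is a placeholder, not part of our SDK):

```python
# Minimal FastAPI streaming sketch: an endpoint that forwards LLM chunks to
# the client as server-sent events via StreamingResponse.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_llm_stream(prompt: str):
    # Placeholder for an LLM client that yields chunks as they arrive.
    for token in ["Hello", ", ", "world", "!"]:
        yield f"data: {token}\n\n"

@app.get("/generate")
async def generate(prompt: str):
    # text/event-stream lets a React client consume chunks with EventSource or fetch.
    return StreamingResponse(fake_llm_stream(prompt), media_type="text/event-stream")
```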

I think the challenge would be the design of the features. Some early thoughts here:

We could add a streaming=True parameter to @entrypoint and expect the user to yield the messages instead of returning them, then wrap the response in a FastAPI StreamingResponse (a rough sketch follows the questions below). Some questions I would have:

  1. How would the frontend/evaluation know whether the application is streaming or not? Do we name the endpoint differently? Do we save this as part of the application config (but then where would this be defined: when creating the app, or when @entrypoint is called)? Do we provide an endpoint to get the config?
  2. How would our @span decorator handle streaming functions, or LangChain ones that stream?
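
A rough sketch of how the streaming=True flag could work, assuming the decorator simply detects that the wrapped function yields and wraps it in a StreamingResponse (the internals here are hypothetical, not the current SDK implementation):

```python
# Hypothetical sketch of @entrypoint(streaming=True): when the flag is set,
# the wrapped function is expected to yield chunks, and the decorator wraps
# them in a StreamingResponse instead of returning a plain JSON body.
import functools
import inspect

from fastapi.responses import StreamingResponse

def entrypoint(func=None, *, streaming: bool = False):
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if streaming and (inspect.isasyncgen(result) or inspect.isgenerator(result)):
                # The user yields chunks; stream them as server-sent events.
                return StreamingResponse(result, media_type="text/event-stream")
            if inspect.isawaitable(result):
                result = await result
            return result
        return wrapper
    return decorator(func) if func is not None else decorator

@entrypoint(streaming=True)
async def my_app(prompt: str):
    # Stand-in for an LLM call that yields chunks.
    for token in ["streaming", " ", "works"]:
        yield f"data: {token}\n\n"
```

One side benefit: because the streaming variant responds with a different media_type, the frontend could use that to detect whether an application streams, which partially answers question 1.
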
mmabrouk commented 2 months ago

@aybruhm says: The frontend would know by the media_type of the endpoint, which would be either text/event-stream or application/octet-stream. LiteLLM has solved that for us. See here. For users that will be using our span decorators, we can simply investigate how LiteLLM gets the final streaming chunk and do the same.
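
A sketch of what that could look like for the span decorator, assuming we re-yield chunks to the caller while buffering them and rebuild the final message with litellm's stream_chunk_builder (the span internals and the record_span_output helper here are illustrative, not the existing SDK API):

```python
# Illustrative sketch: a span decorator that handles streaming functions by
# passing chunks through to the caller while buffering them, then recording
# the reconstructed final message once the stream is exhausted.
import functools
import litellm

def span(fn):
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        chunks = []
        async for chunk in fn(*args, **kwargs):
            chunks.append(chunk)
            yield chunk  # forward the chunk to the client immediately
        # After the stream ends, rebuild the complete response the same way
        # litellm does and attach it to the span.
        final = litellm.stream_chunk_builder(chunks)
        record_span_output(final)  # placeholder for the actual tracing call

    return wrapper
```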