Open artmoskvin opened 9 months ago
To enable response streaming in the AI class, we'll follow these steps:
Understand the Current Implementation:
AI
class currently has a call
method that takes a list of BaseMessage
objects and returns a string. This method internally calls the model
(an instance of BaseChatModel
) with the messages and returns the content.Identify Required Changes:
AI
class that utilizes the stream
method from BaseChatModel
to stream generated tokens as they are returned from the model.Implementation Plan:
stream
in the AI
class.call
method, i.e., a list of BaseMessage
objects.stream
method, use the stream
method of the model
attribute (which is an instance of BaseChatModel
) to yield tokens as they are generated.stream
method in BaseChatModel
returns an iterator of BaseMessageChunk
. We will yield these chunks from our new stream
method in the AI
class.Testing:
stream
method to ensure it correctly streams the response from the model.Documentation:
AI
class to include the new stream
method, explaining its purpose, parameters, and return type.Review and Merge:
This plan aims to enhance the user experience by providing immediate feedback from the AI model as it generates responses. Let's discuss this plan with the team for approval before proceeding with the implementation.
@devbuddytest the plan is approved, go for it!
The implementation for enabling response streaming in the AI class has been completed, and a pull request has been created: Enable Response Streaming in AI Class.
This pull request introduces a new stream
method in the AI
class, enabling the streaming of generated tokens as soon as they are returned from the model. This enhancement aims to improve the user experience by providing immediate feedback from the AI model.
stream
method in the AI
class that utilizes the stream
method from BaseChatModel
to yield tokens as they are generated.Iterator
and BaseMessageChunk
) for the implementation.Please review the pull request and merge it if everything looks good.
Currently, the AI class generates only batch completions. So we have to wait until the whole completion is generates until we can send it back to user. A common way to improve UX is to stream generated tokens as soon as they are returned from the model. Let's add a new stream method in the AI class that calls BaseChatModel.stream(...) -> Iterator[BaseMessageChunk] from langchain under the hood.