Description
We want to stream the tokens of the agent's final output.
This allows the user to start reading the response as soon as it begins to be generated, instead of waiting until the whole answer is done.
This should greatly improve the time until a response is visible to the user, especially for longer answers. It would also make the bot feel more “alive”, since it would feel more like someone typing in real time.
Implementation
Using on_llm_new_token does not work, because all sorts of output are then streamed, not only the final answer.
FinalStreamingStdOutCallbackHandler does not work either, because it was written for the ZERO_SHOT_REACT_DESCRIPTION agent.
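The prefix-detection idea behind FinalStreamingStdOutCallbackHandler can be reimplemented for other agent types. Below is a minimal sketch, assuming the agent marks its answer with a literal "Final Answer:" prefix; the class name FinalAnswerStreamingHandler and the default prefix are assumptions, not code from this repo.

```python
from langchain.callbacks.base import BaseCallbackHandler

class FinalAnswerStreamingHandler(BaseCallbackHandler):
    """Buffer tokens until the agent's final-answer prefix appears,
    then forward every token that follows it."""

    def __init__(self, answer_prefix: str = "Final Answer:"):
        self.answer_prefix = answer_prefix
        self.buffer = ""
        self.streaming = False

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        if self.streaming:
            print(token, end="", flush=True)  # swap print for any sink
            return
        self.buffer += token
        if self.answer_prefix in self.buffer:
            # emit whatever already follows the prefix, then stream live
            self.streaming = True
            after = self.buffer.split(self.answer_prefix, 1)[1]
            print(after, end="", flush=True)
```

The handler would be passed to the agent via callbacks=[...] together with an LLM created with streaming=True, since without streaming the per-token callback never fires.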
Todo's
[x] Wait for a possible answer on the LangChain Discord support forum
[x] Wait for harisrab's implementation
The implementation works, but the final answer is only streamed after it has been fully generated by the LLM. This looks nice, but does not reduce how long the user has to wait for an answer (see the WebSocket sketch after this list).
[x] Wait again for a response from harisrab
[x] Implement backend
[x] Implement frontend
[x] Final testing
[x] Documentation of the new stream over WebSocket in the README
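For the backend, tokens can be bridged from the LangChain callback to the WebSocket through an asyncio.Queue. Below is a minimal sketch, assuming a FastAPI backend; run_agent is a hypothetical helper that invokes the agent with callbacks=[handler] and a streaming-enabled LLM.

```python
import asyncio

from fastapi import FastAPI, WebSocket
from langchain.callbacks.base import AsyncCallbackHandler

app = FastAPI()

class QueueCallbackHandler(AsyncCallbackHandler):
    """Put every new token on a queue; None marks the end of generation."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        await self.queue.put(token)

    async def on_llm_end(self, response, **kwargs) -> None:
        # Caveat: in a multi-step agent this fires after every LLM call;
        # in practice, combine it with final-answer detection.
        await self.queue.put(None)  # sentinel: generation finished

@app.websocket("/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    question = await websocket.receive_text()
    queue: asyncio.Queue = asyncio.Queue()
    handler = QueueCallbackHandler(queue)
    # run_agent is hypothetical: it should call the agent with
    # callbacks=[handler] and an LLM created with streaming=True
    task = asyncio.create_task(run_agent(question, callbacks=[handler]))
    while (token := await queue.get()) is not None:
        await websocket.send_text(token)  # token reaches the client immediately
    await task
```

On the frontend, a plain WebSocket client can append each received message to the chat bubble as it arrives.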
Resources