Chainlit / chainlit


Streaming nested LLMChain instantly instead of waiting for final answer #377

Closed: kay-es closed this 1 month ago

kay-es commented 1 year ago

Hey folks,

I'm currently implementing with LangChain and Chainlit. It's an agent whose tools are set with return_direct=True.

The streaming feature is activated and seems to work:

[screenshot]

My problem is that I don't want the agent to stream its decision about which tool to choose, but rather the output being generated by the tool itself, which sits inside nested LLM chains:

[screenshot]

Is it possible to stream the output of the tools directly instead of waiting for it to be passed back to the agent? The "return_direct" attribute is set to true anyway.

My current implementation of on_message looks like this:

import chainlit as cl

@cl.on_message
async def main(message: str):
    global agent
    global chain
    # Chainlit callback handler that streams the final answer token by token
    cb = cl.AsyncLangchainCallbackHandler(
        stream_final_answer=True,
        stream_prefix=False,
    )
    # Start streaming immediately instead of waiting for the final answer prefix
    cb.answer_reached = True
    res = await agent.arun(message, callbacks=[cb])
    # res = await chain.arun(message, callbacks=[cb])
    await cl.Message(content=res).send()

Calling the chain directly gives me the results the way I need them for the tool/chain, but not for the agent, as shown in the screenshots. Any thoughts or ideas? 🙂

willydouhard commented 1 year ago

This one is a bit tricky. LangChain final-answer streaming is very brittle, as LangChain (in my understanding) was not built around that concept to begin with.

What you could do is override cl.AsyncLangchainCallbackHandler with custom logic that triggers the final-answer streaming flag once you reach the specific tool you know has return_direct=True.
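
A minimal sketch of that idea, assuming the handler passes through LangChain's on_tool_start hook and that setting answer_reached is what enables final-answer streaming (as in the snippet above); the class name DirectToolCallbackHandler and the tool name "generate_answer" are placeholders for your own:

import chainlit as cl

class DirectToolCallbackHandler(cl.AsyncLangchainCallbackHandler):
    async def on_tool_start(self, serialized, input_str, **kwargs):
        # Once the return_direct tool starts, flip the flag so the tokens
        # produced by its nested LLM chain are streamed as the final answer.
        if serialized.get("name") == "generate_answer":
            self.answer_reached = True
        await super().on_tool_start(serialized, input_str, **kwargs)

You would then pass DirectToolCallbackHandler(stream_final_answer=True, stream_prefix=False) in callbacks instead of the stock handler, and drop the unconditional cb.answer_reached = True so streaming only kicks in when that tool runs.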