Closed: prasad4fun closed this issue 1 year ago.
Hello, it means the agent / tools you are using do not have an async implementation. You can fall back to the sync implementation by just changing use_async=False. You will still be able to stream!
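For context, a minimal sketch of that fallback, assuming a Chainlit release from this period that still exposes the langchain_factory decorator; the LlamaCpp model path and the in-memory Chroma store are placeholders, not the reporter's actual setup:

```python
import chainlit as cl
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import LlamaCpp
from langchain.vectorstores import Chroma

# use_async=False makes Chainlit drive the chain synchronously, so agents
# and tools without an async implementation still work; tokens still stream.
@cl.langchain_factory(use_async=False)
def factory():
    llm = LlamaCpp(model_path="./model.bin", streaming=True)  # placeholder model
    docs = Chroma.from_texts(["example document"], HuggingFaceEmbeddings())
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=docs.as_retriever(),
    )
```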
I followed your suggestion and set use_async=False. Now, I can see tokens being printed one at a time in the terminal when using the verbose=True option. However, in chainlit, the tokens are not being streamed continuously. Instead, the UI displays the 'RetrievalQA loader' until the entire answer generation is completed, and then it renders the final answer all at once.
In langchain, only the intermediate steps are streamed (if you unfold the RetrievalQA loader you should see the text being streamed). We are currently looking into ways to stream the final answer properly.
Even after unfolding the RetrievalQA loader, the text isn't being streamed. Only the final response is rendered.
It seems this is the same issue as #84.
Issue Description

Problem: When attempting to use the RetrievalQA module with a custom finetuned Llama model with streaming enabled, a NotImplementedError is raised.

Steps to Reproduce
1. Enable streaming using the provided code (reconstructed in the sketch after this list).
2. Instantiate a RetrievalQA object using the custom Llama model with the from_chain_type method, specifying the necessary parameters.
3. Attempt to stream in chainlit using the following code (also covered in the sketch below).
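The code blocks referenced in steps 1 and 3 did not survive in the report, so the following is a hedged reconstruction of a typical setup of that era rather than the reporter's actual code; the model path is a placeholder and LlamaCpp stands in for the custom finetuned Llama model:

```python
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

# Step 1: enable streaming on the model. The stdout handler prints each
# token as it is generated, which is what a verbose command-line run shows.
llm = LlamaCpp(
    model_path="./llama-finetuned.bin",  # placeholder path
    streaming=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
```

Step 3 then returns the RetrievalQA chain built from this llm via from_chain_type (step 2) from a Chainlit langchain_factory, as sketched earlier in the thread.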
Expected Behavior

I expected the streaming functionality to work with the custom language models and not encounter the NotImplementedError.

Additional Information
I verified streaming with langchain alone in the command line, and it prints tokens correctly. Please suggest the appropriate way to achieve streaming with custom language models.
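Until final-answer streaming is supported natively, one workaround pattern is to attach a custom LangChain callback handler to the model and forward each token to the Chainlit message stream. This is a hedged sketch, not an official recipe; ChainlitStreamHandler is a hypothetical name, and it assumes the handler is constructed inside a Chainlit session so that cl.Message has a context to attach to:

```python
import chainlit as cl
from langchain.callbacks.base import BaseCallbackHandler

class ChainlitStreamHandler(BaseCallbackHandler):
    """Forward each generated token to the Chainlit UI as it arrives."""

    def __init__(self):
        self.msg = cl.Message(content="")

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # stream_token is a coroutine; bridge to it from this sync callback.
        cl.run_sync(self.msg.stream_token(token))

    def on_llm_end(self, response, **kwargs) -> None:
        # Flush the fully streamed message once generation finishes.
        cl.run_sync(self.msg.send())
```

The handler is then passed to the model (for example, callbacks=[ChainlitStreamHandler()] on the LLM) so tokens reach the UI instead of only stdout.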