Chainlit / chainlit

Double output of response with ChainlitCallbackHandler #108

Closed SimonB97 closed 1 year ago

SimonB97 commented 1 year ago

Hi, your package is awesome, thank you for this!

Unfortunately, I'm having an issue where each response is displayed twice when using the ChainlitCallbackHandler:

[screenshot: each agent response appears twice in the chat]

I have to use the callback handler to get streaming output with an initially sync agent from LangChain, which I made async by applying the fix (the last function below) mentioned in your docs:

import chainlit as cl
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationTokenBufferMemory
from langchain.prompts import MessagesPlaceholder
from langchain.tools.file_management import ReadFileTool, WriteFileTool
from langchain.utilities import SerpAPIWrapper

@cl.langchain_factory(use_async=True)
def factory():
    # Initialize the OpenAI language model
    llm = ChatOpenAI(
        temperature=0, 
        model="gpt-3.5-turbo-0613", 
        streaming=True, 
        callbacks=[cl.ChainlitCallbackHandler()]
    )

    # Initialize the SerpAPIWrapper for search functionality
    search = SerpAPIWrapper()

    # Define a list of tools offered by the agent
    tools = [
        Tool(
            name="Search",
            func=search.run,
            description="Useful when you need to answer questions about current events or if you have to search the web. You should ask targeted questions like for google."
        ),
        WriteFileTool(),
        ReadFileTool()
    ]

    # needed for memory with openai functions agent
    agent_kwargs = {
        "extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
    }

    memory = ConversationTokenBufferMemory(
        memory_key="memory", 
        return_messages=True,
        max_token_limit=2000,
        llm=llm
    )

    mrkl = initialize_agent(
        tools=tools,
        llm=llm, 
        agent=AgentType.OPENAI_MULTI_FUNCTIONS, 
        verbose=True, 
        agent_kwargs=agent_kwargs, 
        memory=memory,
    )

    return mrkl

@cl.langchain_run
async def run(agent, input):
    # Since the agent is sync, we need to make it async
    res = await cl.make_async(agent.run)(input)
    await cl.Message(content=res).send()

How can I stop the double output?

willydouhard commented 1 year ago

What happens is that you see the LLM response (first message) and the LangChain agent's final answer (second message). Usually the first one is supposed to be indented, but since you are passing the callback handler only to the LLM, there is no parent step to nest the message under.

Remove the line callbacks=[cl.ChainlitCallbackHandler()] from the LLM definition and change langchain_run like this:

@cl.langchain_run
async def run(agent, input):
    # Since the agent is sync, we need to make it async
    res = await cl.make_async(agent.run)(input, callbacks=[cl.ChainlitCallbackHandler()])
    await cl.Message(content=res).send()
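
With that change, the LLM in the factory is created without any callbacks; everything else stays the same:

llm = ChatOpenAI(
    temperature=0,
    model="gpt-3.5-turbo-0613",
    streaming=True,
)

Passing the handler at call time scopes it to the whole agent run instead of just the LLM, so intermediate steps get nested properly and you no longer see a stray top-level copy of the response.
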
SimonB97 commented 1 year ago

When I do this the response isn't streamed for me anymore, but it did remove the double output.

@willydouhard Sorry, I'm not sure if you get notifications when I just reply normally.

willydouhard commented 1 year ago

No worries. This is the correct behavior. By default, LangChain does not stream the final answer, only the intermediate steps. It is a known issue and we will try to address it in the next releases.
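
In the meantime, here is a rough sketch of one possible workaround, assuming Chainlit's cl.run_sync helper and the cl.Message streaming pattern from the docs (the TokenStreamer class and its wiring are illustrative, not an official API). Note that it forwards every token the LLM emits, intermediate steps included, because LangChain does not mark which tokens belong to the final answer:

import chainlit as cl
from langchain.callbacks.base import BaseCallbackHandler

class TokenStreamer(BaseCallbackHandler):
    """Illustrative handler: pushes each new LLM token into a Chainlit message."""

    def __init__(self, msg: cl.Message):
        self.msg = msg

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # stream_token is a coroutine; cl.run_sync bridges it from this sync callback
        cl.run_sync(self.msg.stream_token(token))

@cl.langchain_run
async def run(agent, input):
    msg = cl.Message(content="")
    await cl.make_async(agent.run)(
        input,
        callbacks=[cl.ChainlitCallbackHandler(), TokenStreamer(msg)],
    )
    # Finalize the streamed message instead of sending a second one
    await msg.send()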

SimonB97 commented 1 year ago

Ah, I see. Thank you!