Chainlit / chainlit

Build Conversational AI in minutes ⚡️
https://docs.chainlit.io

Langchain callback handler misses "Final Answer:" marker. #1286

Open foragerr opened 1 week ago

foragerr commented 1 week ago

**Describe the bug**
I'm using Claude 3.5 Sonnet. The streamed tokens returned by the LLM do not align with word boundaries. This code snippet in Chainlit intermittently fails to detect the phrase "Final Answer:" and consequently never posts the final answer back to the user.

Here are some cases where I noticed this type of failure. For `DEFAULT_ANSWER_PREFIX_TOKENS = ["Final", "Answer", ":"]`:

```
['Final', 'Answer', ':'] != [': Do I nee', 'd to use a tool', '? No\n\nFinal']
['Final', 'Answer', ':'] != ['d to use a tool', '? No\n\nFinal', 'Answer: Hello']
```

For this case:

```
['Final', 'Answer', ':'] != ['need to use a', 'tool? No', 'Final Answer:']
```

The first check still fails, but the second check does catch the Final Answer.
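
For context, here is a minimal sketch of the failing comparison (modeled on the exact-window check in LangChain-style streaming callback handlers; the names are illustrative, not Chainlit's actual code):

```python
# Illustrative reproduction: the last N streamed chunks are compared
# against the prefix token list with exact equality.
DEFAULT_ANSWER_PREFIX_TOKENS = ["Final", "Answer", ":"]

def check_if_answer_reached(last_tokens):
    return last_tokens == DEFAULT_ANSWER_PREFIX_TOKENS

# Claude's chunks don't align with word boundaries, so the rolling
# window never equals the prefix even though the marker streamed by.
chunks = [': Do I nee', 'd to use a tool', '? No\n\nFinal', ' Answer: Hello']
window = []
for chunk in chunks:
    window.append(chunk)
    window = window[-len(DEFAULT_ANSWER_PREFIX_TOKENS):]
    print(check_if_answer_reached(window))  # False on every iteration
```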

**To Reproduce**
The issue is intermittent, and reproduction depends on many factors, from LLM tokenization to how the stream is chunked. While I have an implementation that reliably reproduces the issue, it is hard to isolate a minimal, verifiable test case.

**Expected behavior**
The callback handler detects "Final Answer:" reliably, regardless of how the stream is chunked.

foragerr commented 1 week ago

I've tried to work around this issue by condensing the `List[str]` buffers into minified strings and comparing those:

```python
# Collapse both the expected prefix tokens and the rolling window of
# recent tokens into lowercase strings with whitespace removed.
condensed_prefix_tokens_stripped = "".join(self.answer_prefix_tokens_stripped).lower()
condensed_last_tokens = "".join(last_tokens).replace(" ", "").lower()

# Substring match instead of exact list equality.
if condensed_prefix_tokens_stripped in condensed_last_tokens:
    return True
```

This helps detection in my case, but my final posted answer ends up losing some characters. In the case of `['Final', 'Answer', ':'] != ['d to use a tool', '? No\n\nFinal', 'Answer: Hello']`, since "Hello" is contained in the `last_tokens` object, it goes missing from `final_stream`.
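
One hypothetical way to avoid dropping those characters, building on the condensed-string idea above, is to map the condensed match back into the original buffer and re-emit whatever follows the prefix (`split_after_prefix` is an illustrative helper, not existing Chainlit code):

```python
def split_after_prefix(last_tokens, prefix="finalanswer:"):
    """Return the text that follows the matched prefix, or None if absent."""
    joined = "".join(last_tokens)
    condensed = []   # lowercased, whitespace-free view of the buffer
    index_map = []   # condensed index -> index into `joined`
    for i, ch in enumerate(joined):
        if not ch.isspace():
            condensed.append(ch.lower())
            index_map.append(i)
    pos = "".join(condensed).find(prefix)
    if pos == -1:
        return None
    # The character after the matched ':' is where the real answer begins.
    end = index_map[pos + len(prefix) - 1] + 1
    return joined[end:]

print(split_after_prefix(['d to use a tool', '? No\n\nFinal', 'Answer: Hello']))
# -> ' Hello'
```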

foragerr commented 1 week ago

This issue may be related: https://github.com/langchain-ai/langchain/issues/10316. The folks in that thread are working around this by implementing a barebones custom callback handler.

foragerr commented 1 week ago

I'm looking for feedback on the approach I'm taking; I can take a shot at creating a PR.

dokterbob commented 1 week ago

I think this is a bug we've been hitting as well. We simply ignored the whole 'final answer' detection. ;)

One thing I would urge you to refrain from is condensing the entire `List[str]` answer buffer into a string, because that would happen on every token and hence be way too expensive.

Rather, perhaps what you could do is check whether a token partially matches the search string ("Final Answer:"). If and only if it does, check whether subsequent tokens match the continuation. Finally, once all the tokens have been matched, mark everything after that as the final answer.
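
A minimal sketch of that incremental approach (illustrative only, not a Chainlit patch; `PrefixMatcher` is a hypothetical name):

```python
class PrefixMatcher:
    """Incrementally match a marker across arbitrarily chunked tokens.

    Does O(1) work per character and never re-joins the buffer. The
    restart-on-mismatch below is only safe because 'f' occurs exactly
    once in "finalanswer:"; a general solution would use a KMP-style
    failure function for markers with repeated prefixes.
    """

    def __init__(self, prefix="Final Answer:"):
        # Compare against a lowercase, whitespace-free form of the marker.
        self.prefix = "".join(prefix.lower().split())
        self.matched = 0  # number of marker characters matched so far

    def feed(self, token):
        """Consume one streamed token; return True once the marker completes.

        When this returns True, the remainder of the current token (and
        everything streamed after it) is the final answer.
        """
        for ch in token.lower():
            if ch.isspace():
                continue  # tolerate arbitrary whitespace inside the marker
            if ch == self.prefix[self.matched]:
                self.matched += 1
                if self.matched == len(self.prefix):
                    self.matched = 0
                    return True
            else:
                self.matched = 1 if ch == self.prefix[0] else 0
        return False

matcher = PrefixMatcher()
for chunk in [': Do I nee', 'd to use a tool', '? No\n\nFinal', ' Answer: Hello']:
    if matcher.feed(chunk):
        print("Final answer detected in this chunk")
```

This catches all three failure cases above, because matching advances per character rather than per chunk.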

Looking forward to your renewed and perhaps a bit more contextualised proposal (e.g. where in our code base you would suggest making this change).

Great to see this moving forward!

dokterbob commented 1 week ago

Do look at the other issues regarding final answers; you might be able to hit several birds with the same stone (and prevent new issues from occurring).