ajndkr / lanarky

The web framework for building LLM microservices
https://lanarky.ajndkr.com/
MIT License
976 stars 74 forks source link

bug: TokenStreamingCallbackHandler accepts only one key #188

Closed amitjoy closed 7 months ago

amitjoy commented 8 months ago

Scenario

Currently, I have configured a RAG chain to use GCP Matching Engine (Vector DB) and Vertex AI Chat Bison model. The chain generates several events having different keys always.

Actual result

Warning and exception as the configured output key is not available in all events

Expected result

We should be able to configure a list of keys which can be checked for availability in the events. If none of the keys exists, we can raise KeyError.

Acceptance criteria

No response

Contribute

amitjoy commented 8 months ago

In my scenario, I can see events with following keys in order:

  1. text (containing the source question rephrased properly by the chain)
  2. text (containing the answer)
  3. output_text (containing the answer once again) - NO CLUE why
  4. answer (containing he answer again since it has been configured as the output_key in ConversationSummaryBufferMemory)

if I keep the output_key configured in ConversationSummaryBufferMemory, Lanarky generated the following warning twice due to no. 3 and no. 4 (from the above list) since I have set the output_key to text in TokenStreamingCallbackHandler:

Error in TokenStreamingCallbackHandler.on_chain_end callback: KeyError('missing outputs key: text')

And if I remove the output_key from ConversationSummaryBufferMemory, it results in the following ERROR:

lanarky.adapters.langchain.responses:stream_response:89 - chain runtime error: One output key expected, got dict_keys(['answer', 'source_documents', 'generated_question'])

I think, it would be better off to have output_key configured in ConversationSummaryBufferMemory but remove throwing KeyError from lanarky.adapters.langchain.callbacks.TokenStreamingCallbackHandler.on_chain_end

ajndkr commented 7 months ago

some context, the output_key is used to stream full chain answer when streaming is enabled but the llm response is cached: https://github.com/ajndkr/lanarky/blob/64215d95f474e05484888be616b5c071fa31f298/lanarky/adapters/langchain/callbacks.py#L149-L164

As I'd like to keep this base functionality for other use cases, I recommend creating a new callback handler which inherits from TokenStreamingCallbackHandler. Here you can override on_chain_end as you see fit.

amitjoy commented 7 months ago

@ajndkr Thanks for your quick response. I have actually done the exact same thing to override the functionality.

ajndkr commented 7 months ago

cool! if the override approach works, shall we close this issue?

amitjoy commented 7 months ago

Yeah, makes sense to close it then. However, I have a different issue after introducing the custom callback handler which I have already commented in https://github.com/ajndkr/lanarky/issues/186#issuecomment-2007073027