dimagi / open-chat-studio

A web based platform for building Chatbots backed by Large Language Models
BSD 3-Clause "New" or "Revised" License

Expand token based compression to consider the whole input to the LLM #463

Closed SmittieC closed 2 months ago

SmittieC commented 3 months ago

There are a few of these errors:

Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 8202 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

By default, we set the experiment's max token limit (under edit experiment -> safety) to 8192, which in the case of the above error is also the model's maximum context length. Since we only consider the chat history + summary when compressing, we hit a corner case when the history + summary comes close to that limit: the history is just one component of the final input to the LLM, so the other components (system prompt + user input + any other data we include) can push the total context length back over the model's maximum.
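
For illustration, a minimal sketch of how the current check can pass while the request still overflows. The component token counts are made up, chosen only so the total matches the 8202 tokens from the error above:

```python
# Hypothetical token counts for the failing request above
history_and_summary_tokens = 8100   # the only thing the compression step measures today
system_prompt_tokens = 70           # not counted today
user_input_tokens = 32              # not counted today

MODEL_CONTEXT_LIMIT = 8192          # also the experiment's default max token limit

# The compression check passes...
assert history_and_summary_tokens <= MODEL_CONTEXT_LIMIT

# ...but the full request does not fit:
total = history_and_summary_tokens + system_prompt_tokens + user_input_tokens
print(total)  # 8202 > 8192 -> context_length_exceeded
```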

Two solutions:

  1. A non-robust but interim one: we set the experiment's token limit to be << the model's maximum, so the rest of the input components have some room to wiggle.
  2. The more robust one: we consider all inputs to the LLM when checking the token count, while still only ever compressing the history, as we do now (see the sketch below).
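
A rough sketch of option 2, assuming `tiktoken` for counting; the function and variable names (`fit_history`, `fixed_parts`, etc.) are hypothetical, and in practice the pruning step would go through the existing token-based compression rather than plain truncation:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")


def count_tokens(text: str) -> int:
    return len(enc.encode(text))


def fit_history(
    history_messages: list[str],
    fixed_parts: list[str],
    model_context_limit: int,
) -> list[str]:
    """Trim only the history so that history + all other prompt components
    stay within the model's context window.

    `fixed_parts` is everything else that goes into the final input
    (system prompt, user input, any extra data) and is never compressed.
    """
    fixed_tokens = sum(count_tokens(part) for part in fixed_parts)
    history_budget = model_context_limit - fixed_tokens

    # Drop the oldest messages until the history fits its budget.
    # The real implementation would summarise/compress here instead.
    trimmed = list(history_messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > history_budget:
        trimmed.pop(0)
    return trimmed
```

The key difference from the current behaviour is that the budget handed to the compression step is `model_context_limit - fixed_tokens`, not the raw limit, so the final assembled input can never exceed the model's context length.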