Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 8202 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
By default, we set the experiment's max token limit (under edit experiment -> safety) to 8192, which in the error above is also the model's maximum context length. When compressing, we only count the chat history + summary, so we hit a corner case whenever the history + summary comes close to the model's token limit: they are only one component of the final input to the LLM. The other components (system prompt + user input + any other data we attach) can then push the total context length back over the model's maximum.
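A quick arithmetic sketch of the corner case, with hypothetical numbers chosen to reproduce the 8202-token total from the error above (the real split between components will vary per request):

```python
MODEL_MAX = 8192            # model's maximum context length == our experiment limit

# The compressor only counts these, so it sees no reason to compress:
history_plus_summary = 8150     # hypothetical; just under MODEL_MAX

# ...but these are never counted:
system_and_user = 52            # hypothetical; system prompt + user input + other data

total = history_plus_summary + system_and_user
print(total)                # 8202 -> exceeds MODEL_MAX, triggering the 400 error
```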
Two solutions
A non-robust but interim one: set the experiment's token limit well below the model's maximum, so the rest of the input components have some room to wiggle.
The more robust one: count tokens across all inputs to the LLM when checking against the limit, while still only ever compressing the history (as we currently do).
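A minimal sketch of the robust option. Everything here is hypothetical: `count_tokens` is a whitespace placeholder (the real code would use the model's tokenizer, e.g. tiktoken), and "compressing" is simplified to dropping the oldest turns rather than re-summarizing. The point is only that the budget check covers every component of the prompt, while the shrinking still applies to the history alone:

```python
def count_tokens(text: str) -> int:
    """Placeholder tokenizer; swap in the model's real tokenizer."""
    return len(text.split())

def fit_to_budget(system: str, user: str, summary: str,
                  history: list[str], model_max: int) -> list[str]:
    """Return a history that fits once ALL prompt components are counted.

    Hypothetical helper: the fixed components (system + user + summary)
    are counted but never touched; only the history is shrunk.
    """
    fixed = count_tokens(system) + count_tokens(user) + count_tokens(summary)
    kept = list(history)
    while kept and fixed + sum(count_tokens(t) for t in kept) > model_max:
        kept.pop(0)  # drop the oldest turn; real code would re-summarize instead
    return kept

hist = ["turn one text", "turn two text", "turn three text"]
kept = fit_to_budget("sys prompt", "user msg", "summary", hist, model_max=10)
# only the most recent turn survives under this tiny budget
```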
There are a few of these errors.