This PR adds functionality to allow users to set a lower context window than their LLM's maximum context window size. This is useful when the model's performance degrades significantly with larger context windows, allowing users to optimize the trade-off between context length and model performance.
## Background

With PR #4977 being merged in version 0.14, we now support dynamic context window sizes. This PR builds on that by allowing users to manually set a lower context window size than their LLM's maximum, which can be beneficial in cases where:

- The model shows performance degradation with very large contexts
- Users want to optimize for speed over context length
- Memory or computational resources need to be conserved
## Changes

- Add token count checking before adding new events to prevent exceeding the window (see the sketch after this list)
- Implement truncation logic when the token count would exceed the configured limit
- Improve handling of first user message preservation to maintain conversation coherence
- Add a comprehensive test case for context window parameter truncation
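A minimal sketch of the check-before-add flow. Everything here except `_apply_conversation_window` (described under Implementation Details) is a hypothetical stand-in; the real code is wired into the agent's event handling rather than a free function like this:

```python
# Hypothetical sketch only: token_counter and max_input_tokens stand in for
# the real LLM helpers and config values used by the agent.

def add_event_with_window_check(history, event, token_counter, max_input_tokens):
    """Append an event, truncating the history first if the configured
    max_input_tokens limit would be exceeded."""
    projected_tokens = token_counter(history + [event])
    if projected_tokens > max_input_tokens:
        # Truncation keeps the first user message and keeps action-observation
        # pairs together (see the _apply_conversation_window sketch below).
        history = _apply_conversation_window(history)
    history.append(event)
    return history
```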
## Configuration

Users can set `max_input_tokens` in two ways:

Through `config.toml`:

```toml
[llm]
max_input_tokens = 20000
```

Through environment variables:

```bash
export LLM_MAX_INPUT_TOKENS=20000
```
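For intuition, the configured value acts as a cap on the model's native window. A hypothetical sketch (the function name and signature are illustrative, not the project's actual config API):

```python
# Illustrative only: shows how a configured max_input_tokens would cap the
# model's native context window when it is set and smaller.

def effective_context_window(model_max_tokens, max_input_tokens=None):
    """Return the window actually used for building the prompt."""
    if max_input_tokens is None:
        return model_max_tokens
    return min(model_max_tokens, max_input_tokens)

# e.g. a model with a 128k window capped to the configured 20k:
assert effective_context_window(128_000, 20_000) == 20_000
```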
## Implementation Details

- Token count is checked before adding each new event
- If adding an event would exceed the context window:
  - The history is truncated using `_apply_conversation_window` (sketched below)
  - Action-observation pairs are kept together
  - The first user message is always preserved
  - The truncation is done in a way that maintains conversation coherence
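A rough, illustrative sketch of what the truncation step does. The real `_apply_conversation_window` lives in the agent controller and operates on the project's event types; the attribute check and the split point below are assumptions made for readability:

```python
# Illustrative sketch only; not the project's actual implementation.

def _apply_conversation_window(events):
    """Drop older events so the history fits the configured window,
    keeping the first user message and never separating an action
    from its observation."""
    if len(events) <= 2:
        return events

    first_user_message, rest = events[0], events[1:]  # first user message is always kept

    # Keep roughly the newer half of the remaining events.
    cut = len(rest) // 2

    # If the first kept event is an observation (here assumed to carry a
    # `cause` attribute), pull in the action that produced it so the
    # action-observation pair stays together.
    if cut > 0 and getattr(rest[cut], "cause", None) is not None:
        cut -= 1

    return [first_user_message] + rest[cut:]
```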
## Testing

Added a new test case, `test_context_window_parameter_truncation`, that verifies:

- Token count checking works correctly
- Truncation occurs when the limit would be exceeded
- The first user message is preserved
- Action-observation pairs stay together
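A condensed, self-contained sketch of that kind of check, using stand-in events rather than the project's real fixtures (the actual test is more thorough):

```python
# Illustrative pytest-style sketch; the real test builds proper action and
# observation objects instead of these SimpleNamespace stand-ins.
from types import SimpleNamespace

def test_context_window_parameter_truncation_sketch():
    first_user_message = SimpleNamespace(kind="user", content="initial task", cause=None)
    history = [first_user_message] + [
        SimpleNamespace(kind="action", content=f"step {i}", cause=None)
        for i in range(20)
    ]

    truncated = _apply_conversation_window(history)  # sketch from above

    assert truncated[0] is first_user_message  # first user message preserved
    assert len(truncated) < len(history)       # history actually shrank
```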
This implements and enhances the changes from PR #5079.