Open kgilpin opened 4 hours ago
Handle LLM Token Overflow by Truncating User Message and Retrying
When interacting with the LLM, the user message can exceed the model's token limit. This causes an overflow, leading to system errors or failed invocations. The objective is to manage this token overflow by truncating the user message and retrying the invocation, ensuring stability and a smooth user experience.
The primary challenge is identifying which message caused the overflow. Once identified, the message should be truncated to a safe length and the invocation retried. Given the retry logic already implemented in various parts of the system, notably in the retry.ts utilities, the new truncate-and-retry functionality can follow a similar model. The solution involves changes to the following files:
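Before retrying, the overflow has to be recognized. As a sketch of how such detection might look (the error-message patterns below are illustrative assumptions, not the actual error shapes handled by the retry.ts utilities):

```typescript
// Hypothetical detector for LLM token-overflow errors. Providers report
// overflow differently; these patterns are illustrative assumptions only.
const OVERFLOW_PATTERNS = [
  /maximum context length/i,
  /context_length_exceeded/i,
  /token limit/i,
];

export function isTokenOverflowError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return OVERFLOW_PATTERNS.some((pattern) => pattern.test(message));
}
```

A predicate like this lets the existing retry handlers decide whether to retry as-is (e.g. on a transient 503) or to truncate first.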
- packages/navie/src/llmInteraction.ts: the site of LLM invocations; truncate the user message here when an overflow occurs and retry the invocation.
- packages/client/src/retryOnError.ts: extend the retry handler (retryOnError) to detect specific LLM token overflow errors and execute the truncation logic followed by retry.
- packages/client/src/retryOn503.ts (if necessary): apply equivalent handling if overflow errors surface through this path.
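The truncate-and-retry flow described above could be sketched as follows; invokeWithTruncation, TRUNCATION_RATIO, and MAX_RETRIES are hypothetical names for illustration, not the existing retry.ts API:

```typescript
// Sketch: retry an LLM invocation, truncating the user message whenever a
// token-overflow error is detected. All names here are illustrative.
type Invoke = (userMessage: string) => Promise<string>;

const TRUNCATION_RATIO = 0.75; // keep 75% of the message after each overflow
const MAX_RETRIES = 3;

export async function invokeWithTruncation(
  invoke: Invoke,
  userMessage: string,
  isOverflow: (error: unknown) => boolean
): Promise<string> {
  let message = userMessage;
  for (let attempt = 0; ; attempt++) {
    try {
      return await invoke(message);
    } catch (error) {
      // Only retry overflow errors, and only up to MAX_RETRIES attempts.
      if (attempt >= MAX_RETRIES || !isOverflow(error)) throw error;
      // Truncate from the end, preserving the start of the message.
      message = message.slice(0, Math.floor(message.length * TRUNCATION_RATIO));
    }
  }
}
```

Shrinking geometrically rather than guessing an exact safe size keeps the helper independent of any particular model's limit, at the cost of a few extra invocations in the worst case.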
By integrating the truncation logic within the existing retry handlers and ensuring token overflow errors are gracefully managed, the system's resilience to large inputs will be significantly improved.
When we overflow the LLM token limit, truncate the user message and retry.
For a specific example, see here:
https://github.com/getappmap/navie-benchmark/issues/38
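One way to truncate "to a safe limit" is to estimate token count from character count. The sketch below assumes roughly four characters per token, which is a rough heuristic; a real implementation would use the model's tokenizer:

```typescript
// Sketch: truncate a message to fit an estimated token budget.
// CHARS_PER_TOKEN = 4 is a rough heuristic assumption, not a tokenizer.
const CHARS_PER_TOKEN = 4;

export function truncateToTokenLimit(message: string, maxTokens: number): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (message.length <= maxChars) return message;
  return message.slice(0, maxChars);
}
```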
--
In the navie package: