Open kgilpin opened 4 hours ago
Handle LLM Token Overflow by Truncating User Message and Retrying
When interacting with the LLM, the user message can exceed the model's token limit. This causes an overflow, leading to system errors or failed invocations. The objective is to manage this token overflow by truncating the user message and retrying the invocation, ensuring stability and a smooth user experience.
The primary challenge is identifying which message caused the overflow. Once identified, the message should be truncated to a safe length and the invocation retried. Given the retry logic already implemented in various parts of the system, notably in the retry.ts utilities, the new truncate-and-retry functionality can follow a similar model. The solution involves changes to the following files:
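Before retrying, the overflow has to be recognized. As a sketch of how such detection might look (the error-message patterns below are illustrative assumptions, not the actual error shapes handled by the retry.ts utilities):

```typescript
// Hypothetical detector for LLM token-overflow errors. Providers report
// overflow differently; these patterns are illustrative assumptions only.
const OVERFLOW_PATTERNS = [
  /maximum context length/i,
  /context_length_exceeded/i,
  /token limit/i,
];

export function isTokenOverflowError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return OVERFLOW_PATTERNS.some((pattern) => pattern.test(message));
}
```

A predicate like this lets the existing retry handlers decide whether to retry as-is (e.g. on a transient 503) or to truncate first.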
- packages/navie/src/llmInteraction.ts: the site of LLM invocations; truncate the user message here when an overflow occurs and retry the invocation.
- packages/client/src/retryOnError.ts: extend the retry handler (retryOnError) to detect specific LLM token overflow errors and execute the truncation logic followed by retry.
- packages/client/src/retryOn503.ts (if necessary): apply equivalent handling if overflow errors surface through this path.
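The truncate-and-retry flow described above could be sketched as follows; invokeWithTruncation, TRUNCATION_RATIO, and MAX_RETRIES are hypothetical names for illustration, not the existing retry.ts API:

```typescript
// Sketch: retry an LLM invocation, truncating the user message whenever a
// token-overflow error is detected. All names here are illustrative.
type Invoke = (userMessage: string) => Promise<string>;

const TRUNCATION_RATIO = 0.75; // keep 75% of the message after each overflow
const MAX_RETRIES = 3;

export async function invokeWithTruncation(
  invoke: Invoke,
  userMessage: string,
  isOverflow: (error: unknown) => boolean
): Promise<string> {
  let message = userMessage;
  for (let attempt = 0; ; attempt++) {
    try {
      return await invoke(message);
    } catch (error) {
      // Only retry overflow errors, and only up to MAX_RETRIES attempts.
      if (attempt >= MAX_RETRIES || !isOverflow(error)) throw error;
      // Truncate from the end, preserving the start of the message.
      message = message.slice(0, Math.floor(message.length * TRUNCATION_RATIO));
    }
  }
}
```

Shrinking geometrically rather than guessing an exact safe size keeps the helper independent of any particular model's limit, at the cost of a few extra invocations in the worst case.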
By integrating the truncation logic within the existing retry handlers and ensuring token overflow errors are gracefully managed, the system's resilience to large inputs will be significantly improved.
When we overflow the LLM token limit, truncate the user message and retry.
For a specific example, see here:
https://github.com/getappmap/navie-benchmark/issues/38
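One way to truncate "to a safe limit" is to estimate token count from character count. The sketch below assumes roughly four characters per token, which is a rough heuristic; a real implementation would use the model's tokenizer:

```typescript
// Sketch: truncate a message to fit an estimated token budget.
// CHARS_PER_TOKEN = 4 is a rough heuristic assumption, not a tokenizer.
const CHARS_PER_TOKEN = 4;

export function truncateToTokenLimit(message: string, maxTokens: number): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (message.length <= maxChars) return message;
  return message.slice(0, maxChars);
}
```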
--
In the navie package: