getappmap / appmap-js

Client libraries for AppMap

Token overflow recovery doesn't work in Sonnet mode of GitHub Copilot #2119

Open kgilpin opened 4 days ago

kgilpin commented 4 days ago
1344 [Stderr] Failed to calculate number of tokens, falling back to approximate count Error: Unknown model
1344 [Stderr]     at getEncodingNameForModel (/snapshot/appmap-js/node_modules/js-tiktoken/dist/lite.cjs:239:13)
1344 [Stderr]     at encodingForModel (/snapshot/appmap-js/node_modules/@langchain/core/dist/utils/tiktoken.cjs:23:59)
1344 [Stderr]     at ChatOpenAI.getNumTokens (/snapshot/appmap-js/node_modules/@langchain/core/dist/language_models/base.cjs:205:75)
1344 [Stderr]     at /snapshot/appmap-js/node_modules/@langchain/openai/dist/chat_models.cjs:1249:35
1344 [Stderr]     at Array.map (<anonymous>)
1344 [Stderr]     at ChatOpenAI.getNumTokensFromGenerations (/snapshot/appmap-js/node_modules/@langchain/openai/dist/chat_models.cjs:1243:64)
1344 [Stderr]     at ChatOpenAI._generate (/snapshot/appmap-js/node_modules/@langchain/openai/dist/chat_models.cjs:1142:53)
1344 [Stderr]     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
1344 [Stderr]     at async Promise.allSettled (index 0)
1344 [Stderr]     at async ChatOpenAI._generateUncached (/snapshot/appmap-js/node_modules/@langchain/core/dist/language_models/chat_models.cjs:176:29)
1344 [Stderr]     at async LLMChain._call (/snapshot/appmap-js/node_modules/langchain/dist/chains/llm_chain.cjs:162:37)
1344 [Stderr]     at async LLMChain.invoke (/snapshot/appmap-js/node_modules/langchain/dist/chains/base.cjs:58:28)
1344 [Stderr]     at async LLMChain.predict (/snapshot/appmap-js/node_modules/langchain/dist/chains/llm_chain.cjs:188:24)
1344 [Stderr]     at async ConversationSummaryMemory.predictNewSummary (/snapshot/appmap-js/node_modules/langchain/dist/memory/summary.cjs:71:16)
1344 [Stderr]     at async LangchainMemoryService.predictSummary (/snapshot/appmap-js/packages/navie/dist/services/memory-service.js:19:25)
1344 [Stderr]     at async ExplainCommand.execute (/snapshot/appmap-js/packages/navie/dist/commands/explain-command.js:96:29)
1344 [Stderr]     at async Navie.execute (/snapshot/appmap-js/packages/navie/dist/navie.js:165:30)
1344 [Stderr]     at async LocalNavie.ask (/snapshot/appmap-js/packages/cli/built/rpc/explain/navie/navie-local.js)
1344 [Stderr]     at async Explain.explain (/snapshot/appmap-js/packages/cli/built/rpc/explain/explain.js)
1344 [Stdout] [local-navie] Processing question 23d10636-9e6f-44b7-9d08-d38fafd8276c in thread 8def1bad-780a-4333-b5b4-448eec9dbd34
github-actions[bot] commented 4 days ago

Title

Resolve Token Overflow Issue in Sonnet Mode of GitHub Copilot

Problem

When running in the Sonnet mode of GitHub Copilot, token overflow recovery fails: the system cannot calculate the number of tokens and falls back to an approximate count. The underlying failure is an "Unknown model" error raised while determining the correct encoding for the model in use.

Analysis

The root cause is an unrecognized model identifier being passed to getEncodingNameForModel, which raises the "Unknown model" error during token count calculation. Either the current logic does not recognize the model in use, or the model identifier provided is not one the js-tiktoken library can interpret.

For getEncodingNameForModel to succeed, the model identifier, presumably originating from the ChatOpenAI or LLMChain setup, must map to an encoding configuration the library understands. Without that mapping, the token calculations needed to manage the inputs and outputs of Sonnet mode fail.
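The failure mode can be illustrated with a minimal sketch. The map below is a hypothetical stand-in for the model-to-encoding table js-tiktoken maintains internally; the model ids shown are examples, not the exact identifiers Copilot's Sonnet mode sends:

```javascript
// Illustrative stand-in for js-tiktoken's internal model-to-encoding table.
// The entries and model ids here are assumptions for demonstration only.
const MODEL_TO_ENCODING = {
  "gpt-4": "cl100k_base",
  "gpt-3.5-turbo": "cl100k_base",
};

function getEncodingNameForModel(model) {
  const encoding = MODEL_TO_ENCODING[model];
  if (!encoding) {
    throw new Error("Unknown model");
  }
  return encoding;
}

// A Claude model id is absent from the table, so the lookup throws
// instead of returning an encoding name:
try {
  getEncodingNameForModel("claude-3-5-sonnet");
} catch (err) {
  console.log(err.message); // "Unknown model"
}
```

Any non-OpenAI identifier hits the same branch, which matches the stack trace above: the error originates in getEncodingNameForModel and propagates up through encodingForModel and ChatOpenAI.getNumTokens.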

Proposed Changes

  1. js-tiktoken Library Integration:

    • Location: /snapshot/appmap-js/node_modules/js-tiktoken/dist/lite.cjs:239:13
    • Ensure getEncodingNameForModel has a comprehensive mapping of supported model identifiers to encoding names. Verify the identifier that triggers the "Unknown model" error against the expected model listings, and add it to the mapping if it is missing.
  2. Langchain Library Configuration:

    • Location: /snapshot/appmap-js/node_modules/@langchain/core/dist/utils/tiktoken.cjs:23:59
    • Revise the encodingForModel function to correctly obtain and pass the model-specific encoding needed for token calculation. This may involve fetching or parsing model details differently to align with the encoding names recognized by js-tiktoken.
  3. Adjustment in ChatOpenAI Module:

    • Location: /snapshot/appmap-js/node_modules/@langchain/core/dist/language_models/base.cjs:205:75 and /snapshot/appmap-js/node_modules/@langchain/openai/dist/chat_models.cjs:1243:64
    • Review the model identifiers used in ChatOpenAI.getNumTokensFromGenerations and any associated routines, and ensure they are valid and handled correctly by the js-tiktoken library.
  4. LLMChain Component Debugging:

    • Location: /snapshot/appmap-js/node_modules/langchain/dist/chains/llm_chain.cjs:162:37
    • Conduct a detailed check on how the chain setups dictate model identifiers, ensuring all chain invocations embed valid model-to-encoding references.
  5. Error Handling Improvements:

    • Add mechanisms to log better diagnostic messages upon encountering model-related errors, facilitating easier troubleshooting in future instances.
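The mapping work in steps 1 and 2 could look roughly like the resolver below. This is a sketch, not the library's actual behavior: the table contents are an illustrative subset, and the choice of cl100k_base as a default for unrecognized models is an assumption.

```javascript
// Hypothetical resolver that degrades gracefully instead of throwing.
// KNOWN_ENCODINGS is an illustrative subset; a real fix would extend the
// table shipped with js-tiktoken or patch the caller in @langchain/core.
const KNOWN_ENCODINGS = {
  "gpt-4": "cl100k_base",
  "gpt-3.5-turbo": "cl100k_base",
};

const DEFAULT_ENCODING = "cl100k_base"; // assumed safe default

function resolveEncodingName(model) {
  if (Object.prototype.hasOwnProperty.call(KNOWN_ENCODINGS, model)) {
    return KNOWN_ENCODINGS[model];
  }
  // Unknown model (e.g. a Claude Sonnet id): fall back to a default
  // encoding rather than surfacing "Unknown model" to every caller.
  return DEFAULT_ENCODING;
}

console.log(resolveEncodingName("gpt-4"));             // "cl100k_base"
console.log(resolveEncodingName("claude-3-5-sonnet")); // "cl100k_base"
```

Whether a default encoding yields acceptable counts for Claude models is a separate question; the point is that a lookup miss should not abort token counting.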

The above steps should make token management in Copilot's Sonnet mode more robust by aligning the models in use with the encoding libraries that process them, so token counts are computed reliably rather than falling back on approximations.
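Steps 3 and 5 can be combined in a getNumTokens-style helper: count with the tokenizer when the model is recognized, and otherwise log which model id failed (rather than the bare "Unknown model") before falling back to a rough character-based estimate. Everything below is a sketch; tokenizerFor is a toy stand-in for the real encoder lookup, and the chars/4 heuristic mirrors the approximate count mentioned in the log.

```javascript
// Toy encoder lookup standing in for the real encodingForModel call from
// @langchain/core; here it only knows one model and splits on whitespace.
function tokenizerFor(model) {
  if (model !== "gpt-4") throw new Error("Unknown model");
  return { encode: (text) => text.split(/\s+/) };
}

function countTokens(text, model) {
  try {
    return tokenizerFor(model).encode(text).length;
  } catch (err) {
    // Step 5: name the offending model in the diagnostic so the log
    // pinpoints the mapping gap instead of just "Unknown model".
    console.warn(
      `Failed to calculate number of tokens for model "${model}" ` +
        `(${err.message}); falling back to approximate count`
    );
    // Rough heuristic: roughly four characters per token.
    return Math.ceil(text.length / 4);
  }
}

console.log(countTokens("hello world", "gpt-4"));             // 2
console.log(countTokens("hello world", "claude-3-5-sonnet")); // 3
```

With a diagnostic like this in place, the stderr output above would have named the Sonnet model id directly, making the missing mapping obvious from the first log line.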