dqbd / tiktoken

JS port and JS/WASM bindings for openai/tiktoken
MIT License

Failed to calculate number of tokens with tiktoken, falling back to approximate count #33

Closed (braco closed 1 year ago)

braco commented 1 year ago

I'm running langchainjs with its default summarizer in a loop over different documents. At some point tiktoken starts producing the error below, and restarting the process makes it go away.

Failed to calculate number of tokens with tiktoken, falling back to approximate count RuntimeError: unreachable
    at wasm://wasm/00b5f812:wasm-function[563]:0x6a72a
    at wasm://wasm/00b5f812:wasm-function[665]:0x6fd7a
    at wasm://wasm/00b5f812:wasm-function[756]:0x70f7f
    at wasm://wasm/00b5f812:wasm-function[237]:0x5c43a
    at wasm://wasm/00b5f812:wasm-function[200]:0x4db89
    at wasm://wasm/00b5f812:wasm-function[34]:0x1f78a
    at wasm://wasm/00b5f812:wasm-function[159]:0x48dc3
    at Tiktoken.encode (/project/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:262:18)
    at OpenAIChat.getNumTokens (file:///project/node_modules/langchain/dist/base_language/index.js:80:44)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Failed to calculate number of tokens with tiktoken, falling back to approximate count Error: S@/b
    at module.exports.__wbindgen_error_new (/project/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:410:17)
    at wasm://wasm/00b5f812:wasm-function[59]:0x29389
    at module.exports.encoding_for_model (/project/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:177:14)
    at OpenAIChat.getNumTokens (file:///project/node_modules/langchain/dist/base_language/index.js:70:38)
    at async Promise.all (index 1104)
    at async MapReduceDocumentsChain._call (file:///project/node_modules/langchain/dist/chains/combine_docs_chain.js:154:28)
    at async MapReduceDocumentsChain.call (file:///project/node_modules/langchain/dist/chains/base.js:50:28)
    at async summarizer (file:///project/lib/gpt.mjs:179:20)
❯ node --version                        
v18.12.1
❯ yarn why @dqbd/tiktoken
=> Found "@dqbd/tiktoken@1.0.6"

Also filed in langchainjs, not sure where the issue is: https://github.com/hwchase17/langchainjs/issues/1009

dqbd commented 1 year ago

Hi @braco!

Thank you for the report. It does seem like langchain keeps reinstantiating the encoder without freeing it. I will look into it further when I have some time.
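If repeated instantiation is indeed the cause, one way to avoid it is to create each encoder once and reuse it across calls. The following is only a minimal sketch: `makeEncoderCache` is a hypothetical helper (not part of @dqbd/tiktoken or langchain), and `createEncoder` is assumed to be a function such as `encoding_for_model` that returns an object exposing `encode()` and `free()`.

```javascript
// Hypothetical helper, not part of @dqbd/tiktoken or langchain.
// `createEncoder(model)` is assumed to return an encoder object with
// encode() and free(), e.g. encoding_for_model from @dqbd/tiktoken.
function makeEncoderCache(createEncoder) {
  const cache = new Map();
  return {
    // Return the cached encoder for `model`, creating it on first use only.
    get(model) {
      if (!cache.has(model)) {
        cache.set(model, createEncoder(model));
      }
      return cache.get(model);
    },
    // Release every cached encoder, e.g. once the whole loop has finished.
    freeAll() {
      for (const enc of cache.values()) {
        enc.free();
      }
      cache.clear();
    },
  };
}

// Possible usage (assumes the package is installed):
// const { encoding_for_model } = require("@dqbd/tiktoken");
// const encoders = makeEncoderCache(encoding_for_model);
// const count = encoders.get("gpt-3.5-turbo").encode("hello world").length;
// encoders.freeAll();
```

With this shape, a loop over many documents allocates one encoder per model instead of one per call.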

dqbd commented 1 year ago

On the langchain side, this will be fixed in https://github.com/hwchase17/langchainjs/pull/1239, which replaces the WASM package with the JS port.

The note about calling .free() should still be valid for the WASM package. Closing for now!
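For anyone staying on the WASM package, a rough sketch of the .free() pattern follows. `withEncoder` is a hypothetical wrapper, not something the package provides; it assumes the factory returns an encoder with `encode()` and `free()`, as `encoding_for_model` and `get_encoding` do.

```javascript
// Hypothetical wrapper, not part of @dqbd/tiktoken.
// `createEncoder` is assumed to return an encoder with encode() and free().
function withEncoder(createEncoder, fn) {
  const enc = createEncoder();
  try {
    // Run the caller's work with the live encoder.
    return fn(enc);
  } finally {
    // Always release the encoder's WASM memory, even if fn throws.
    enc.free();
  }
}

// Possible usage (assumes the package is installed):
// const { encoding_for_model } = require("@dqbd/tiktoken");
// const count = withEncoder(
//   () => encoding_for_model("gpt-3.5-turbo"),
//   (enc) => enc.encode("hello world").length
// );
```

The try/finally keeps the release next to the allocation, so an exception during encoding does not leave an encoder unfreed.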