dqbd / tiktoken

JS port and JS/WASM bindings for openai/tiktoken
MIT License

[js-tiktoken] memory leak #50

Open louis030195 opened 1 year ago

louis030195 commented 1 year ago

Hey, just dropping this here: I think there is a memory leak. I am running into a lot of memory issues when dealing with large data.

Weirdly, I thought js-tiktoken did not use WASM, yet I get:

RangeError [Error]: WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for new instance

Sometimes I get this instead:

<--- Last few GCs --->

[37648:0x130008000]   106903 ms: Mark-Compact (reduce) 4081.1 (4142.1) -> 4081.1 (4142.1) MB, 2284.42 / 0.00 ms  (average mu = 0.377, current mu = 0.000) last resort; GC in old space requested
[37648:0x130008000]   111844 ms: Mark-Compact (reduce) 4081.1 (4142.1) -> 4081.1 (4142.1) MB, 4940.58 / 0.00 ms  (average mu = 0.182, current mu = 0.000) last resort; GC in old space requested

<--- JS stacktrace --->

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

This happens even when using:

NODE_OPTIONS="--max-old-space-size=8192"

I also tried releasing memory inside the loop, without any improvement.

If the problem keeps bothering me, I'll look at your code to see if I can send a PR with a fix.

dqbd commented 1 year ago

Hello! js-tiktoken should definitely not use WASM. Would it be possible to share a reproducible use case?
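
For reference, a minimal reproduction for this kind of report could look roughly like the sketch below. It is only an illustration, not code from this thread: `encode` is a hypothetical stand-in for the tokenizer call, `measureHeapGrowth` and the synthetic `chunks` are made-up names and data, and the real encoder and input would need to be swapped in. The idea is to tokenize many chunks in a loop and sample `process.memoryUsage().heapUsed` along the way, so that steady heap growth shows up in the numbers.

```javascript
// Hypothetical stand-in for a tokenizer's encode(); replace it with the real
// encoder when reproducing the issue. It returns one fake "token" per word.
function encode(text) {
  return text.split(/\s+/).filter(Boolean).map((word) => word.length);
}

// Tokenize the chunks one at a time and record the heap size every
// `sampleEvery` chunks. Only the token count is kept, never the tokens,
// so any sustained heap growth does not come from this loop's own data.
function measureHeapGrowth(chunks, sampleEvery = 100) {
  const samples = [];
  let totalTokens = 0;
  for (let i = 0; i < chunks.length; i++) {
    totalTokens += encode(chunks[i]).length;
    if ((i + 1) % sampleEvery === 0) {
      samples.push(process.memoryUsage().heapUsed);
    }
  }
  return { totalTokens, samples };
}

// Synthetic input (an assumption): 1000 short chunks of six words each.
const chunks = Array.from(
  { length: 1000 },
  (_, i) => `chunk ${i} of some large text`
);

const { totalTokens, samples } = measureHeapGrowth(chunks);
console.log(`tokens: ${totalTokens}`);
console.log(
  samples.map((bytes) => `${(bytes / 1024 / 1024).toFixed(1)} MB`).join(", ")
);
```

If the sampled heap size keeps climbing with the real encoder while it stays flat with the stand-in, that would point at the tokenizer rather than at the surrounding loop.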

louis030195 commented 1 year ago

This is the code I used: https://github.com/different-ai/embedbase/blob/main/sdk/embedbase-js/src/split/index.ts

SunnyGPT commented 1 year ago

> Hello! js-tiktoken should definitely not use WASM. Would it be possible to share a reproducible use case?

Hi, did you see my email? I wrote to you with a request for help and would be happy to pay you for it. Would you mind checking it out and replying to me?