darrachequesne / notepack

A fast Node.js implementation of the latest MessagePack spec
MIT License

`encode` is a crazy memory hog #27

Open Domiii opened 3 years ago

Domiii commented 3 years ago

Due to an unoptimized algorithm (as also discussed in #12), `encode` is a memory hog (I have not looked at `decode` yet). I decided to post this as a separate issue, since the other issue's title does not capture the problem and its discussion mostly focuses on execution speed, not on memory usage.

In my case, I am sending data with socket.io, and this is my journey:

Possible Solution

I strongly suggest heeding manast's suggestion to use a direct buffer allocation approach. If the buffer size is unknown, just run the algorithm once to compute the buffer size and index positions, then re-run it to actually populate the buffer, rather than using the current approach of creating temporary utility objects. This should come at a much lower memory (and probably CPU) cost than the current version.
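Roughly what I have in mind, as a minimal sketch (the `sizeOf`/`writeInto` names are made up, only small non-negative integers, strings and arrays are handled, and this is not notepack's actual code):

```js
// Pass 1 (sizeOf) computes the exact encoded size; pass 2 (writeInto) fills
// one preallocated buffer, so no temporary "defer" objects are ever created.

function sizeOf(value) {
  if (Number.isInteger(value) && value >= 0 && value <= 0x7f) {
    return 1; // positive fixint
  }
  if (typeof value === 'string') {
    const len = Buffer.byteLength(value, 'utf8');
    return (len < 32 ? 1 : len < 256 ? 2 : len < 65536 ? 3 : 5) + len;
  }
  if (Array.isArray(value)) {
    let size = value.length < 16 ? 1 : value.length < 65536 ? 3 : 5;
    for (const item of value) size += sizeOf(item);
    return size;
  }
  throw new TypeError('type not covered in this sketch');
}

function writeInto(value, buf, offset) {
  if (Number.isInteger(value) && value >= 0 && value <= 0x7f) {
    buf[offset] = value; // positive fixint
    return offset + 1;
  }
  if (typeof value === 'string') {
    const len = Buffer.byteLength(value, 'utf8');
    if (len < 32) { buf[offset++] = 0xa0 | len; }                                             // fixstr
    else if (len < 256) { buf[offset++] = 0xd9; buf[offset++] = len; }                        // str 8
    else if (len < 65536) { buf[offset++] = 0xda; buf.writeUInt16BE(len, offset); offset += 2; } // str 16
    else { buf[offset++] = 0xdb; buf.writeUInt32BE(len, offset); offset += 4; }               // str 32
    buf.write(value, offset, 'utf8');
    return offset + len;
  }
  if (Array.isArray(value)) {
    const n = value.length;
    if (n < 16) { buf[offset++] = 0x90 | n; }                                                 // fixarray
    else if (n < 65536) { buf[offset++] = 0xdc; buf.writeUInt16BE(n, offset); offset += 2; }  // array 16
    else { buf[offset++] = 0xdd; buf.writeUInt32BE(n, offset); offset += 4; }                 // array 32
    for (const item of value) offset = writeInto(item, buf, offset);
    return offset;
  }
  throw new TypeError('type not covered in this sketch');
}

function encode(value) {
  const buf = Buffer.allocUnsafe(sizeOf(value)); // single allocation for the whole payload
  writeInto(value, buf, 0);
  return buf;
}
```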

I know the owner currently does not have time to work on this, but one can dream :)

darrachequesne commented 3 years ago

Thanks for the detailed report :+1:

What are the 100M values? Plain strings? Could you please share some code reproducing the issue?

Did you try with another messagepack implementation like @msgpack/msgpack or what-the-pack? Do you encounter the same behavior?

Domiii commented 3 years ago
  1. The values are mostly objects in arrays, nested a few levels deep (some 5 to 7 layers). The raw values are mostly numbers and some strings. (There are no circular references; I'm rather certain of that.)
  2. I don't think other msgpack implementations have a socket.io parser, do they?
  3. I cannot really reproduce an isolated sample right now (time-wise).

But I can offer a few more insights regarding defers. I just ran a sample:

It does not seem impossible that the defers array is the culprit.

Do you want to try to create your own sample with some dummy arrays containing a ton of strings?
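Something along these lines could be a starting point (a rough sketch, not code from this thread; `notepack.io` is assumed to be the published package name, and the sizes are arbitrary):

```js
const notepack = require('notepack.io'); // assumed package name for this repo

// Build a large nested structure of arrays/objects/strings/numbers,
// then watch how much heap a single encode() call costs.
function buildSample(arrays, itemsPerArray) {
  const data = [];
  for (let i = 0; i < arrays; i++) {
    const inner = [];
    for (let j = 0; j < itemsPerArray; j++) {
      inner.push({ id: j, label: `item-${i}-${j}`, value: Math.random() });
    }
    data.push(inner);
  }
  return data;
}

const sample = buildSample(1000, 1000); // ~1M small objects
global.gc && global.gc(); // run with --expose-gc for a cleaner baseline

const before = process.memoryUsage().heapUsed;
const encoded = notepack.encode(sample);
const after = process.memoryUsage().heapUsed;

console.log('encoded size (MB):', (encoded.length / 1024 / 1024).toFixed(1));
console.log('heap growth during encode (MB):', ((after - before) / 1024 / 1024).toFixed(1));
```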

joshxyzhimself commented 2 years ago

Wouldn't it be better to use a compression algorithm (like gzip) for data of huge sizes (e.g. 10 MB and above)? On the client side (browser), pako could decompress it on the main thread or in a worker thread.
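A rough sketch of the server side of that idea (the `notepack.io` package name is an assumption; this is not code from this project):

```js
const zlib = require('zlib');
const notepack = require('notepack.io'); // assumed package name for this repo

// Encode with MessagePack first, then gzip the resulting buffer before sending it.
function encodeCompressed(value) {
  return zlib.gzipSync(notepack.encode(value));
}

// The browser would then call pako.ungzip(payload) before decoding the MessagePack bytes.
```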

uWebSockets.js is also a great alternative to socket.io and other web frameworks (e.g. express, koa, hapi, ws).

Domiii commented 2 years ago

@joshxyzhimself The issue here is with `encode`, not with compression or the transport layer. `encode` has a memory leak that causes it to gobble up 4+ GB of memory (and then crash) to encode only 200+ MB of data (arrays, objects, strings, numbers).

@darrachequesne To answer your question: Things are working after switching to a custom parser around @msgpack/msgpack for socket.io.
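For reference, a minimal custom parser along those lines might look roughly like this (a simplified sketch following socket.io's Encoder/Decoder parser contract; packet validation and error handling are omitted, and this is not the exact code used):

```js
const msgpack = require('@msgpack/msgpack');
const { EventEmitter } = require('events');

// Encoder: turn a socket.io packet into a single binary chunk.
class Encoder {
  encode(packet) {
    return [msgpack.encode(packet)];
  }
}

// Decoder: emit a "decoded" event for every chunk received.
class Decoder extends EventEmitter {
  add(chunk) {
    this.emit('decoded', msgpack.decode(chunk));
  }
  destroy() {}
}

module.exports = { Encoder, Decoder };

// Usage (server side): new Server(httpServer, { parser: require('./parser') })
```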