Domiii opened this issue 3 years ago
Thanks for the detailed report :+1:
What are the 100M values? Plain strings? Could you please share some code reproducing the issue? Did you try with another MessagePack implementation like `@msgpack/msgpack` or `what-the-pack`? Do you encounter the same behavior?
Those msgpack implementations don't have a `socket.io` parser, do they? But I can offer a few more insights regarding `defers`. I just ran a sample:
- 23.8M values
- 1.6G between before the recursive `_encode` call and after (from 1.2 -> 2.8)
- `bytes.length` ~ 53M
- `defers.length` ~ 11.5M
- buf size ~ 141M
It does not seem impossible that the `defers` array is the culprit.
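For readers unfamiliar with the pattern: here is a heavily simplified, hypothetical sketch of a defers-style encoder (illustrative only, not the library's actual code; it handles only arrays and short strings). Every string pushes a small wrapper object into `defers`, and all of those objects, plus their per-string Buffers, stay alive until the final pass:

```javascript
// Hypothetical simplification of a defers-style msgpack encoder.
function encode(value) {
  const bytes = [];   // header bytes, stored as plain JS numbers
  const defers = [];  // one wrapper object per string payload
  let size = 0;

  function walk(v) {
    if (typeof v === 'string') {
      const bin = Buffer.from(v, 'utf8');
      bytes.push(0xd9, bin.length);       // str8 header (assumes length < 256)
      size += 2;
      defers.push({ bin, offset: size }); // payload position in the final buffer
      size += bin.length;
    } else if (Array.isArray(v)) {
      bytes.push(0xdc, v.length);         // simplified array header
      size += 2;
      for (const item of v) walk(item);
    }
  }
  walk(value);

  // Final pass: one allocation, then interleave header bytes and payloads.
  const buf = Buffer.allocUnsafe(size);
  let bi = 0;  // read index into bytes[]
  let pos = 0; // write position in buf
  for (const d of defers) {
    while (pos < d.offset) buf[pos++] = bytes[bi++];
    d.bin.copy(buf, pos);
    pos += d.bin.length;
  }
  while (bi < bytes.length) buf[pos++] = bytes[bi++];

  return { buf, byteCount: bytes.length, deferCount: defers.length };
}
```

With millions of strings, `defers` holds millions of `{ bin, offset }` wrapper objects that cannot be collected before the final pass, which would be consistent with the `defers.length ~ 11.5M` observation.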
Do you want to try to create your own sample with some dummy arrays containing a ton of strings?
Isn't it better to use a compression algorithm (like gzip) to encode data of huge sizes (e.g. 10 MB and above)? On the client side (browser), maybe pako can decode it on the main thread or on a worker thread.
uWebSockets.js is a great alternative to socket.io and other web frameworks too (e.g. express, koa, hapi, ws).
@joshxyzhimself The issue here is with `encode`, not with compression or the transport layer. `encode` has a memory leak causing it to gobble up 4+ GB of memory (and then crash) to encode only 200+ MB of data (arrays, objects, strings, numbers).
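The memory blow-up described here can be observed with `process.memoryUsage()`; a minimal measurement harness (the dummy data and the JSON-based stand-in encoder are placeholders for the real parser under test):

```javascript
// Minimal harness to watch heap growth caused by a single encode call.
// `encodeFn` is a placeholder for the encoder under test.
function measureEncode(encodeFn, input) {
  if (global.gc) global.gc(); // steadier numbers when run with --expose-gc
  const before = process.memoryUsage();
  const encoded = encodeFn(input);
  const after = process.memoryUsage();
  const mb = (n) => Math.round(n / 1024 / 1024);
  return {
    encodedBytes: encoded.length,
    rssDeltaMB: mb(after.rss - before.rss),
    heapUsedDeltaMB: mb(after.heapUsed - before.heapUsed),
  };
}

// Dummy data: an array with many short strings, roughly the shape reported here.
const input = Array.from({ length: 100000 }, (_, i) => `value-${i}`);
const stats = measureEncode((v) => Buffer.from(JSON.stringify(v)), input);
console.log(stats);
```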
@darrachequesne To answer your question: things are working after switching to a custom parser around `@msgpack/msgpack` for `socket.io`.

Due to its unoptimized algorithm (as also discussed in #12), `encode` is a memory hog (I have not looked at `decode` yet). I decided to post this as a separate issue, since the other issue's title does not capture the problem, and the discussion there mostly focuses on execution speed, not on memory issues.

In my case, I am sending data with `socket.io`, and this is my journey:

- When calling `encode` on my data, it ran out of memory. I had to increase node's RAM limit to `--max-old-space-size=8192`.
- The encoded buffer size is 298,406,623 bytes, yet the `_encode` call itself required 4GB of additional memory (even though, as mentioned above, `buffersize` is less than 300MB total).
- This points to the `encode` algorithm itself.
- Memory was measured with `process.memoryUsage()`. All three (`rss`, `heapTotal`, and `heapUsed`) show the same trend.

Possible Solution
I strongly suggest heeding manast's suggestion to use a direct buffer allocation approach. If the buffer size is unknown, run the algorithm once to compute the buffer size and index positions, then re-run it to actually populate the buffer, rather than creating temporary utility objects as the current approach does. This should come at a much lower memory (and probably CPU) cost than the current version.
I know the owner currently does not have time to work on this, but one can dream :)
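The two-pass approach suggested above could look roughly like this (a sketch under simplifying assumptions, not the library's code: only arrays and short strings, str8 headers, and a simplified 2-byte array header). Pass 1 computes the exact byte size without allocating per-value objects; pass 2 allocates once and writes at a running offset:

```javascript
// Pass 1: compute the exact encoded size; no per-value allocations.
function sizeOf(v) {
  if (typeof v === 'string') {
    return 2 + Buffer.byteLength(v, 'utf8');        // str8 header + payload
  }
  if (Array.isArray(v)) {
    return 2 + v.reduce((n, x) => n + sizeOf(x), 0); // simplified array header
  }
  throw new TypeError('unsupported type in this sketch');
}

// Pass 2: allocate once, then write directly at a running offset.
function encodeTwoPass(value) {
  const buf = Buffer.allocUnsafe(sizeOf(value));
  let pos = 0;
  (function write(v) {
    if (typeof v === 'string') {
      buf[pos++] = 0xd9;                         // str8 (assumes length < 256)
      buf[pos++] = Buffer.byteLength(v, 'utf8');
      pos += buf.write(v, pos, 'utf8');
    } else {
      buf[pos++] = 0xdc;                         // simplified array header
      buf[pos++] = v.length;
      v.forEach(write);
    }
  })(value);
  return buf;
}
```

The second walk repeats the traversal, so it trades some CPU for never materializing a `bytes` array or a `defers` array at all.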