denoland / std

The Deno Standard Library
https://jsr.io/@std
MIT License

msgpack.encode fails with `RangeError: Maximum call stack size exceeded` on big objects #3696

Closed pl-42 closed 11 months ago

pl-42 commented 12 months ago

Describe the bug
My app may receive huge JSONs (thousands of items with very complex structure) and must re-encode them into msgpack. JSON.parse works perfectly fine, but msgpack.encode fails with Uncaught RangeError: Maximum call stack size exceeded.

Steps to Reproduce
Here is a simple snippet to reproduce it:

import * as msgpack from "https://deno.land/std@0.204.0/msgpack/mod.ts";

const bigObject = {};

for (let i = 0; i < 50000; i++) {
  Object.defineProperty(bigObject, `prop_${i}`, {
    value: i,
    enumerable: true,
  });
}

const m = msgpack.encode(bigObject);

Expected behavior
Should not fail and should return the encoded Uint8Array.

Environment

pl-42 commented 12 months ago

Seems like the error is here: https://github.com/denoland/deno_std/blob/65125db61f8b59cf62475ca75749a5664e13b53e/msgpack/encode.ts#L53C3-L53C3

The code fails while reconstructing byteParts.

When I change it to

export function encode(object: ValueType) {
  const byteParts: Uint8Array[] = [];
  encodeSlice(object, byteParts);
  return concat(byteParts);
}

everything works as expected, even on an object with 5,000,000 properties. It is slow and consumes a lot of memory, but it doesn't fail with an unexpected exception.

lino-levan commented 12 months ago

Nice catch, not something that I would expect. I had no idea that the spread operator could cause a stack overflow. Are you willing to submit a PR?
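The stack overflow comes from call-site argument limits: a minimal sketch (not from this thread; `takesMany` and `takesArray` are invented names) showing that spreading a large array into a call blows up, while passing the array itself does not:

```typescript
// Each spread element becomes a separate call argument, and V8 caps how
// many arguments a single call can carry, so spreading a large array
// throws RangeError: Maximum call stack size exceeded.
function takesMany(...parts: number[]): number {
  return parts.length;
}

function takesArray(parts: number[]): number {
  return parts.length;
}

const big = new Array<number>(500_000).fill(0);

try {
  takesMany(...big); // throws on V8 for arrays this large
} catch (err) {
  console.log(err instanceof RangeError); // engine-dependent; true on V8
}

// Passing the array as a single argument sidesteps the limit entirely.
console.log(takesArray(big)); // 500000
```

This is exactly the failure mode of `concat(...byteParts)` once `byteParts` grows into the tens of thousands of chunks.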

pl-42 commented 12 months ago

@lino-levan here is the PR #3697

pl-42 commented 12 months ago

@lino-levan I have to revert the PR and will re-submit a new one.

pl-42 commented 12 months ago

#3698

iuioiua commented 11 months ago

I don't think the solution is to make concat() accept Uint8Array[][]. Instead, we should probably just remove the spread operator. This would be a breaking fix. If there's a performance cost, we can also further refactor the logic of concat() to be something like this:

export function concat(buf: Uint8Array[]): Uint8Array {
  let length = 0;
  for (const b of buf) {
    length += b.length;
  }

  const output = new Uint8Array(length);
  let index = 0;
  for (const b of buf) {
-   output.set(b, index);
-   index += b.length;
+   for (let i = 0; i < b.length; i++) {
+     output[index++] = b[i];
+   }
  }

  return output;
}

nhrones commented 11 months ago

@lino-levan In my Kv-key-codec I use a single expandable Uint8Array accumulator buffer. All encoded values are immediately inserted into the accumulator (no concat required).

It may have some value for the msgPack large buffer issue.

https://gist.github.com/nhrones/a872b581a4685c2b6374b080d318123d
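A minimal sketch of that expandable-accumulator idea, assuming a doubling growth strategy (the class and method names here are invented for illustration, not the gist's actual API):

```typescript
// Hypothetical expandable Uint8Array accumulator: one backing buffer that
// doubles when a write would exceed capacity, so no final concat is needed.
class Accumulator {
  private buf = new Uint8Array(1024);
  private len = 0;

  /** Append bytes, growing the backing buffer as needed. */
  push(bytes: Uint8Array): void {
    while (this.len + bytes.length > this.buf.length) {
      const bigger = new Uint8Array(this.buf.length * 2);
      bigger.set(this.buf); // copy existing contents into the larger buffer
      this.buf = bigger;
    }
    this.buf.set(bytes, this.len);
    this.len += bytes.length;
  }

  /** Return only the written portion. */
  bytes(): Uint8Array {
    return this.buf.subarray(0, this.len);
  }
}

const acc = new Accumulator();
acc.push(new Uint8Array([1, 2, 3]));
acc.push(new Uint8Array([4, 5]));
console.log(acc.bytes()); // contains [1, 2, 3, 4, 5]
```

Because every encoded value lands directly in the buffer, the chunk count never matters and no spread or concat step is involved.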

nhrones commented 11 months ago

@iuioiua https://github.com/denoland/deno_std/issues/3696#issuecomment-1776424324

TypedArrays are heavily optimized in V8! The underlying ArrayBuffer represents a contiguous block of binary data. Because of this, the set operation is extremely fast. Please see: https://forum.babylonjs.com/t/optimizing-performance-of-binarywriter-resizebuffer/37516
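For context, `TypedArray.prototype.set` copies an entire source block into the target at a given offset in one call, which is what the original `concat()` loop relies on. A tiny sketch:

```typescript
// set() performs a bulk copy of the source into the target at an offset,
// rather than assigning elements one at a time.
const target = new Uint8Array(6);
target.set(new Uint8Array([1, 2, 3]), 0);
target.set(new Uint8Array([4, 5, 6]), 3);
console.log(target); // contains [1, 2, 3, 4, 5, 6]
```

This is why keeping `output.set(b, index)` is preferable to a hand-written per-byte loop: the engine can do the copy as a contiguous memory move.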

nhrones commented 11 months ago

/std/bytes/concat.ts

I don't understand the use of the spread operator here. You're spreading a param that is already an array into an array! What was the purpose of that? What would break without it?

export function concat(...buf: Uint8Array[]): Uint8Array {
  let length = 0;
  for (const b of buf) {
    //...
  }

// is the same as 

export function concat(buf: Uint8Array[]): Uint8Array {
  let length = 0;
  for (const b of buf) {
    //...
  }

Why would there be a concern about changing that in std/bytes/? I can't see how that could break any existing use cases!

As far as I can tell, (...buf: Uint8Array[]) is equivalent to (buf: Uint8Array[]).

Of course, I could be very wrong, and often I am where JavaScript is concerned.

iuioiua commented 11 months ago

Current use of concat():

concat(data1, data2, data3);

Once the spread operator is removed, it must be:

concat([data1, data2, data3]);
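So migrating a call site is just wrapping the chunks in an array literal. A sketch using a stand-in `concat` with the array-taking signature discussed above (not the actual std/bytes source):

```typescript
// Stand-in concat with the proposed array-taking signature: total the
// lengths, allocate once, then bulk-copy each chunk at its offset.
function concat(buf: Uint8Array[]): Uint8Array {
  let length = 0;
  for (const b of buf) length += b.length;

  const output = new Uint8Array(length);
  let index = 0;
  for (const b of buf) {
    output.set(b, index);
    index += b.length;
  }
  return output;
}

const data1 = new Uint8Array([1]);
const data2 = new Uint8Array([2, 3]);
const data3 = new Uint8Array([4]);

// Old rest-parameter style `concat(data1, data2, data3)` becomes:
console.log(concat([data1, data2, data3])); // contains [1, 2, 3, 4]
```

With the array parameter, an arbitrarily long list of chunks is a single argument, so the engine's per-call argument limit never applies.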

nhrones commented 11 months ago

@iuioiua Thanks for the explanation. I see the dilemma. I'm writing a revised msgpack encoder, based on my expandable buffer in Kv-key-codec. That codec works quite well, and I see the msgpack encoder is quite similar. Would you like a heads-up when it's ready to test? This buffer would allow you to remove the /std/bytes/ dependency.

iuioiua commented 11 months ago

Frankly, I don't see a reason to move away from concat() once it's fixed. It's a versatile yet simple function. Nothing wrong with depending on it at this point.

nhrones commented 11 months ago

By the way, I finished the encoder mod, and it works great, even with BigObj. https://github.com/nhrones/msgPack