StableLib / stablelib

A stable library of useful TypeScript/JavaScript code
https://www.stablelib.com

Change UTF-8 decoder to avoid maximum stack size being exceeded #6

Closed · ernieturner closed this 7 years ago

ernieturner commented 7 years ago

While using the UTF-8 library today, I hit a scenario where converting a large byte array back into a UTF-8 string failed with "Maximum call stack size exceeded". I did some research and discovered that this happens because of the String.fromCharCode.apply(null, chars) call: Chrome only supports something like ~125K arguments per call, so doing an apply with an array larger than that exceeds the maximum stack size.
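For context, the failing pattern boils down to something like this (a minimal repro sketch; the array size is illustrative, not taken from the original report):

```ts
// Spreading a large array of char codes through Function.prototype.apply
// pushes every element onto the call stack as an argument.
const chars: number[] = new Array(200_000).fill(65); // well past Chrome's ~125K argument limit

// Throws "RangeError: Maximum call stack size exceeded" for large inputs:
const s = String.fromCharCode.apply(null, chars);
```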

There are likely myriad ways to fix this, but my proposal below builds up an array of decoded strings as it goes and then does a single Array.join to concatenate them (sketched below). I couldn't quite figure out how to run the benchmarks, but in some quick, hasty testing I didn't see much of a runtime difference when converting a 32K string into bytes and back to UTF-8.
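The core of the approach looks roughly like this (a sketch under assumed names; charCodesToString is hypothetical, not the PR's actual function):

```ts
// Build one single-character string per decoded char code, then join once.
// This avoids passing the whole array as arguments to a single call.
function charCodesToString(chars: number[]): string {
  const parts: string[] = [];
  for (let i = 0; i < chars.length; i++) {
    parts.push(String.fromCharCode(chars[i]));
  }
  return parts.join("");
}
```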

Let me know if you'd like me to poke at anything else. I haven't messed with lerna much, so I wasn't sure of the exact process for running the unit tests either.

dchest commented 7 years ago

Thank you! It was silly of me to think that I could get away with String.fromCharCode.apply.

I've also added a failing test, which your PR fixes.

Regarding performance and memory use, I suspect that modern JS engines will work better with simple string concatenation, at least memory-wise, but a VM that doesn't optimize it would behave pretty badly, so I think your solution is a nice middle ground. I'll probably add support for TextEncoder/TextDecoder later, when available, so that newer VMs won't even have to run this code.
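Feature detection for that would look roughly like this (a sketch, assuming fallbackDecode stands in for the library's own pure-JS decoder; the names are illustrative):

```ts
// Prefer the platform's native TextDecoder when it exists; otherwise use the
// pure-JS fallback passed in by the caller.
function decodeUTF8(bytes: Uint8Array, fallbackDecode: (b: Uint8Array) => string): string {
  if (typeof TextDecoder !== "undefined") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  return fallbackDecode(bytes);
}
```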

dchest commented 7 years ago

BTW, to run the benchmarks you can do yarn build && yarn bench inside the packages/utf8 directory. Here are the results:

Before:

utf8 encode  12356 ops           0.04 ms/op     24690.57 ops/sec     64.87 MiB/s
utf8 decode  26780 ops           0.02 ms/op     53506.09 ops/sec    140.58 MiB/s

After:

utf8 decode  4960 ops           0.10 ms/op      9904.95 ops/sec     26.02 MiB/s

Quite a performance hit, but since the old implementation was incorrect, it doesn't matter.

With string concatenation (let s = ""; s += String.fromCharCode(...)):

utf8 decode  10346 ops           0.05 ms/op     20680.78 ops/sec     54.34 MiB/s

Maybe I'll change it to string concatenation later, but it's okay for now.
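For comparison, the concatenation variant measured above would look roughly like this (a sketch with an illustrative name, not the library's actual code):

```ts
// Append one character at a time; modern engines optimize repeated string
// concatenation with ropes, but older VMs may copy on every append.
function charCodesToStringConcat(chars: number[]): string {
  let s = "";
  for (let i = 0; i < chars.length; i++) {
    s += String.fromCharCode(chars[i]);
  }
  return s;
}
```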

ernieturner commented 7 years ago

Thanks for the quick turnaround!