Closed ernieturner closed 7 years ago
Thank you! It was silly of me to think that I could get away with String.fromCharCode.apply.
I've also added a failing test, which your PR fixes.
Regarding performance and memory use, I suspect that modern JS engines will work better with simple string concatenation -- at least memory-wise -- but if there's a VM which doesn't optimize this, it would behave pretty badly, so I think your solution is a nice middle ground. I'll probably add support for TextEncoder later, if it's available, so that newer VMs won't even have to worry about running this code.
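For context, a minimal sketch of what that native path could look like (this is an illustration, not the library's code -- TextDecoder is the decode-side counterpart of TextEncoder):

```javascript
// Where the platform provides TextEncoder/TextDecoder, UTF-8
// conversion is native and needs no fromCharCode tricks at all.
const bytes = new TextEncoder().encode("✓");        // Uint8Array [0xE2, 0x9C, 0x93]
const text = new TextDecoder("utf-8").decode(bytes); // round-trips back to "✓"
```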
BTW, to run benchmarks you can do `yarn build && yarn bench` inside the `packages/utf8` directory. Here are the results:
Before:
utf8 encode 12356 ops 0.04 ms/op 24690.57 ops/sec 64.87 MiB/s
utf8 decode 26780 ops 0.02 ms/op 53506.09 ops/sec 140.58 MiB/s
After:
utf8 decode 4960 ops 0.10 ms/op 9904.95 ops/sec 26.02 MiB/s
Quite a performance hit, but since the implementation was incorrect, it doesn't matter.
With string concatenation (`let s = ""; s += String.fromCharCode(...)`):
utf8 decode 10346 ops 0.05 ms/op 20680.78 ops/sec 54.34 MiB/s
Maybe I'll change it to string concatenation later, but it's okay for now.
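The concatenation variant benchmarked above has roughly this shape (a sketch only; `chars` stands for the already-decoded UTF-16 code units, and `joinCodeUnits` is a hypothetical name):

```javascript
// Build the result with += instead of one giant apply() or an
// intermediate array. Modern engines represent `s += ...` cheaply
// (rope-like strings), which is why this variant benchmarks well.
function joinCodeUnits(chars) {
    let s = "";
    for (let i = 0; i < chars.length; i++) {
        s += String.fromCharCode(chars[i]);
    }
    return s;
}
```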
Thanks for the quick turnaround!
While using the UTF-8 library today, I came across a scenario where converting a large number of bytes back into a UTF-8 string failed with a "Maximum call stack size exceeded" error. I did some research and discovered that this was happening because of the String.fromCharCode.apply(null, chars) call: Chrome only supports something like ~125K parameters, so doing an apply with an array larger than that causes the max stack size to be reached.

There's likely a myriad of different ways to fix this, but I've come up with a proposal below which generates the array of UTF-8 strings as it goes, then does an Array.join to concat them all together. I couldn't quite figure out how to run the benchmarks, but I did some quick, hasty testing and didn't see much of a runtime difference when converting a 32K string into bytes and back to UTF-8.

Let me know if you'd like me to poke at some other things. I haven't messed with lerna much, so I wasn't sure of the process for running the unit tests either.