SortaCore / lzma2-js

JavaScript LZMA2, based on https://github.com/LZMA-JS/LZMA-JS.
https://dark-wire.com/storage/extlist.php
MIT License
1 stars 0 forks source link

Fails to lzma2_decompress data larger than 16MB #2

Open corwin-of-amber opened 3 years ago

corwin-of-amber commented 3 years ago

I am submitting this issue mostly for posterity in case someone else encounters the same problem in the near future. Perhaps I will be able to fix it at some point.

I have tried decompressing a binary stream (tar) whose compressed size is 11MB and uncompressed size 42MB. I am getting 'corrupt data'. Some superficial debugging shows that this occurs at precisely the point in the execution where the output buffer size (this$static.d.output.buf) is 16777216 (= 1 << 24). It also happens that the value of rep0 computed at this line:

https://github.com/SortaCore/lzma2-js/blob/dbe9e72a956c3edb4d2a81212cf9a043b6a1c491/lzma2_worker.js#L1210

is -2147483648 (= 1 << 31).

So it looks like a case of integer overflow.

SortaCore commented 3 years ago

Open an issue on LZMA-JS too, since that seems to be from their code. Although they seem to have issues dating to 2017 so might not be solved by them either.

corwin-of-amber commented 3 years ago

Yes, I was suspecting this was the case. I was planning to try the same with LZMA not-2 and then open a respective issue.

Indeed probably the best with current technology would be to have a small WebAssembly binary using the code from XZ Utils.

SortaCore commented 3 years ago

I think we can work around it by * Math.pow(2, numDirectBits) instead of << numDirectBits, which will give us 56 bits instead of 32, due to JS internal float precision. Ideally, with a check that it's 56 or less to make sure it falls over gracefully instead of potentially truncating. Let me know if it works for you ^.^

I think WebAssembly is based on x86, so 32-bit, so would have the same issue anyway.

corwin-of-amber commented 3 years ago

Will try, thanks!

As for WebAssembly, indeed, it is natively 32-bit. But I assume XZ Utils have worked on 32-bit processors before; so if they needed more than 32 bits, they should have used a long long or uint64_t, which is supported in WebAssembly IR.

corwin-of-amber commented 3 years ago

I think we can work around it by * Math.pow(2, numDirectBits) instead of << numDirectBits, which will give us 56 bits

Actually it does not make sense for rep0 to be that large. It should be less than the current position and also less than DictionarySize, which in my case is 8Mi. So something deeper is at work here.