gendx / lzma-rs

An LZMA decoder written in pure Rust
MIT License

Do not allocate buffer size before necessary #22

Closed by dragly 4 years ago

dragly commented 4 years ago

This changes the way the `LZCircularBuffer` is allocated. Instead of allocating the full dict size immediately, the buffer is grown on demand. This helps when parsing smaller files with large dict sizes, especially when compiling to WebAssembly and running in the browser, where `alloc_zeroed` is slow.
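The on-demand growth idea can be sketched roughly as below. This is a minimal illustration only; `GrowingBuffer` and its methods are hypothetical names, not lzma-rs's actual `LZCircularBuffer` API:

```rust
// Hypothetical sketch: grow the output buffer on demand instead of
// allocating `dict_size` zeroed bytes up front.
struct GrowingBuffer {
    buf: Vec<u8>,
    dict_size: usize, // capacity limit taken from the LZMA header
}

impl GrowingBuffer {
    fn new(dict_size: usize) -> Self {
        // Start empty; no large alloc_zeroed call here.
        GrowingBuffer { buf: Vec::new(), dict_size }
    }

    fn set(&mut self, index: usize, value: u8) {
        let index = index % self.dict_size; // stay within the declared dict size
        if index >= self.buf.len() {
            // Grow only as far as needed; never shrink.
            self.buf.resize(index + 1, 0);
        }
        self.buf[index] = value;
    }

    fn get(&self, index: usize) -> u8 {
        // Positions that were never written are defined to read as zero.
        self.buf.get(index % self.dict_size).copied().unwrap_or(0)
    }
}

fn main() {
    // A 64 MiB dict size from the header, but a tiny file:
    let mut b = GrowingBuffer::new(1 << 26);
    b.set(5, 42);
    assert_eq!(b.get(5), 42);
    assert_eq!(b.get(100), 0); // unwritten position reads as zero
    assert!(b.buf.len() <= 6); // only a handful of bytes actually allocated
    println!("ok");
}
```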

dragly commented 4 years ago

We have had performance issues when parsing smaller OpenCTM files using lzma-rs. Profiling showed that the allocation of the `LZCircularBuffer` was the issue: it eventually calls `alloc_zeroed` in the allocator. This slowdown is apparently caused by `memset` being pretty slow when running WebAssembly in browsers. In particular, files that take less than a few milliseconds to actually parse will spend an additional 100 ms just allocating this buffer if the dict size is large. In our case, the dict size is 67 MB in these files, even though the file itself may be just a few hundred bytes.

dragly commented 4 years ago

And I am not sure if this is the cleanest way of achieving dynamic resizing. Let me know if you have any suggestions :)

dragly commented 4 years ago

Thanks for the comments and feedback! And sorry about the late reply. I did not have time to get back to this before now. I agree with your input and will make sure we do not risk shrinking the buffer.

dragly commented 4 years ago

This took a while, but I have adapted the code to your comments now.

gendx commented 4 years ago

bors r+

bors[bot] commented 4 years ago

Timed out

ibaryshnikov commented 4 years ago

I can see around 5% regression in speed after this update. Is it possible to use a feature or a runtime option? Or having several buffer implementations, and then passing the one you like from outside?

gendx commented 4 years ago

> I can see around 5% regression in speed after this update. Is it possible to use a feature or a runtime option? Or having several buffer implementations, and then passing the one you like from outside?

Good point. I also noticed it in https://github.com/gendx/lzma-rs/pull/22#pullrequestreview-335624836, but I think 5% could be acceptable. However, I've now filed #27 to track this regression, feel free to send a pull-request so that it can be fixed before the next release.
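One way to expose the choice ibaryshnikov suggests is a trait over the output buffer, so callers pick either the eager variant (fast steady-state access, large upfront allocation) or the lazy one (no big `alloc_zeroed`, slight per-write overhead). This is only a sketch under that assumption; the trait and type names are hypothetical and not part of lzma-rs:

```rust
// Hypothetical trait so callers can pass the buffer strategy from outside.
trait OutputBuffer {
    fn set(&mut self, index: usize, value: u8);
    fn get(&self, index: usize) -> u8;
}

/// Allocates the full dictionary up front (the pre-#22 behavior).
struct EagerBuffer(Vec<u8>);

impl EagerBuffer {
    fn new(dict_size: usize) -> Self {
        EagerBuffer(vec![0; dict_size])
    }
}

impl OutputBuffer for EagerBuffer {
    fn set(&mut self, index: usize, value: u8) {
        self.0[index] = value;
    }
    fn get(&self, index: usize) -> u8 {
        self.0[index]
    }
}

/// Grows on demand (the behavior introduced by this PR).
struct LazyBuffer(Vec<u8>);

impl LazyBuffer {
    fn new(_dict_size: usize) -> Self {
        LazyBuffer(Vec::new())
    }
}

impl OutputBuffer for LazyBuffer {
    fn set(&mut self, index: usize, value: u8) {
        if index >= self.0.len() {
            self.0.resize(index + 1, 0);
        }
        self.0[index] = value;
    }
    fn get(&self, index: usize) -> u8 {
        self.0.get(index).copied().unwrap_or(0)
    }
}

// A decoder generic over the buffer lets the caller pick the trade-off.
fn fill<B: OutputBuffer>(buf: &mut B) {
    for i in 0..8 {
        buf.set(i, i as u8);
    }
}

fn main() {
    let mut eager = EagerBuffer::new(4096);
    let mut lazy = LazyBuffer::new(4096);
    fill(&mut eager);
    fill(&mut lazy);
    assert_eq!(eager.get(3), 3);
    assert_eq!(lazy.get(3), 3);
    println!("both strategies agree");
}
```

With this shape, a Cargo feature or a runtime parameter would only need to select which implementation is handed to the decoder.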

catenacyber commented 4 years ago

Just to mention, this is similar in intent to what I have done here: https://github.com/OISF/libhtp/commit/fe16fa764f7cea57be5a288ee85b27dffc460f6f#diff-523a8b9d16e6d9f006c824e158eeb4b7

For resizing, I would suggest it is important to keep track of the dictionary size mentioned in the header, and not go over it.

dragly commented 4 years ago

@catenacyber interesting! Thanks for sharing :)

Why is it important not to go over it? Can the decoding fail if we do so, or is there a different reason to stay within the size?

catenacyber commented 4 years ago

I am no LZMA expert, so I might be mistaken.

But what is the behavior of other LZMA tools when the header specifies a dictionary size of, for instance, 4096 bytes, and the decompression somehow tries to access item 4097 of the dictionary? Let me know if I am not clear. Cf. line 1041 here: https://github.com/OISF/libhtp/commit/2fb0091981c95a3bf8393703484bcb2871678b63#diff-1143601df653d71da0a4ff7c68de9e29R1041

gendx commented 4 years ago

> Just to mention this is a similar intent of what I have done here : OISF/libhtp@fe16fa7#diff-523a8b9d16e6d9f006c824e158eeb4b7
>
> For resizing, I would suggest it is important to keep track of the dictionary size mentioned in the header, and not go over it.

From what I see, the indices passed to `get` are currently always taken modulo the dict size, so we shouldn't reach past the dictionary size of the header. That said:
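That wrap-around property can be illustrated with plain modulo arithmetic: whatever the absolute output position, the resulting index stays strictly below the header's dictionary size. A standalone sketch, not lzma-rs code:

```rust
fn main() {
    let dict_size: usize = 4096; // e.g. the size declared in the LZMA header
    // Any cursor position, however large, wraps back into [0, dict_size):
    for cursor in [0usize, 4095, 4096, 4097, 10 * 4096 + 17] {
        let index = cursor % dict_size;
        assert!(index < dict_size); // never touches item 4096 or beyond
    }
    println!("every index stays within the dictionary");
}
```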