We have had performance issues when parsing smaller OpenCTM files using lzma-rs. Profiling showed that the allocation of the LZCircularBuffer was the issue: it eventually calls alloc_zeroed in the allocator, and this is slow because memset is expensive when running WebAssembly in browsers. In particular, files that take less than a few milliseconds to actually parse spend an additional 100 ms just allocating this buffer if the dict size is large. In our case, the dict size is 67 MB in these files, even though the file itself may be just a few hundred bytes.
And I am not sure if this is the cleanest way of achieving dynamic resizing. Let me know if you have any suggestions :)
Thanks for the comments and feedback! And sorry about the late reply; I did not have time to get back to this until now. I agree with your input and will make sure we do not risk shrinking the buffer.
This took a while, but I have adapted the code to your comments now.
bors r+
I can see around a 5% regression in speed after this update. Is it possible to use a feature or a runtime option? Or to have several buffer implementations, and then pass the one you like from outside?
Good point. I also noticed it in https://github.com/gendx/lzma-rs/pull/22#pullrequestreview-335624836, but I think 5% could be acceptable. However, I've now filed #27 to track this regression; feel free to send a pull request so that it can be fixed before the next release.
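One way such an option could look, as a minimal sketch: gate the two buffer strategies behind a Cargo feature. The feature name eager_buffer, the struct fields, and both constructors here are assumptions for illustration, not the actual lzma-rs API.

```rust
/// Hypothetical sketch: select the buffer strategy at compile time via a
/// Cargo feature, so performance-sensitive users can opt back into eager
/// allocation. The feature name `eager_buffer` is an assumption.
struct LZCircularBuffer {
    buf: Vec<u8>,
    dict_size: usize,
}

impl LZCircularBuffer {
    #[cfg(feature = "eager_buffer")]
    fn new(dict_size: usize) -> Self {
        // Eager: allocate and zero the whole dictionary up front. Faster in
        // steady state, but pays the full alloc_zeroed/memset cost immediately.
        LZCircularBuffer { buf: vec![0; dict_size], dict_size }
    }

    #[cfg(not(feature = "eager_buffer"))]
    fn new(dict_size: usize) -> Self {
        // Lazy: start empty and grow on demand. Cheap for small inputs with
        // large declared dict sizes.
        LZCircularBuffer { buf: Vec::new(), dict_size }
    }
}
```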
Just to mention, this is similar in intent to what I have done here: https://github.com/OISF/libhtp/commit/fe16fa764f7cea57be5a288ee85b27dffc460f6f#diff-523a8b9d16e6d9f006c824e158eeb4b7
For resizing, I would suggest keeping track of the dictionary size mentioned in the header and not going over it.
@catenacyber interesting! Thanks for sharing :)
Why is it important not to go over it? Can the decoding fail if we do, or is there a different reason to stay within that size?
I am no LZMA expert, so I might be mistaken. But what is the behavior of other LZMA tools when the header specifies a dictionary size of, for instance, 4096 bytes, and the decompression somehow tries to access item 4097 of the dictionary? Let me know if I am not clear. Cf. line 1041 here: https://github.com/OISF/libhtp/commit/2fb0091981c95a3bf8393703484bcb2871678b63#diff-1143601df653d71da0a4ff7c68de9e29R1041
From what I see, the indices passed to get are currently always taken modulo the dict size, so we shouldn't reach past the dictionary size from the header. That said, we could add a check in debug_assertions mode inside of get so that indeed this doesn't happen.
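As a sketch of such a check, assuming a buffer shaped roughly like the LZCircularBuffer (field and method names here are illustrative, not the actual internals):

```rust
struct LZCircularBuffer {
    buf: Vec<u8>,
    dict_size: usize,
}

impl LZCircularBuffer {
    fn get(&self, index: usize) -> u8 {
        // Only active in debug builds; verifies callers stay within the
        // dictionary size announced in the header. Compiled out in release.
        debug_assert!(
            index < self.dict_size,
            "index {} exceeds dict size {}",
            index,
            self.dict_size
        );
        // Positions that have not been written yet read as zero.
        self.buf.get(index).copied().unwrap_or(0)
    }
}
```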
This changes the way the LZCircularBuffer is allocated. Instead of allocating the full dict size immediately, the buffer is grown on demand. This helps when parsing smaller files with large dict sizes, especially when compiling to WebAssembly and running in browsers, where alloc_zeroed is slow.
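A minimal sketch of the grow-on-demand idea, with the names and growth policy as assumptions rather than the exact implementation in this PR:

```rust
struct LZCircularBuffer {
    buf: Vec<u8>,
    dict_size: usize,
}

impl LZCircularBuffer {
    fn set(&mut self, index: usize, value: u8) {
        // Indices are expected to already be reduced modulo dict_size.
        debug_assert!(index < self.dict_size);
        if index >= self.buf.len() {
            // Grow only as far as needed, and never past the dict size from
            // the header, so a few-hundred-byte file never pays for a 67 MB
            // zeroed allocation. Doubling keeps resizes amortized.
            let new_len = (index + 1).next_power_of_two().min(self.dict_size);
            self.buf.resize(new_len, 0);
        }
        self.buf[index] = value;
    }
}
```

Growing in power-of-two steps capped at dict_size is one reasonable policy; it trades a few extra resizes against never over-allocating beyond what the header permits, which also addresses the earlier point about not going over the declared dictionary size.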