Frommi / miniz_oxide

Rust replacement for miniz
MIT License
168 stars, 48 forks

Parameters to lower memory consumption #144

Open bnjjj opened 5 months ago

bnjjj commented 5 months ago

Hello! Are there any parameters available to lower memory consumption? For example, in zlib we can adjust the window_bits parameter when compressing, and there is also a memory level and so on. I would really like to stay on miniz_oxide instead of switching to zlib. My use case is compressing a long-lived in-memory stream, and I want to lower the memory usage of the compression.

bnjjj commented 5 months ago

Something like what is provided here, with mem_level and window_bits.

oyvindln commented 5 months ago

The window_bits/mem_level feature isn't currently implemented in miniz_oxide. (In the original miniz, the parameter is only provided for API compatibility with zlib and is simply ignored.)

The compress_fast function used for the fastest compression level may result in a smaller maximum window size in practice compared to the normal compression function (I would need to verify to be sure), but I don't think it currently results in any smaller buffer allocation internally for the compressor object, if that is what you need to minimize.

The memory savings from window_bits/mem_level in zlib are on the order of 32 KB, so they are only relevant on extremely memory-constrained systems, which is why there hasn't been much incentive to implement them in alternative implementations.
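For a rough sense of how the two zlib parameters scale, zlib's zconf.h documents deflate's approximate memory use as (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes, plus a few kilobytes for small objects. A minimal sketch of that arithmetic follows; the function name is hypothetical and is not part of zlib or miniz_oxide.

```rust
/// Approximate deflate memory use in bytes, following the formula
/// documented in zlib's zconf.h (small fixed overhead not included).
/// `zlib_deflate_memory` is a hypothetical helper for illustration only.
fn zlib_deflate_memory(window_bits: u32, mem_level: u32) -> usize {
    // Window-related buffers grow with window_bits,
    // hash/pending buffers grow with mem_level.
    (1usize << (window_bits + 2)) + (1usize << (mem_level + 9))
}
```

Each step down in window_bits or mem_level halves the corresponding term, so the function can be called with any pair of settings to compare them.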

bnjjj commented 5 months ago

Thanks @oyvindln for your answer.

So I played a little with the codebase and my tests. I'm sorry if I did something very dumb, but I wanted to share what I found and see whether there is an easy way to implement something like a new CompressionLevel::LowMemory. Basically, I lowered the values of LZ_CODE_BUF_SIZE and LZ_HASH_BITS here: I tried LZ_CODE_BUF_SIZE = 1024 and LZ_HASH_BITS = 10, and I also changed LEVEL1_HASH_SIZE_MASK to 1000. With these values, memory usage is much lower and I still get correct compressed data. I'm interested in opening a PR that provides this new CompressionLevel by setting lower values for these buffer sizes. What do you think? If it's something you would be open to, I would like to know more about all the constants related to LZ_CODE_BUF_SIZE, because that's the main one I would like to lower, but it surely has dependencies.

bnjjj commented 5 months ago

For instance, these values look good for my current use case, but I would like to know the risk of lowering the size of the dictionary, because it is the parameter with the most impact.

pub const LZ_CODE_BUF_SIZE: usize = 8 * 1024;
pub const LZ_HASH_BITS: i32 = 12;
pub(crate) const LZ_DICT_SIZE: usize = 4096;

oyvindln commented 5 months ago

There shouldn't be any risk on the compression side other than a lower compression ratio. However, if they are made into non-const values, it could notably affect the compiler's ability to optimize, for example by removing bounds checks. So if they are to be made configurable, it should ideally be done using generics or something similar, so that those benefits are kept.
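One way the "generics" idea could look is sketched below with const generics. This is only an illustration under stated assumptions: LzBuffers, FullBuffers and LowMemoryBuffers are hypothetical names, not miniz_oxide types, and the real compressor has more buffers than shown. The point is that the sizes stay compile-time constants, so a masked index can be proven in range and the bounds check can be removed.

```rust
/// Hypothetical compressor buffers whose sizes are compile-time parameters.
/// Not a miniz_oxide type; the field layout is an assumption for illustration.
struct LzBuffers<const DICT_SIZE: usize, const HASH_SIZE: usize> {
    dict: [u8; DICT_SIZE],
    hash: [u16; HASH_SIZE],
}

impl<const DICT_SIZE: usize, const HASH_SIZE: usize> LzBuffers<DICT_SIZE, HASH_SIZE> {
    // Sizes must be powers of two so that masking is equivalent to `% SIZE`.
    const DICT_MASK: usize = DICT_SIZE - 1;
    const HASH_MASK: usize = HASH_SIZE - 1;

    fn new() -> Self {
        assert!(DICT_SIZE.is_power_of_two() && HASH_SIZE.is_power_of_two());
        Self { dict: [0; DICT_SIZE], hash: [0; HASH_SIZE] }
    }

    /// Store a byte in the circular dictionary. The mask is a constant,
    /// so the index is known to be below DICT_SIZE at compile time.
    fn put(&mut self, pos: usize, byte: u8) {
        self.dict[pos & Self::DICT_MASK] = byte;
    }

    /// Read a byte back from the circular dictionary.
    fn get(&self, pos: usize) -> u8 {
        self.dict[pos & Self::DICT_MASK]
    }

    /// Record the most recent position seen for a hash value.
    fn set_hash(&mut self, hash: usize, pos: u16) {
        self.hash[hash & Self::HASH_MASK] = pos;
    }

    /// Look up the most recent position recorded for a hash value.
    fn get_hash(&self, hash: usize) -> u16 {
        self.hash[hash & Self::HASH_MASK]
    }
}

// A 32 KiB window is the DEFLATE maximum; the smaller sizes mirror the
// values tried earlier in this thread (4096 dictionary, 12 hash bits).
type FullBuffers = LzBuffers<32768, 32768>;
type LowMemoryBuffers = LzBuffers<4096, 4096>;
```

Code written against LzBuffers is monomorphized once per size pair, so a low-memory variant would not cost the default path anything at run time, at the price of some extra code size.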

On the decompression side, you can't really rely on the data in the DEFLATE stream actually obeying the maximum window size declared in the stream header unless you yourself are responsible for the data, since it's just a flag. So it's only really usable for decompressing data you have compressed yourself with a lower window size; otherwise, DEFLATE-compressed data will have used the maximum window size in probably 99.9% of cases.
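To make the "just a flag" point concrete: in the zlib container (RFC 1950), the window size is declared in the upper four bits of the first header byte, and nothing in the compressed data itself is bound by it. The sketch below reads that declared value; declared_window_size is a hypothetical helper, not a miniz_oxide API, and a decoder would still have to validate every match distance itself.

```rust
/// Window size declared by a two-byte zlib (RFC 1950) header, or None if
/// the bytes are not a valid zlib header for DEFLATE.
/// Hypothetical helper for illustration; not part of miniz_oxide.
fn declared_window_size(header: [u8; 2]) -> Option<usize> {
    let [cmf, flg] = header;
    let method = cmf & 0x0f; // CM: 8 means DEFLATE
    let cinfo = cmf >> 4; // CINFO: log2(window size) - 8, at most 7
    // RFC 1950: CMF * 256 + FLG must be a multiple of 31.
    let check_ok = (u16::from(cmf) * 256 + u16::from(flg)) % 31 == 0;
    if method != 8 || cinfo > 7 || !check_ok {
        return None;
    }
    // This is only what the header claims; the stream may not obey it.
    Some(1usize << (cinfo + 8))
}
```

For example, the common header bytes 0x78 0x9C declare a 32768-byte window, while 0x48 0x89 declares 4096 bytes.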