Closed chyyran closed 2 years ago
Benchmarks on master
on reusing a preallocated buffer
$ cargo bench
Finished bench [optimized] target(s) in 0.16s
Running unittests (target/release/deps/lzma_rs-0d912beaf58a9aab)
running 14 tests
test decode::options::test::test_options ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_all ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_ones ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_length_all ... ignored
test encode::rangecoder::test::test_encode_decode_length_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_ones ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_all ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_ones ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_zeros ... ignored
test error::test::test_display ... ignored
test xz::test::test_checkmethod_roundtrip ... ignored
test xz::test::test_streamflags_roundtrip ... ignored
test result: ok. 0 passed; 0 failed; 14 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests (target/release/deps/lzma-99b2a32c4e93afcf)
running 8 tests
test compress_65536 ... bench: 1,384,063 ns/iter (+/- 26,842)
test compress_empty ... bench: 701 ns/iter (+/- 29)
test compress_hello ... bench: 1,011 ns/iter (+/- 40)
test decompress_after_compress_65536 ... bench: 2,558,868 ns/iter (+/- 45,196)
test decompress_after_compress_empty ... bench: 2,130 ns/iter (+/- 49)
test decompress_after_compress_hello ... bench: 2,683 ns/iter (+/- 67)
test decompress_big_file ... bench: 3,929,622 ns/iter (+/- 98,028)
test decompress_huge_dict ... bench: 2,741 ns/iter (+/- 79)
test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 8.87s
Benchmarks on this branch with default Vec::new()
behaviour
$ cargo bench
Compiling lzma-rs v0.2.0 (/mnt/d/coding/lzma-rs)
Finished bench [optimized] target(s) in 2.43s
Running unittests (target/release/deps/lzma_rs-0d912beaf58a9aab)
running 14 tests
test decode::options::test::test_options ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_all ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_ones ... ignored
test encode::rangecoder::test::test_encode_decode_bittree_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_length_all ... ignored
test encode::rangecoder::test::test_encode_decode_length_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_ones ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_all ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_ones ... ignored
test encode::rangecoder::test::test_encode_decode_reverse_bittree_zeros ... ignored
test encode::rangecoder::test::test_encode_decode_zeros ... ignored
test error::test::test_display ... ignored
test xz::test::test_checkmethod_roundtrip ... ignored
test xz::test::test_streamflags_roundtrip ... ignored
test result: ok. 0 passed; 0 failed; 14 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests (target/release/deps/lzma-99b2a32c4e93afcf)
running 8 tests
test compress_65536 ... bench: 1,357,296 ns/iter (+/- 67,203)
test compress_empty ... bench: 996 ns/iter (+/- 25)
test compress_hello ... bench: 1,306 ns/iter (+/- 59)
test decompress_after_compress_65536 ... bench: 2,522,253 ns/iter (+/- 58,323)
test decompress_after_compress_empty ... bench: 2,171 ns/iter (+/- 138)
test decompress_after_compress_hello ... bench: 2,655 ns/iter (+/- 268)
test decompress_big_file ... bench: 3,884,622 ns/iter (+/- 100,805)
test decompress_huge_dict ... bench: 2,732 ns/iter (+/- 199)
test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 12.32s
Looks to be within a 2% difference...
Converting this to a draft as I'm not entirely happy with having to pass ownership around where unnecessary. Even if Vec::new()
is effectively alloc-free that may not be true for some other storage in the future if that ever gets going.
Something like https://github.com/SnowflakePowered/lzma-rs/commit/9ea9da404f1f0d703f7af7a91d11beb8c1696a46 holds a mutable reference instead is probably preferable but that adds a somewhat noisy lifetime to LzBuffer
.
Closing this for now, needs a little more work.
Pull Request Overview
This is a small experiment building off of the raw primitives introduced in #74 to design a solution for #27.
This PR introduces the functions
LzmaDecoder::new_with_buffer
andLzma2Decoder::new_with_buffer
that allow constructing an LZMA(2) decoder with a preallocatedVec<u8>
buffer. The buffer is then reused on subsequent calls todecompress
.Lzma(2)Decoder::reset
will clear (memset
) the buffer without dropping the prior allocation although it may be resized if necessary within internal decompression routines.This allows consumers to pre-allocate a buffer upfront to mitigate the performance impact seen after #22 if necessary when using the raw decoder API.
I'm not actually sure this gives any performance benefit but I'm just putting it out here to get some thoughts.
Help Wanted
This pull request still needs proper benchmarking in a variety of cases. For my own usage in https://github.com/SnowflakePowered/chd-rs, I saw absolutely no performance difference between preallocating a buffer or using a fresh one via
Vec::new()
. There was also no difference in performance for my use case gained by reusing the buffer on eachdecompress
.Testing Strategy
This pull request was tested by...
.lzma
,.lzma2
,.xz
files).