gendx / lzma-rs

An LZMA decoder written in pure Rust
MIT License
128 stars 27 forks source link

Use const generics to remove BitTree heap allocations #79

Closed chyyran closed 1 year ago

chyyran commented 2 years ago

Pull Request Overview

19 was blocked on #[feature(generic_const_exprs)] for the 1 << NUM_BITS issue but we do not actually need generic_const_exprs to utilize const generics at the expense of having a slightly surprising API for BitTree. Since BitTree is an internal implementation detail, I do not think it is too much of a trade-off for the performance benefits seen. The strong compile time guarantee of the validity of BitTree when using NUM_BITS in an expression can be replicated with a compile time assert in the constructor.

There are 2 different API shapes for BitTree proposed here. Both APIs validate their const arguments at compile time, and invalid instances of BitTree are still unconstructable.

1. BitTree<const NUM_BITS: usize, const PROBS_ARRAY_LEN: usize> (06aee699384477e09aa98e3d029197e927536b31)

  1. BitTree<const PROBS_ARRAY_LEN: usize> (260eefe24b47e07d8cff2ccc8afda36755ce5ca2)

The first signature must have the caller input both NUM_BITS, and PROBS_ARRAY_LEN == 1 << NUM_BITS as part of the type signature. The equality of PROBS_ARRAY_LEN is checked by a compile-time assert macro that simply checks that indeed PROBS_ARRAY_LEN == 1 << NUM_BITS. This prevents the construction of a BitTree where PROBS_ARRAY_LEN is not 1 << NUM_BITS

The second signature takes only PROBS_ARRAY_LEN to sidestep the requirement for generic_const_exprs. Instead, NUM_BITS is an associated constant that is calculated as floor(log_2(PROBS_ARRAY_LEN)) using PROBS_ARRAY_LEN.trailing_zeros() which has been const since Rust 1.32. The const assert instead checks that PROBS_ARRAY_LEN == 1 << (PROBS_ARRAY_LEN.trailing_zeros()). Thus, rather than BitTree being valid for any NUM_BITS and PROBS_ARRAY_LEN == 1 << NUM_BITS, BitTree in this API is valid for any PROBS_ARRAY_LEN such that PROBS_ARRAY_LEN == 2 ** floor(log_2(PROBS_ARRAY_LEN)), which gives us effectively the same guarantees as the first signature without the redundant NUM_BITS argument.

Both signatures compile to the same code and have the same performance. It is a matter of preference and readability on which to take. I have a slight preference for the second signature because it is less noisy and seeing BitTree<{1 << 8}> for example seems fairly obvious, but purely looking at the implementation, I prefer the less 'clever' first signature.

This PR bumps the MSRV to 1.57 for const generics and const_panic.

Benchmarks

In general, variances are reduced and there is a speed bump when accessing the dictionary during decompression. While I did apply const-generics to encode::rangecoder::BitTree for consistency, they don't seem to be currently used for compression so compression benchmarks should not be heavily affected if at all.

Without const generics

running 8 tests
test compress_65536                  ... bench:   1,312,743 ns/iter (+/- 45,836)
test compress_empty                  ... bench:         712 ns/iter (+/- 55)
test compress_hello                  ... bench:       1,022 ns/iter (+/- 18)
test decompress_after_compress_65536 ... bench:   1,157,757 ns/iter (+/- 64,077)
test decompress_after_compress_empty ... bench:       1,897 ns/iter (+/- 82)
test decompress_after_compress_hello ... bench:       2,256 ns/iter (+/- 165)
test decompress_big_file             ... bench:   3,615,488 ns/iter (+/- 128,563)
test decompress_huge_dict            ... bench:       2,318 ns/iter (+/- 135)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 16.21s

With const generic BitTree

test compress_65536                  ... bench:   1,317,185 ns/iter (+/- 58,197)
test compress_empty                  ... bench:         791 ns/iter (+/- 65)
test compress_hello                  ... bench:       1,108 ns/iter (+/- 45)
test decompress_after_compress_65536 ... bench:   1,189,749 ns/iter (+/- 60,014)
test decompress_after_compress_empty ... bench:         530 ns/iter (+/- 23)
test decompress_after_compress_hello ... bench:         787 ns/iter (+/- 26)
test decompress_big_file             ... bench:   3,600,336 ns/iter (+/- 375,302)
test decompress_huge_dict            ... bench:         873 ns/iter (+/- 33)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 11.17s

This should fix #19, after this PR, DecoderState only has the single heap allocation of literal_probs which is substantially more difficult to put on the stack.

Testing Strategy

chyyran commented 2 years ago

https://github.com/gendx/lzma-rs/pull/79/commits/37e09d126b83bbb390763af5a24626c3ab1dde98 represents an alternative API with BitTree parameterized on a single generic, which is unfortunately not NUM_BITS but PROBS_ARRAY_LEN as we can't use associated constants in struct definitions. This relies on the idea that that 1 << n has the inverse (1 << n).trailing_zeroes() for small values of n, both of which are const. We can then use NUM_BITS as an associated constant.

The signature of BitTree here is cleaner, but the downside is that it is now possible to construct invalid instances of BitTree since there is no way to check that PROBS_ARRAY_LEN is the intended value without passing in NUM_BITS. For this reason I prefer the implementation in https://github.com/gendx/lzma-rs/pull/79/commits/25872085e4a339d9e33c4dd63b8853b24b54e484

But since BitTree is an internal API this really is just a matter of preference. Both implementations should compile down to the same code. Both are here to see which one you think is clearer for readability.

chyyran commented 2 years ago

https://github.com/gendx/lzma-rs/pull/79/commits/9223b59a1fd13b28bb88aa27a632f7dbc867fd9c introduces a combining of both approaches. While the generic argument is still 1 << NUM_BITS (removed the helper const function since I thought it was clear enough without it), I realized there can still be some safety had to ensure that as long as n.trailing_zeros() returned log_2(n) as expected the input array length would be valid.

chyyran commented 2 years ago

Marking this as ready since #77 was merged. I've cleaned up the prior commits and just left the two possible APIs.

@gendx you will want to review both 06aee699384477e09aa98e3d029197e927536b31 and 260eefe24b47e07d8cff2ccc8afda36755ce5ca2 which are identical except for the signature of BitTree, as explained in the overview.

Given that generic_const_exprs doesn't seem to be coming very soon, and that BitTree is an internal API that isn't exposed, I'm of the opinion that the perf improvements are worth the hit to readability, but you may want to consider re-running some benchmarks on your end as well.

Unfortunately DecoderState::literal_probs is too big to fit on the stack so I believe this will close #19 unless there are other places where const generics can be taken advantage of.

codecov-commenter commented 2 years ago

Codecov Report

Base: 86.83% // Head: 87.08% // Increases project coverage by +0.25% :tada:

Coverage data is based on head (571be60) compared to base (17bb25f). Patch coverage: 99.32% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #79 +/- ## ========================================== + Coverage 86.83% 87.08% +0.25% ========================================== Files 19 19 Lines 2484 2540 +56 ========================================== + Hits 2157 2212 +55 - Misses 327 328 +1 ``` | Flag | Coverage Δ | | |---|---|---| | integration | `81.66% <100.00%> (+0.06%)` | :arrow_up: | | unit | `55.66% <93.24%> (+1.16%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/gendx/lzma-rs/pull/79?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | | |---|---|---| | [src/encode/rangecoder.rs](https://codecov.io/gh/gendx/lzma-rs/pull/79/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL2VuY29kZS9yYW5nZWNvZGVyLnJz) | `96.58% <98.80%> (+0.24%)` | :arrow_up: | | [src/decode/lzma.rs](https://codecov.io/gh/gendx/lzma-rs/pull/79/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL2RlY29kZS9sem1hLnJz) | `91.74% <100.00%> (+0.09%)` | :arrow_up: | | [src/decode/rangecoder.rs](https://codecov.io/gh/gendx/lzma-rs/pull/79/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL2RlY29kZS9yYW5nZWNvZGVyLnJz) | `95.36% <100.00%> (+0.04%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

chyyran commented 2 years ago

I've rebased on master, fixing the conflicts and squashed the commits into one. Looks like there's a small improvement in code coverage as well!

chyyran commented 1 year ago

@gendx Any updates on getting this merged?

gendx commented 1 year ago

Apologies again for the delay. I'll merge this now and fix any pending issue on top of it.

Thanks a lot for your contribution!