Closed chyyran closed 1 year ago
https://github.com/gendx/lzma-rs/pull/79/commits/37e09d126b83bbb390763af5a24626c3ab1dde98 represents an alternative API with BitTree
parameterized on a single generic, which is unfortunately not NUM_BITS
but PROBS_ARRAY_LEN
as we can't use associated constants in struct definitions. This relies on the idea that that 1 << n
has the inverse (1 << n).trailing_zeroes()
for small values of n
, both of which are const. We can then use NUM_BITS
as an associated constant.
The signature of BitTree
here is cleaner, but the downside is that it is now possible to construct invalid instances of BitTree
since there is no way to check that PROBS_ARRAY_LEN
is the intended value without passing in NUM_BITS
. For this reason I prefer the implementation in https://github.com/gendx/lzma-rs/pull/79/commits/25872085e4a339d9e33c4dd63b8853b24b54e484
But since BitTree
is an internal API this really is just a matter of preference. Both implementations should compile down to the same code. Both are here to see which one you think is clearer for readability.
https://github.com/gendx/lzma-rs/pull/79/commits/9223b59a1fd13b28bb88aa27a632f7dbc867fd9c introduces a combining of both approaches. While the generic argument is still 1 << NUM_BITS
(removed the helper const function since I thought it was clear enough without it), I realized there can still be some safety had to ensure that as long as n.trailing_zeros()
returned log_2(n) as expected the input array length would be valid.
Marking this as ready since #77 was merged. I've cleaned up the prior commits and just left the two possible APIs.
@gendx you will want to review both 06aee699384477e09aa98e3d029197e927536b31 and 260eefe24b47e07d8cff2ccc8afda36755ce5ca2 which are identical except for the signature of BitTree
, as explained in the overview.
Given that generic_const_exprs
doesn't seem to be coming very soon, and that BitTree
is an internal API that isn't exposed, I'm of the opinion that the perf improvements are worth the hit to readability, but you may want to consider re-running some benchmarks on your end as well.
Unfortunately DecoderState::literal_probs
is too big to fit on the stack so I believe this will close #19 unless there are other places where const generics can be taken advantage of.
Base: 86.83% // Head: 87.08% // Increases project coverage by +0.25%
:tada:
Coverage data is based on head (
571be60
) compared to base (17bb25f
). Patch coverage: 99.32% of modified lines in pull request are covered.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
I've rebased on master, fixing the conflicts and squashed the commits into one. Looks like there's a small improvement in code coverage as well!
@gendx Any updates on getting this merged?
Apologies again for the delay. I'll merge this now and fix any pending issue on top of it.
Thanks a lot for your contribution!
Pull Request Overview
19 was blocked on
#[feature(generic_const_exprs)]
for the1 << NUM_BITS
issue but we do not actually needgeneric_const_exprs
to utilize const generics at the expense of having a slightly surprising API forBitTree
. SinceBitTree
is an internal implementation detail, I do not think it is too much of a trade-off for the performance benefits seen. The strong compile time guarantee of the validity ofBitTree
when usingNUM_BITS
in an expression can be replicated with a compile time assert in the constructor.There are 2 different API shapes forBitTree
proposed here. Both APIs validate their const arguments at compile time, and invalid instances ofBitTree
are still unconstructable.1.BitTree<const NUM_BITS: usize, const PROBS_ARRAY_LEN: usize>
(06aee699384477e09aa98e3d029197e927536b31)BitTree<const PROBS_ARRAY_LEN: usize>
(260eefe24b47e07d8cff2ccc8afda36755ce5ca2)The first signature must have the caller input both
NUM_BITS
, andPROBS_ARRAY_LEN == 1 << NUM_BITS
as part of the type signature. The equality ofPROBS_ARRAY_LEN
is checked by a compile-time assert macro that simply checks that indeedPROBS_ARRAY_LEN == 1 << NUM_BITS
. This prevents the construction of aBitTree
wherePROBS_ARRAY_LEN
is not1 << NUM_BITS
The second signature takes only
PROBS_ARRAY_LEN
to sidestep the requirement forgeneric_const_exprs
. Instead,NUM_BITS
is an associated constant that is calculated asfloor(log_2(PROBS_ARRAY_LEN))
usingPROBS_ARRAY_LEN.trailing_zeros()
which has beenconst
since Rust 1.32. The const assert instead checks thatPROBS_ARRAY_LEN == 1 << (PROBS_ARRAY_LEN.trailing_zeros())
. Thus, rather thanBitTree
being valid for anyNUM_BITS
andPROBS_ARRAY_LEN == 1 << NUM_BITS
,BitTree
in this API is valid for anyPROBS_ARRAY_LEN
such thatPROBS_ARRAY_LEN == 2 ** floor(log_2(PROBS_ARRAY_LEN))
, which gives us effectively the same guarantees as the first signature without the redundantNUM_BITS
argument.Both signatures compile to the same code and have the same performance. It is a matter of preference and readability on which to take. I have a slight preference for the second signature because it is less noisy and seeing
BitTree<{1 << 8}>
for example seems fairly obvious, but purely looking at the implementation, I prefer the less 'clever' first signature.This PR bumps the MSRV to 1.57 for const generics and const_panic.
Benchmarks
In general, variances are reduced and there is a speed bump when accessing the dictionary during decompression. While I did apply const-generics to
encode::rangecoder::BitTree
for consistency, they don't seem to be currently used for compression so compression benchmarks should not be heavily affected if at all.Without const generics
With const generic BitTree
This should fix #19, after this PR,
DecoderState
only has the single heap allocation ofliteral_probs
which is substantially more difficult to put on the stack.Testing Strategy