Improve safe decompression speed by 8-18%
Reduce multiple slice fetches. every slice access, also nested ones
,carries some overhead. In the hot loop a fixed &[u8;16] is fetched to
operate on. This is purely done to pass that info to the compiler.
Remove error handling that only carries overhead. As we are in safe
mode we can rely on bounds checks if custom error handling only adds overhead.
In normal operation no error should occur.
The strategy to identify improvements was by counting the lines of
assembly. A rough heuristic, but seems effective.
cargo asm --release --example decompress_block decompress_block::main |
wc -l