The dec_reshuffle function now uses fewer instructions.
Speed-up on Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz using Apple LLVM version 8.0.0 (clang-800.0.38):
AVX2: +8.5% compared to previous version
SSSE3: +8% compared to previous version
The speed-up is much smaller than in #16, but I think it is still significant enough.
Full results before and after the modifications follow.
Before:
After: