The transformation of encoder kernels to inline functions (#58) allows us to move the inner encoding loop into separate inline functions.
Because the number of remaining loop iterations is known, we can split calls to the inner loop into long unrolled stretches. Tests show that this can result in a significant speedup.
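
As a rough illustration of the idea (not the actual kernel code), the sketch below shows a plain scalar encoder split this way. The names `enc_round` and `enc_loop`, the lookup table, and the stretch lengths of 8, 4, and 1 rounds are all assumptions made for this example. One round consumes 3 input bytes and emits 4 output characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard Base64 alphabet; the table name is hypothetical. */
static const char b64_table[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* One round of the inner loop: 3 input bytes become 4 output characters.
 * Both pointers are advanced past the data that was handled. */
static inline void
enc_round(const uint8_t **src, char **dst)
{
	const uint8_t *s = *src;
	char *o = *dst;

	o[0] = b64_table[s[0] >> 2];
	o[1] = b64_table[((s[0] & 0x03) << 4) | (s[1] >> 4)];
	o[2] = b64_table[((s[1] & 0x0F) << 2) | (s[2] >> 6)];
	o[3] = b64_table[s[2] & 0x3F];

	*src += 3;
	*dst += 4;
}

/* Encode `rounds` complete 3-byte groups. Because the round count is
 * known on entry, the work is split into unrolled stretches of 8 and 4
 * rounds, with single rounds mopping up the remainder.
 * The stretch lengths are illustrative, not tuned. */
static inline void
enc_loop(const uint8_t *src, char *dst, size_t rounds)
{
	while (rounds >= 8) {
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		rounds -= 8;
	}
	if (rounds >= 4) {
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		enc_round(&src, &dst);
		rounds -= 4;
	}
	while (rounds > 0) {
		enc_round(&src, &dst);
		rounds--;
	}
}
```

Since `enc_round` is an inline function, the compiler can flatten each stretch into straight-line code without a loop counter check between rounds. The caller is expected to pass only whole 3-byte groups and to handle any trailing bytes and padding separately.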