Decoders: unroll inner loops

aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration

BSD 2-Clause "Simplified" License


Decoders: unroll inner loops #62

Closed: aklomp closed this 4 years ago

aklomp commented 4 years ago

The transformation of decoder kernels to inline functions (#59) allows us to move the inner decoding loop into separate inline functions.

Because the number of remaining loop iterations is known before the loop starts, we can split the calls to the inner loop into long unrolled stretches. Tests show that this can result in a significant speedup.
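As a rough illustration of the idea, here is a minimal scalar sketch. It is not the library's actual kernel code: `b64_value`, `dec_round`, `dec_loop` and `decode_unrolled` are hypothetical names, and the input is assumed to be valid, unpadded Base64 with a length that is a multiple of four. The round count is computed once up front, so the driver can consume it in fixed stretches of 8, 4, 2 and 1 rounds without a per-round loop test.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map one Base64 character to its 6-bit value.
 * Assumes the character is valid; no error handling in this sketch. */
static inline uint8_t b64_value(uint8_t c)
{
	if (c >= 'A' && c <= 'Z') return (uint8_t) (c - 'A');
	if (c >= 'a' && c <= 'z') return (uint8_t) (c - 'a' + 26);
	if (c >= '0' && c <= '9') return (uint8_t) (c - '0' + 52);
	return (c == '+') ? 62 : 63;	/* assumes '+' or '/' */
}

/* One round of the inner loop: 4 input characters become 3 output bytes.
 * The pointers are advanced so that calls can simply be chained. */
static inline void dec_round(const uint8_t **src, uint8_t **dst)
{
	const uint8_t *s = *src;
	uint8_t *d = *dst;

	uint32_t v = ((uint32_t) b64_value(s[0]) << 18)
	           | ((uint32_t) b64_value(s[1]) << 12)
	           | ((uint32_t) b64_value(s[2]) <<  6)
	           |  (uint32_t) b64_value(s[3]);

	d[0] = (uint8_t) (v >> 16);
	d[1] = (uint8_t) (v >>  8);
	d[2] = (uint8_t)  v;

	*src += 4;
	*dst += 3;
}

/* Driver: the number of rounds is known, so it is consumed in
 * unrolled stretches instead of one round per loop iteration. */
static inline void dec_loop(const uint8_t **src, uint8_t **dst, size_t rounds)
{
	while (rounds >= 8) {
		dec_round(src, dst); dec_round(src, dst);
		dec_round(src, dst); dec_round(src, dst);
		dec_round(src, dst); dec_round(src, dst);
		dec_round(src, dst); dec_round(src, dst);
		rounds -= 8;
	}
	if (rounds >= 4) {
		dec_round(src, dst); dec_round(src, dst);
		dec_round(src, dst); dec_round(src, dst);
		rounds -= 4;
	}
	if (rounds >= 2) {
		dec_round(src, dst); dec_round(src, dst);
		rounds -= 2;
	}
	if (rounds >= 1) {
		dec_round(src, dst);
	}
}

/* Decode srclen characters (assumed to be a multiple of 4, no padding).
 * Returns the number of bytes written to dst. */
size_t decode_unrolled(const char *src, size_t srclen, uint8_t *dst)
{
	const uint8_t *s = (const uint8_t *) src;
	uint8_t *d = dst;
	size_t rounds = srclen / 4;	/* known before the loop starts */

	dec_loop(&s, &d, rounds);
	return (size_t) (d - dst);
}
```

Whether such unrolling pays off depends on the target and the compiler, so it would need to be benchmarked per architecture.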

aklomp commented 4 years ago

It seems like unrolling the loops on NEON results in a significant slowdown rather than a speedup, so maybe this branch should be held back until further efficiency improvements are made in the NEON decoders.