aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License

SSSE3: enc: add inline asm codepath #109

Closed by aklomp 1 year ago

aklomp commented 1 year ago

After adding inline assembly implementations of the AVX2 and AVX encoders in #104 and #108, it turns out that we can easily repeat the trick for SSSE3. The code looks a lot like the AVX implementation. Benchmarking across a few machines consistently shows a speedup of around 10-20%.

One caveat is that the inline assembly codepath will only be available on 64-bit machines. 32-bit machines with SSSE3 support (rare, but they exist; I own one) have eight XMM registers instead of sixteen, and that's not enough to implement a proper pipelined, unrolled loop.

I did try to write inline assembly that uses only eight XMM registers, but under those constraints I could not implement a parallelized loop, and in fact I could not even beat the compiler for speed.
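
To illustrate the 64-bit restriction described above, here is a minimal sketch of a compile-time guard that enables an inline assembly path only where sixteen XMM registers are available. The macro `EXAMPLE_SSSE3_USE_ASM` and the `example_*` functions are hypothetical names made up for this illustration, and the compiler check is an assumption (GCC-style inline asm); none of this is necessarily how this pull request implements the switch.

```c
#include <assert.h>

// Hypothetical switch (name invented for this sketch): use the inline asm
// path only with a compiler that understands GCC-style inline asm, and only
// when targeting x86-64, where XMM0..XMM15 are available.
#ifndef EXAMPLE_SSSE3_USE_ASM
# if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#  define EXAMPLE_SSSE3_USE_ASM 1
# else
#  define EXAMPLE_SSSE3_USE_ASM 0
# endif
#endif

// Number of XMM registers the compile target exposes: sixteen in 64-bit
// mode, eight in 32-bit mode, zero if the target is not x86 at all.
static int example_xmm_register_count(void)
{
#if defined(__x86_64__) || defined(_M_X64)
	return 16;
#elif defined(__i386__) || defined(_M_IX86)
	return 8;
#else
	return 0;
#endif
}

// Report which encoder path this build would take.
static const char *example_ssse3_enc_path(void)
{
	// The asm path must never be selected with fewer than sixteen registers.
	assert(!EXAMPLE_SSSE3_USE_ASM || example_xmm_register_count() == 16);

	return EXAMPLE_SSSE3_USE_ASM ? "inline asm" : "compiler-generated";
}
```

With a guard like this, a 32-bit build falls back to the compiler-generated code, which matches the observation above that hand-written assembly limited to eight registers did not beat the compiler.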

aklomp commented 1 year ago

@htot Since you use the SSSE3 level stuff, I think this might be of interest to you?