aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License

NEON64: enc: add inline asm codepath #92

Closed by aklomp 2 years ago

aklomp commented 2 years ago

As was done in #91 for NEON32, we can implement the inner encoding loop for the NEON64 encoder in inline assembly. This should guarantee that we get the assembly code that we expect. The inner encoding loop is quite simple, so maintaining a second, parallel implementation costs little.
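To illustrate the idea of a second, parallel codepath, here is a minimal sketch of one step of such a loop: splitting 48 input bytes into 64 six-bit Base64 indices. It is not the code from this change. All names (`sketch_reshuffle48`, `sketch_encode48`, `SKETCH_HAVE_NEON64_ASM`) are hypothetical, the load and store use intrinsics rather than assembly to keep the example short, and the table lookup is done in plain C. The inline assembly path assumes GCC or Clang extended-asm syntax on AArch64; on other targets a scalar fallback computes the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature macro: use the inline asm path only on AArch64
 * with a GNU-compatible compiler, otherwise fall back to plain C. */
#if defined(__aarch64__) && defined(__GNUC__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON64_ASM 1
#else
#define SKETCH_HAVE_NEON64_ASM 0
#endif

static const char sketch_b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Turn 16 groups of 3 bytes (48 bytes) into 64 six-bit indices. */
static void sketch_reshuffle48(const uint8_t *src, uint8_t *idx)
{
#if SKETCH_HAVE_NEON64_ASM
    /* Deinterleaving load: val[n] holds byte n of each 3-byte group. */
    uint8x16x3_t in = vld3q_u8(src);
    const uint8x16_t mask = vdupq_n_u8(0x3F);
    uint8x16_t o0, o1, o2, o3;

    /* Outputs are early-clobber because they are written before all
     * inputs have been read. */
    __asm__ (
        "ushr %[o0].16b, %[i0].16b, #2        \n\t"
        "ushr %[o1].16b, %[i1].16b, #4        \n\t"
        "ushr %[o2].16b, %[i2].16b, #6        \n\t"
        "sli  %[o1].16b, %[i0].16b, #4        \n\t"
        "sli  %[o2].16b, %[i1].16b, #2        \n\t"
        "and  %[o1].16b, %[o1].16b, %[m].16b  \n\t"
        "and  %[o2].16b, %[o2].16b, %[m].16b  \n\t"
        "and  %[o3].16b, %[i2].16b, %[m].16b  \n\t"
        : [o0] "=&w" (o0), [o1] "=&w" (o1),
          [o2] "=&w" (o2), [o3] "=&w" (o3)
        : [i0] "w" (in.val[0]), [i1] "w" (in.val[1]),
          [i2] "w" (in.val[2]), [m] "w" (mask)
    );

    /* Interleaving store: idx[4k + n] receives lane k of output n. */
    uint8x16x4_t out = {{ o0, o1, o2, o3 }};
    vst4q_u8(idx, out);
#else
    /* Scalar fallback computing the same indices. */
    for (size_t k = 0; k < 16; k++) {
        const uint8_t a = src[3 * k];
        const uint8_t b = src[3 * k + 1];
        const uint8_t c = src[3 * k + 2];

        idx[4 * k]     = (uint8_t) (a >> 2);
        idx[4 * k + 1] = (uint8_t) (((a << 4) | (b >> 4)) & 0x3F);
        idx[4 * k + 2] = (uint8_t) (((b << 2) | (c >> 6)) & 0x3F);
        idx[4 * k + 3] = (uint8_t) (c & 0x3F);
    }
#endif
}

/* Encode exactly 48 input bytes into 64 Base64 characters
 * (no padding and no NUL terminator are written). */
static void sketch_encode48(const uint8_t *src, char *dst)
{
    uint8_t idx[64];

    assert(src != NULL && dst != NULL);
    sketch_reshuffle48(src, idx);

    for (size_t i = 0; i < 64; i++) {
        dst[i] = sketch_b64_table[idx[i]];
    }
}
```

Both paths are meant to produce identical output, so the existing tests can exercise whichever one the build selects.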

aklomp commented 2 years ago

I don't have access to any AArch64 hardware, so I cannot verify that this code increases performance, or by how much. Based on the results of #91 with the NEON32 codec, I'm nevertheless confident in pushing the change.

The CI does build and test AArch64 on a virtual machine, so I was able to functionally test the code before pushing it.