aklomp / base64

Fast Base64 stream encoder/decoder in C99, with SIMD acceleration
BSD 2-Clause "Simplified" License

NEON64: enc: convert full encoding loop to inline assembly #98

Closed aklomp closed 2 years ago

aklomp commented 2 years ago

Convert the full encoding loop to an inline assembly implementation for compilers that can use inline assembly.

The motivation for this change is issue #96: when optimization is turned off on recent versions of clang, the encoding table is sometimes not loaded into sequential registers, which the four-register table lookup needs. This happens despite taking pains to ensure that the compiler uses an explicit set of registers for the load (v8-v11).

This leaves us with few options besides rewriting the full encoding loop in inline assembly. Only that way can we be absolutely certain that the correct registers are used. Thankfully, aarch64 assembly is not very difficult to write by hand.
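To illustrate the idea, here is a minimal sketch, not the code from this pull request. It keeps the table load and the table lookup inside a single inline assembly statement, so the register names v8-v11 are written out literally and the compiler has no say in them. The helper name `lookup16` and the scalar fallback for non-aarch64 targets are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the library: translate 16 six-bit
 * values into characters through a 64-byte lookup table. */
static void
lookup16(const uint8_t *table, const uint8_t *in, uint8_t *out)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
	/* The load and the lookup share one asm statement, so v8-v11 are
	 * guaranteed to be the registers named here. A four-register TBL
	 * needs its table in consecutive registers. */
	__asm__ volatile (
		"ld1 {v8.16b, v9.16b, v10.16b, v11.16b}, [%[tbl]]\n\t"
		"ld1 {v12.16b}, [%[src]]\n\t"
		"tbl v12.16b, {v8.16b, v9.16b, v10.16b, v11.16b}, v12.16b\n\t"
		"st1 {v12.16b}, [%[dst]]\n\t"
		: /* no register outputs; the result is stored through dst */
		: [tbl] "r" (table), [src] "r" (in), [dst] "r" (out)
		: "v8", "v9", "v10", "v11", "v12", "memory"
	);
#else
	/* Portable fallback, assumed only for this sketch. Like TBL, it
	 * yields zero for an index outside the table. */
	for (size_t i = 0; i < 16; i++)
		out[i] = (in[i] < 64) ? table[in[i]] : 0;
#endif
}
```

Listing v8-v12 as clobbers tells the compiler to preserve whatever it had in those registers, which matters because the lower halves of v8-v15 are callee-saved under the aarch64 procedure call standard.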

Fixes #96. Closes #97.