Closed by bitbank2 4 years ago
@bitbank2 hi please rebase to get the travis CI fixes :)
In general, I think that the state of the art of compilers has advanced a lot since src/assembly.h was written, and it doesn't hurt to check whether these fancy wrappers are still needed. It feels like assuming gcc or a compiler with optimization parity with gcc is not that outlandish.
MULSHIFT32 and MADD64 compile to sensible code when just written in C, and __builtin_clz uses the ARM clz instruction directly, but __builtin_abs produces a branching form.
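For reference, here is a rough sketch of what "just coded in C" could look like. The function names are made up for illustration, and the semantics are my assumption of what these macros do (high word of a 32x32 product, 64-bit multiply-accumulate, leading-zero count), not a copy of src/assembly.h:

```c
#include <stdint.h>

/* Assumed semantics: high 32 bits of the signed 64-bit product x * y.
 * Right-shifting a negative value is implementation-defined in C;
 * gcc and clang perform an arithmetic shift. */
static inline int32_t mulshift32_c(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * (int64_t)y) >> 32);
}

/* Assumed semantics: 64-bit accumulate, sum + x * y. */
static inline int64_t madd64_c(int64_t sum, int32_t x, int32_t y)
{
    return sum + (int64_t)x * (int64_t)y;
}

/* Leading-zero count via the gcc/clang builtin. __builtin_clz(0) is
 * undefined, so zero is handled explicitly (returning 32 is an assumption). */
static inline int clz_c(uint32_t x)
{
    return x ? __builtin_clz(x) : 32;
}
```

Whether the compiler turns these into single smull/smlal/clz instructions depends on the target and optimization flags, so it is worth checking the disassembly for each one.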
Using the Hacker's Delight C implementation for FASTABS gives just 2 instructions, but both are 32-bit encodings in Thumb mode:
int FASTABS1(int x) {
    int y = (x >> 31);
    return (x ^ y) - y;
}
gives
ea80 73e0 eor.w r3, r0, r0, asr #31
eba3 70e0 sub.w r0, r3, r0, asr #31
Most of the decode time is spent in the PolyphaseStereo() and PolyphaseMono() functions doing 64-bit integer math. The SIMD instructions of the Cortex-M4 take care of most of that, but the 64-bit right shift followed by a clip to 16 bits had room for improvement. I added an inline asm function to shave off a few cycles.
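To make the operation concrete, here is a portable C sketch of the shift-then-clip step. This is not the inline asm from the PR; the function name and the frac_bits parameter are hypothetical, and it only shows the arithmetic being optimized:

```c
#include <stdint.h>

/* Hypothetical helper: shift a 64-bit accumulator right by frac_bits
 * (expected range 0..63) and saturate the result to a signed 16-bit sample.
 * Right-shifting a negative value is implementation-defined in C;
 * gcc and clang perform an arithmetic shift. */
static inline int16_t shift_clip16(int64_t acc, int frac_bits)
{
    int64_t v = acc >> frac_bits;

    if (v > INT16_MAX) {
        v = INT16_MAX;   /* clip positive overflow to 32767 */
    }
    if (v < INT16_MIN) {
        v = INT16_MIN;   /* clip negative overflow to -32768 */
    }
    return (int16_t)v;
}
```

On Cortex-M4 the saturation part maps naturally onto the SSAT instruction, though whether a given compiler emits it from code like this would need to be confirmed in the disassembly.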