The C implementation could be improved considerably if the compiler were able to inline the complex_* functions into the loops. Could the implementations in common/complex_simple.c be moved to fft.c? Besides removing the call overhead, inlining also enables vectorization (at least with clang), and I see a 17% speedup on my Ryzen box. I'd like to use this suite for benchmarking WASM, with a particular interest in auto-vectorized SIMD code, so it would be great if the suite were friendlier to the traditional compiler optimisations that make this possible.
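To illustrate the kind of change I mean, here is a minimal sketch. It is not the suite's actual code: the `complex_t` layout, the helper signatures, and the `twiddle_accumulate` loop are all assumptions made up for the example. The point is only that when the helpers are `static inline` in the same translation unit as the loop, the compiler can see their bodies at each call site.

```c
#include <stddef.h>

/* Hypothetical complex type; the suite's real definition may differ. */
typedef struct { double re, im; } complex_t;

/* Defined `static inline` in the same file as the loop, so the compiler
   can inline the body instead of emitting a call per element. */
static inline complex_t complex_mul(complex_t a, complex_t b) {
    complex_t r = { a.re * b.re - a.im * b.im,
                    a.re * b.im + a.im * b.re };
    return r;
}

static inline complex_t complex_add(complex_t a, complex_t b) {
    complex_t r = { a.re + b.re, a.im + b.im };
    return r;
}

/* Illustrative loop only (not the suite's FFT kernel):
   out[i] = x[i] * w[i] + y[i] for each element. */
void twiddle_accumulate(complex_t *restrict out,
                        const complex_t *restrict x,
                        const complex_t *restrict w,
                        const complex_t *restrict y,
                        size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = complex_add(complex_mul(x[i], w[i]), y[i]);
    }
}
```

Building with something like `clang -O2 -Rpass=loop-vectorize` should report whether a loop of this shape was vectorized; whether it is depends on the compiler version and target. Link-time optimisation (`-flto`) may be another way to get cross-file inlining without moving the functions, though I haven't measured that here.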