ejmahler / RustFFT

RustFFT is a high-performance FFT library written in pure Rust.
Apache License 2.0
706 stars, 49 forks

Consistently use FMA in neon butterfly3 #136

ejmahler closed this 9 months ago

ejmahler commented 9 months ago

Small PR that makes explicit use of FMA in butterfly3.

Mild performance gains: 5-10% for the smallest butterflies, with smaller gains the more non-butterfly3 work the FFT is doing. Still a clear win.

It's clear that the compiler doesn't automate mul -> add into a FMA, and I notice that the neon prime butterflies don't make any explicit use of FMA, so we stand to gain a lot by rewriting them to use FMA explicitly. That's a bigger task, though, because it requires updating the automated script, so it's not included here.
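For readers unfamiliar with the pattern, here is a rough scalar sketch of a size-3 butterfly in which each multiply-then-add is written as an explicit `mul_add`. It is not the NEON code from this PR: the `C32` type and the `butterfly3_fma` function are hypothetical names made up for illustration, and the real implementation works on NEON vectors rather than single `f32` values.

```rust
// Hypothetical minimal complex type, for illustration only.
// (RustFFT itself uses num_complex::Complex.)
#[derive(Clone, Copy, Debug, PartialEq)]
struct C32 {
    re: f32,
    im: f32,
}

// Scalar forward DFT of size 3. Every "multiply, then add" step is
// spelled out as f32::mul_add so it maps to a fused multiply-add
// where the target has one.
fn butterfly3_fma(x: [C32; 3]) -> [C32; 3] {
    // Twiddle factor w = exp(-2*pi*i/3) = (-1/2, -sqrt(3)/2).
    let w_re = -0.5f32;
    let w_im = -(3.0f32.sqrt()) / 2.0;

    // Sum and difference of the two non-DC inputs.
    let xp = C32 { re: x[1].re + x[2].re, im: x[1].im + x[2].im };
    let xn = C32 { re: x[1].re - x[2].re, im: x[1].im - x[2].im };

    // Output 0 is the plain sum of all three inputs.
    let out0 = C32 { re: x[0].re + xp.re, im: x[0].im + xp.im };

    // temp = x0 + w_re * xp, one FMA per component.
    let temp = C32 {
        re: xp.re.mul_add(w_re, x[0].re),
        im: xp.im.mul_add(w_re, x[0].im),
    };

    // out1 = temp + i * w_im * xn and out2 = temp - i * w_im * xn,
    // where i * (a + bi) = (-b + ai). Again one FMA per component.
    let out1 = C32 {
        re: (-xn.im).mul_add(w_im, temp.re),
        im: xn.re.mul_add(w_im, temp.im),
    };
    let out2 = C32 {
        re: xn.im.mul_add(w_im, temp.re),
        im: (-xn.re).mul_add(w_im, temp.im),
    };

    [out0, out1, out2]
}
```

Written without `mul_add`, the `temp`, `out1` and `out2` lines would each be a separate multiply followed by an add or subtract, which is the form the compiler leaves unfused.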

HEnquist commented 9 months ago

Nice!

> It's clear that the compiler doesn't automate mul -> add into a FMA,

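Yes, compilers don't usually fuse a mul and an add into an FMA automatically. An FMA rounds once instead of twice, so the result is a tiny bit more accurate, but that also means it isn't bit-for-bit identical to the separate mul + add, so the two aren't completely equivalent.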
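A small sketch of that rounding difference, using inputs picked by hand so that the intermediate product lands exactly halfway between two `f32` values. The function names here are made up for the example and nothing in it comes from RustFFT.

```rust
// Separate multiply and add: the product a * b is rounded to f32
// before c is added, so there are two roundings in total.
fn unfused(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

// Fused multiply-add: a * b + c is computed as if exactly, then
// rounded once.
fn fused(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

// Hand-picked inputs:
//   a = b = 1 + 2^-12, so the exact product is 1 + 2^-11 + 2^-24.
//   c = -(1 + 2^-11).
// The 2^-24 term is half an ulp at 1.0, so the unfused product
// rounds (ties to even) to 1 + 2^-11 and the term is lost.
fn demo_inputs() -> (f32, f32, f32) {
    let a = 1.0f32 + 1.0 / 4096.0;
    let c = -(1.0f32 + 1.0 / 2048.0);
    (a, a, c)
}
```

With these inputs `unfused` returns `0.0`, while `fused` keeps the `2^-24` term (about `5.96e-8`). Since the two forms can give different answers, a compiler that preserves floating-point semantics can't swap one for the other on its own.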