ejmahler / RustFFT

RustFFT is a high-performance FFT library written in pure Rust.
Apache License 2.0
678 stars 47 forks source link

Unify the f32 and f64 code paths for SIMD radix4 #130

Closed ejmahler closed 7 months ago

ejmahler commented 7 months ago

This PR de-monomorphizes the code for the SSE, NEON, and Wasm Simd Radix4 implementations, replacing them with a single generic implementation. It follows a similar technique to the AVX generic algorithms, where we define a "SseVector", "NeonVector", etc trait that has generic methods for loading/storing, and only call those methods from the radix4 implementation.

The SseVector, NeonVector etc traits are much simpler than their AVX counterparts, both because we don't have half vs full vectors to worry about, and because I only included the absolute minimum in the trait to get radix4 to work.

One interesting thing to note is that because the radix4 structs can't refer to f32 vs f64 in their constructors, I had to move the construction of butterflies out of the structs and into the planners. The Radix4 structs now take a base FFT and a parameter k, and compute their length to be base_fft.len() * 4^k. This refactor actually feels very natural, and I wish we had done it before now.

Another thing I noticed: The base FFT length for radix4 does not need to be a power of 2. The radix4 struct is one of the oldest in the project, and I've been taking a "if it works don't fix it" approach for very long, and therefore haven't been questioning some of its very outdated design decisions. This opens up some cool optimization opportunities - like, I noticed that the planner has loogic for checking the size when planning bluestein's, and if the size is over a certain threhold it tries to switch to a mixed radix with a factor of 3. Instead, we could introduce a factor of 3 by continuing to use Radix4, but switching from a base FFT of 16 to a base FFT of 12, or from 32 to 24, and we could be certain that this would be pretty much unconditionally faster. There's a lot more we can do, and I intend to explore this in the next few days.

One final thing I noticed: The SseVector, NeonVector, etc traits are completely identical. We could probably get some benefit out of factoring those out into a target-agnostic simd utils file. If any SIMD path needs functionality not provided by the target-agnostic trait, they can just declare their own child trait with the functionality they need.

ejmahler commented 7 months ago

One final thing I noticed: The SseVector, NeonVector, etc traits are completely identical. We could probably get some benefit out of factoring those out into a target-agnostic simd utils file. If any SIMD path needs functionality not provided by the target-agnostic trait, they can just declare their own child trait with the functionality they need.

I tried this and it very much does not work, because in the generic SimdNum trait, you can only put SimdVector as the trait bound for the VectorType associated type. In child traits, you can't refine that trait bound (to my knowledge), so you're stuck with the generic-most set of functionality for everything, or defining everything you could possibly do like AVX does.

HEnquist commented 7 months ago

The base FFT length for radix4 does not need to be a power of 2.

Wow that's actually really cool. I have never seen anything else than powers of two used there so I just assumed that's how it is.