ejmahler / RustFFT

RustFFT is a high-performance FFT library written in pure Rust.
Apache License 2.0
678 stars 47 forks source link

Power of 3 Speed up on neon #139

Open ianfrankS opened 4 months ago

ianfrankS commented 4 months ago

In most FFT libraries (Including RustFFT when using non-AVX code), power-of-two FFT sizes are the fastest, and users see a steep falloff in performance when using non-power-of-two sizes. Thankfully, RustFFT using AVX acceleration is not quite as restrictive:

Any FFT whose size is of the form 2^n * 3^m can be considered the “fastest” in RustFFT.

Is there anyway to have this kind of speed-up for neon? The powers of 2 are quite restrictive.

Thank you!

HEnquist commented 4 months ago

What lengths do you use? Have you measured the actual speed at those lengths compared to say the next larger power of two? I often use lengths that are 2^n * 3 (so with m=1) and those are nearly as fast as 2^n.

Implementing the 3^m bits for neon would of course be possible, but it's a substantial amount of work. Afaik everyone working on this library does it in their free time.

ejmahler commented 4 months ago

Non power of two FFT performance is absolutely one of my goals for this library.

In master (but not in any release yet), the scalar code path has a new struct called RadixN, which effectively makes power of 3, power of 5, and power of 7 first class citizens just like they are in the AVX code path.

It could be ported to the neon code path - that was my intent when I wrote it, and the only reason I haven't is that A: I'm not sure about some of the API decisions yet, and B: the free time issue already mentioned. That would give you the same 2^n*3^m optimal performance that AVX gets.

As @HEnquist said, the latest release does a pretty good job of handling any set of small mixed factors. It's worth giving your ideal size (especially if its 2^n3^m) a try and seeing if it meet your needs, even if it isn't the fastest* the library has to offer.

And if you're willing to use master instead of a release, some changes we've made result in 3*2^n being competitive with any power of two on all code paths, so you could try that.