Open ianfrankS opened 7 months ago
What lengths do you use? Have you measured the actual speed at those lengths compared to say the next larger power of two? I often use lengths that are 2^n * 3 (so with m=1) and those are nearly as fast as 2^n.
Implementing the 3^m bits for neon would of course be possible, but it's a substantial amount of work. Afaik everyone working on this library does it in their free time.
Non power of two FFT performance is absolutely one of my goals for this library.
In master (but not in any release yet), the scalar code path has a new struct called RadixN, which effectively makes power of 3, power of 5, and power of 7 first class citizens just like they are in the AVX code path.
It could be ported to the neon code path - that was my intent when I wrote it, and the only reason I haven't is that A: I'm not sure about some of the API decisions yet, and B: the free time issue already mentioned. That would give you the same 2^n*3^m optimal performance that AVX gets.
As @HEnquist said, the latest release does a pretty good job of handling any set of small mixed factors. It's worth giving your ideal size (especially if its 2^n3^m) a try and seeing if it meet your needs, even if it isn't the fastest* the library has to offer.
And if you're willing to use master instead of a release, some changes we've made result in 3*2^n being competitive with any power of two on all code paths, so you could try that.
Is there anyway to have this kind of speed-up for neon? The powers of 2 are quite restrictive.
Thank you!