flintlib / flint

FLINT (Fast Library for Number Theory)
http://www.flintlib.org
GNU Lesser General Public License v3.0
446 stars 245 forks source link

Generic FFT #1418

Open fredrik-johansson opened 1 year ago

fredrik-johansson commented 1 year ago

I would like to have optimized FFT algorithms for gr vectors. Wishlist:

A good start could be to translate acb_dft to the gr format (after which acb_dft could be converted to a thin wrapper around the gr version). This doesn't have all the bells and whistles listed above, but the architecture looks reasonable as a basis for adding more features.

On the other hand, here is what fft_small does:

On the other hand, it is notably missing multithreading (multiplications are parallelized by assigning separate FFTs to separate threads), and it is not clear to me from reading the code whether it's using the matrix trick for huge transforms.

Complex as the fft_small code is, it might be easier to reverse-engineer it into more general gr form than reimplementing all the same optimizations from scratch for gr data.

Ideally, a completely generic FFT on top of a dmod ring with vector butterfly overloads could be made efficient enough to subsume fft_small (and even improve it further). I don't know if this is the case, but it would be interesting to try. Even if fft_small can't be matched, it would be nice to have fallback FFT code that works for full 64-bit moduli and without AVX2 support, even if it's a good factor slower than Dan's code.

Anyway, even something pretty naive would work well enough for arbitrary-precision rings.

fredrik-johansson commented 1 year ago

Note: fft_small is based on the much more readable

https://github.com/flintlib/flint2/blob/trunk/src/fft_small/doc/fft.jl