fft very slow for special array sizes

brandy125 commented 1 month ago

The fft for an complex array of size (400,16087) takes 97 seconds while a slightly bigger array of size (400,16095) takes only 0.3 seconds

Can anybody explain or improve that?

GDL - GNU Data Language, Version v1.0.6-25-g67b3abca

GDL> a=complexarr(400,16040) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.79654002 seconds

GDL> a=complexarr(400,16050) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.33697796 seconds

GDL> a=complexarr(400,16060) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.30311394 seconds

GDL> a=complexarr(400,16070) & tic &  b=fft(a,dim=2) & toc

Time elapsed:        3.2735980 seconds

GDL> a=complexarr(400,16080) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.27597404 seconds

GDL> a=complexarr(400,16087) & tic &  b=fft(a,dim=2) & toc

Time elapsed:        97.450693 seconds

GDL> a=complexarr(400,16090) & tic &  b=fft(a,dim=2) & toc

Time elapsed:        3.2825370 seconds

GDL> a=complexarr(400,16095) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.27009702 seconds

alaingdl commented 1 month ago

Can anybody explain or improve that?

explain ? yes ! improve ? no !

for me the situation is limpid : 16087 is a "large" prime number, then it cannot be splitted into a product of small prime numbers 2$^n$ 3$^m$ ...

you can read in the doc of FFTw Besides the automatic performance adaptation performed by the planner, it is also possible for advanced users to customize FFTW for their special needs. As distributed, FFTW works most efficiently for arrays whose size can be factored into small primes (2, 3, 5, and 7), and uses a slower general-purpose routine for other factors.

In the IDL doc. this point is also discussed (decomposition in prime factors versus speed)

Years ago, processing very long TOI (time series) for Planck HFI, we decided to cut into the longest 2$^n$ 3$^m$ 5$^o$ just below what we want to process ... I may still have a code around ... Maybe in the idlastro too !

brandy125 commented 1 month ago

understand

Shouldn't we try to catch that up? If we know that prime numbers will take 300 times longer in that case than the next round number, isn't there any way to do the fft on bigger arrays and transfer the result back to the shorter array?

alaingdl commented 1 month ago

I would be reluctant to depart for what the reference is providing to us ! No change by default to FFT() for me !

In order to make a warning, I don't know if it is easy to know whether a number is prime or not, or provide a suggestion of adequate numbers. We may consider to add an extra procedure to check that before. In some part of the TOI processing for Planck HFI it was done "automatically". It was the "responsibility" of the team to manage that and provide clear information to third users (take care of side effect, aliasing, ...)

Concerning warnings, we tried to add some in some places in GDL but it was not good idea : in case of pipeline or multiple calls, too much messages.

During long time we (wrongly) repeated that FFT is fast only for power of two and most of my colleagues never realized that a combination of 2, 3, 5 ... till 11 is fine. (clearly written in some docs !) We have to let know to students & news users they have to consider if they are in a good situation or not !

Last point : it is really super easy to superseded a function in GDL, without loose of performance you can call FFT() through something like MYFFT() or FFT2() and put the checks in that code, without changing the c++ code of FFT

brandy125 commented 1 month ago

Understand. On the other hand how can IDL do it so fast?

bash$ fl

Fawlty Language 0.79.54 (linux aarch64 m64) Copyright (c) 2000-2024, Lajos Foldy

                         https://www.flxpert.hu/fl/

*** THIS SOFTWARE COMES WITH ABSOLUTELY NO WARRANTY! USE AT YOUR OWN RISK! ***

Type 'help, /lib' for system routines info.

FL> a=complexarr(400,16087)
FL> b=fft(a,dim=2)
FL> tic & b=fft(a,dim=2) & toc
% Time elapsed: 0.328576 seconds.
FL>

GillesDuvert commented 1 month ago

Ah, I'm afraid that DIM=2 is the problem. The FFTs in GDL use fftw only when DIM is not present. For some reason the use of fftw was introduced only in the basic case. With DIM=xxx it reverts to and old slow fourier transform code.

GillesDuvert commented 1 month ago

GDL> a=complexarr(400,16087)
GDL> tic & b=fft(a) & toc
% Time elapsed: 0.34985018 seconds.

GillesDuvert commented 3 weeks ago

In retrospect, I'm surprised nobody ever complained --- probably people do 'whole' FFTs, not subsets. Or on too small arays to make a difference. In retrospect too, FFTW did not provide directly that functionality at the time (FFT of an n-dim array's slice). It does now, but the code is awkward and as undocumented as can be written by the guy who did it. I look at it.

BTW, the slow code uses the GSL fft and I'm surprised the GSL can be so slow---probably our fault.

GillesDuvert commented 3 weeks ago

GSL was not the problem here.

GillesDuvert commented 3 weeks ago

GDL> a=complexarr(400,16087) &tic & b=fft(a,dim=2) & toc % Time elapsed: 0.37771702 seconds. PR coming.

GillesDuvert commented 3 weeks ago

should work with #1897

gnudatalanguage / gdl

fft very slow for special array sizes #1892