gnudatalanguage / gdl

GDL - GNU Data Language
GNU General Public License v2.0
276 stars 61 forks source link

fft very slow for special array sizes #1892

Open brandy125 opened 6 days ago

brandy125 commented 6 days ago

The fft for an complex array of size (400,16087) takes 97 seconds while a slightly bigger array of size (400,16095) takes only 0.3 seconds

Can anybody explain or improve that?

GDL - GNU Data Language, Version v1.0.6-25-g67b3abca

GDL> a=complexarr(400,16040) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.79654002 seconds

GDL> a=complexarr(400,16050) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.33697796 seconds

GDL> a=complexarr(400,16060) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.30311394 seconds

GDL> a=complexarr(400,16070) & tic &  b=fft(a,dim=2) & toc

Time elapsed:        3.2735980 seconds

GDL> a=complexarr(400,16080) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.27597404 seconds

GDL> a=complexarr(400,16087) & tic &  b=fft(a,dim=2) & toc

Time elapsed:        97.450693 seconds

GDL> a=complexarr(400,16090) & tic &  b=fft(a,dim=2) & toc

Time elapsed:        3.2825370 seconds

GDL> a=complexarr(400,16095) & tic &  b=fft(a,dim=2) & toc

Time elapsed:       0.27009702 seconds
alaingdl commented 6 days ago

Can anybody explain or improve that?

explain ? yes ! improve ? no !

for me the situation is limpid : 16087 is a "large" prime number, then it cannot be splitted into a product of small prime numbers 2$^n$ 3$^m$ ...

you can read in the doc of FFTw Besides the automatic performance adaptation performed by the planner, it is also possible for advanced users to customize FFTW for their special needs. As distributed, FFTW works most efficiently for arrays whose size can be factored into small primes (2, 3, 5, and 7), and uses a slower general-purpose routine for other factors.

In the IDL doc. this point is also discussed (decomposition in prime factors versus speed)

Years ago, processing very long TOI (time series) for Planck HFI, we decided to cut into the longest 2$^n$ 3$^m$ 5$^o$ just below what we want to process ... I may still have a code around ... Maybe in the idlastro too !

brandy125 commented 6 days ago

understand

Shouldn't we try to catch that up? If we know that prime numbers will take 300 times longer in that case than the next round number, isn't there any way to do the fft on bigger arrays and transfer the result back to the shorter array?

alaingdl commented 6 days ago

I would be reluctant to depart for what the reference is providing to us ! No change by default to FFT() for me !

In order to make a warning, I don't know if it is easy to know whether a number is prime or not, or provide a suggestion of adequate numbers. We may consider to add an extra procedure to check that before. In some part of the TOI processing for Planck HFI it was done "automatically". It was the "responsibility" of the team to manage that and provide clear information to third users (take care of side effect, aliasing, ...)

Concerning warnings, we tried to add some in some places in GDL but it was not good idea : in case of pipeline or multiple calls, too much messages.

During long time we (wrongly) repeated that FFT is fast only for power of two and most of my colleagues never realized that a combination of 2, 3, 5 ... till 11 is fine. (clearly written in some docs !) We have to let know to students & news users they have to consider if they are in a good situation or not !

Last point : it is really super easy to superseded a function in GDL, without loose of performance you can call FFT() through something like MYFFT() or FFT2() and put the checks in that code, without changing the c++ code of FFT

brandy125 commented 5 days ago

Understand. On the other hand how can IDL do it so fast?

bash$ fl

Fawlty Language 0.79.54 (linux aarch64 m64) Copyright (c) 2000-2024, Lajos Foldy

                         https://www.flxpert.hu/fl/

*** THIS SOFTWARE COMES WITH ABSOLUTELY NO WARRANTY! USE AT YOUR OWN RISK! ***

Type 'help, /lib' for system routines info.

FL> a=complexarr(400,16087)
FL> b=fft(a,dim=2)
FL> tic & b=fft(a,dim=2) & toc
% Time elapsed: 0.328576 seconds.
FL> 
GillesDuvert commented 5 days ago

Ah, I'm afraid that DIM=2 is the problem. The FFTs in GDL use fftw only when DIM is not present. For some reason the use of fftw was introduced only in the basic case. With DIM=xxx it reverts to and old slow fourier transform code.

GillesDuvert commented 5 days ago
GDL> a=complexarr(400,16087)
GDL> tic & b=fft(a) & toc
% Time elapsed: 0.34985018 seconds.