Open apraga opened 7 years ago
FFTW_MEASURE does not guarantee bit reproducibility
Thank you for your quick answer. In my case, the relative error can climb to 290% (!) between the parallel and sequential versions, even with FFTW_ESTIMATE. In my opinion, it would be surprising if a difference that large came from a lack of bit reproducibility.
What do you mean by "relative error"?
You cannot meaningfully compute the relative errors of individual output elements. The error bound of FFT algorithms applies only to the total norm of the output, something like ||OUT1-OUT0||/||OUT0|| < ERROR_BOUND, where ||.|| is the L2 norm (square root of the sum of the squares). There is no bound on |OUT1[i]-OUT0[i]|/|OUT0[i]| for an individual element i.
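As a minimal sketch of the norm-wise check described above, assuming both outputs have been flattened into plain sequences of real or complex numbers, the quantity ||OUT1-OUT0||/||OUT0|| could be computed as follows. The function name `relative_l2_error` is hypothetical and is not part of FFTW.

```python
import math

def relative_l2_error(out1, out0):
    """Return ||out1 - out0|| / ||out0|| using the L2 norm.

    out0 is the reference output (e.g. the sequential transform) and
    out1 is the output under test (e.g. the OpenMP transform).
    Hypothetical helper for illustration; not an FFTW function.
    """
    if len(out1) != len(out0):
        raise ValueError("outputs must have the same length")
    # abs() works for both real and complex elements.
    diff_sq = sum(abs(a - b) ** 2 for a, b in zip(out1, out0))
    ref_sq = sum(abs(b) ** 2 for b in out0)
    if ref_sq == 0.0:
        raise ValueError("reference output has zero norm")
    return math.sqrt(diff_sq / ref_sq)
```

A single number is returned for the whole array, so one comparison against the chosen tolerance replaces any per-element check.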
To expand on Matteo's comment, suppose that the exact output of the FFT is (1,0,0). Because of roundoff errors, however, we might get (1-2e-15,1e-15,-3e-15). Another FFT algorithm, e.g. a different plan or a parallel plan, might get (1+1e-15,2e-15,1e-15). If you compare individual elements, e.g. -3e-15 to 1e-15, you might conclude that there is a huge error (factor of 3). But if you look at the root-mean-square (L2 norm) difference, then in both cases it is quite small compared to the root-mean-square of the expected vector (1,0,0).
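The example vectors above can be checked numerically. This is a small sketch using only the Python standard library (Python 3.8 or later for `math.dist` and multi-argument `math.hypot`); the variable names `plan_a` and `plan_b` are labels chosen here for illustration.

```python
import math

exact = (1.0, 0.0, 0.0)
plan_a = (1 - 2e-15, 1e-15, -3e-15)  # output of one plan
plan_b = (1 + 1e-15, 2e-15, 1e-15)   # output of another (e.g. parallel) plan

# Element-wise relative difference on the third entry: both values are
# pure roundoff noise, so this ratio is large and meaningless.
elementwise = abs(plan_a[2] - plan_b[2]) / abs(plan_b[2])

# Norm-wise relative difference: L2 distance between the two outputs,
# scaled by the L2 norm of the expected vector.
normwise = math.dist(plan_a, plan_b) / math.hypot(*exact)

print(f"element-wise: {elementwise:.3g}")
print(f"norm-wise:    {normwise:.3g}")
```

The element-wise figure comes out at roughly 4 (a 400% "error"), while the norm-wise figure stays at a few times 1e-15, which is the level expected from double-precision roundoff.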
You're totally right, sorry about the silly mistake.
However, computing the L2 norm sometimes results in a very large value. For example, with a 9x9x9 array, the L2 norm can be 3000. Maybe I'm still missing something... I've included the updated reproducible example here: fftw3_fortran.txt.
Your program works for me. What exactly did you do and what did you expect to happen?
I expect the program to return an L2 norm of the difference between the sequential and parallel versions of less than 1e-11 (the computations are in double precision). In the actual code, thousands of FFTs are computed. To replicate this situation, I used the small example above to check that the parallel version always gives the correct result. So I ran the program a thousand times (thus computing the same DFT a thousand times) with:
for i in $(seq 1 1000); do OMP_NUM_THREADS=4 ./fftw3_fortran; if [ $? -ne 0 ]; then echo "error"; break; fi; done
In these runs, there is at least one occurrence of an L2 norm on the order of hundreds or even thousands. I have tested it on Ubuntu (8 threads, 1 per core) and Arch Linux (4 cores, 1 thread per core).
Hi, I'm bumping this issue as I've found no solution. At the moment, I'm not sure FFTW is "safe" to use with OpenMP, as this test gives different results in sequential and in parallel. It is most likely an issue with the test, but I don't see it. @matteo-frigo What did you mean by "it works for you"? Thanks.
Hi,
Disclaimer: I've already sent an email to fftw (at) fftw.org, but I figured it would be more useful here. I've also modified the code a bit (the plan is now private). I apologize for the double posting.
I'm comparing the real DFT of a 3D array in Fortran, computed with OpenMP and sequentially. In some cases (about 1 run in 1000), the results are different. Below is the parallel version:
I'm using FFTW 3.3.6 compiled with --enable-openmp and the program is compiled and linked with:
Attached is the whole source code for a reproducible example (rename as .txt) comparing the sequential and parallel versions: fftw3_fortran.txt. The tests are run 1000 times with the bash loop:
Thanks for your help!