FFTW / fftw3

DO NOT CHECK OUT THESE FILES FROM GITHUB UNLESS YOU KNOW WHAT YOU ARE DOING. (See below.)
GNU General Public License v2.0

How to make openmp work for FFTW3 #282

Closed fishjojo closed 2 years ago

fishjojo commented 2 years ago

Dear developers, I have the following code:

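#include <complex.h>  /* before fftw3.h, so fftw_complex is C99 double complex */
#include <fftw3.h>
#include <stdio.h>

void fft(complex double* in, complex double* out, int* mesh, int rank)
{
    /* Initialize FFTW's thread support; a return value of 0 means failure. */
    int info = fftw_init_threads();
    if (info == 0) {
        printf("Error initializing threads for FFTW3.\n");
    }
    int nthreads = 8, nthreads_used;
    /* Request nthreads for all plans created from here on. */
    fftw_plan_with_nthreads(nthreads);
    nthreads_used = fftw_planner_nthreads();
    printf("Requesting %d threads for FFTW3, %d used\n", nthreads, nthreads_used);
    fftw_plan p = fftw_plan_dft(rank, mesh, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);
    fftw_destroy_plan(p);
    fftw_cleanup_threads();
}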

which I expect to run with 8 threads, but it only runs with 1 thread. I can confirm that OpenMP is working, as other parts of the program do use multiple threads. FFTW3 was compiled with GCC 9. Thank you in advance.

stevengj commented 2 years ago

Did you compile FFTW with --enable-openmp? In that case the number of actual CPU threads launched is controlled by OpenMP, and you typically need to set the OMP_NUM_THREADS environment variable to tell OpenMP how many threads you want.

Note that OMP_NUM_THREADS is in addition to fftw_plan_with_nthreads, because you can have FFTW break up the computation into more or fewer parallelizable sections than OpenMP has threads for.

fishjojo commented 2 years ago

Yes, I compiled FFTW with --enable-openmp, and the program was linked with -lfftw3-omp -lfftw3 -lm -lpthread. I also set OMP_NUM_THREADS to the number of CPUs requested. In my case, omp_get_num_threads() will return 1 if it is called outside the omp parallel region. I guess fftw will also only find 1 thread? I also tried to create the FFTW plan inside the omp parallel region and execute the plan outside, but still only one thread was used. I don't know the right way to let FFTW spawn multiple threads. Are there any working examples of using FFTW with OpenMP? Thank you.

stevengj commented 2 years ago

> In my case, omp_get_num_threads() will return 1 if it is called outside the omp parallel region. I guess fftw will also only find 1 thread?

That probably means you didn't set OMP_NUM_THREADS correctly.