DragonDlut closed this issue 2 weeks ago.
You may also need to set the `OMP_NUM_THREADS` environment variable to the desired number of threads before launching your program. Your code passes `omp_get_max_threads()` to FFTW, and that function reports the `OMP_NUM_THREADS` setting when the variable is defined. In general, experiment with the number of threads: past some point, adding more threads gives diminishing returns.
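One rough way to run that experiment, sketched with the same legacy Fortran interface as the question: the module below times one 2-D complex transform for each thread count you pass in. The names `fft_thread_scan` and `scan_threads` are hypothetical, and it assumes FFTW was configured with `--enable-openmp` and that the program is built with something like `gfortran -fopenmp scan.f90 -lfftw3_omp -lfftw3 -lm`.

```fortran
! Sketch: time a 2-D complex FFT for several thread counts (legacy FFTW interface).
! Hypothetical module/routine names; assumes an OpenMP-enabled FFTW build.
module fft_thread_scan
  use iso_fortran_env, only: int64, real64
  use omp_lib, only: omp_get_wtime
  implicit none
  include 'fftw3.f'
contains
  subroutine scan_threads(nx, ny, nrep, counts, times)
    integer, intent(in) :: nx, ny, nrep
    integer, intent(in) :: counts(:)          ! thread counts to try, e.g. [1, 2, 4, 8]
    real(real64), intent(out) :: times(:)     ! same size as counts: seconds per transform
    complex(real64), allocatable :: a(:,:), b(:,:)
    integer(int64) :: plan                    ! legacy interface stores the plan in 8 bytes
    integer :: ierr, i, k
    real(real64) :: t0

    ! Initialize FFTW's thread support once, before creating any plan.
    call dfftw_init_threads(ierr)
    if (ierr == 0) error stop "dfftw_init_threads failed"

    allocate(a(nx, ny), b(nx, ny))
    a = (1.0_real64, 0.0_real64)              ! FFTW_ESTIMATE planning leaves a untouched

    do i = 1, size(counts)
      ! The thread count applies to plans created after this call.
      call dfftw_plan_with_nthreads(counts(i))
      call dfftw_plan_dft_2d(plan, nx, ny, a, b, FFTW_FORWARD, FFTW_ESTIMATE)
      t0 = omp_get_wtime()
      do k = 1, nrep
        call dfftw_execute_dft(plan, a, b)
      end do
      times(i) = (omp_get_wtime() - t0) / nrep
      call dfftw_destroy_plan(plan)
    end do

    deallocate(a, b)
    call dfftw_cleanup_threads()
  end subroutine scan_threads
end module fft_thread_scan
```

Calling, for example, `scan_threads(1024, 1024, 20, [1, 2, 4, 8], t)` and printing `t` should show roughly where extra threads stop paying off on a given node; the actual numbers depend entirely on the machine.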
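You may also want to use `FFTW_MEASURE` rather than `FFTW_ESTIMATE` if this is a performance-critical transform that you are going to perform many times. `FFTW_MEASURE` takes longer to plan because it times several candidate algorithms, and it overwrites the input and output arrays while doing so, so create the plan before filling the input array.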
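A minimal sketch of the plan-once, execute-many pattern, again with the legacy interface: both plans are created with `FFTW_MEASURE` before the input is filled, then reused in a loop through `dfftw_execute_dft`, which takes the arrays explicitly (the FFTW manual suggests this over `dfftw_execute` in Fortran, since the compiler cannot see that `dfftw_execute` modifies the arrays). The module and function names are hypothetical; in a threaded program, the `dfftw_init_threads` and `dfftw_plan_with_nthreads` calls from the question would come before the planning calls.

```fortran
! Sketch: plan once with FFTW_MEASURE, then reuse the plans many times.
! Hypothetical module/function names; legacy FFTW Fortran interface.
module fft_plan_reuse
  use iso_fortran_env, only: int64, real64
  implicit none
  include 'fftw3.f'
contains
  ! Runs forward + backward transforms nrep times; returns the max round-trip error.
  function roundtrip_error(nx, ny, nrep) result(maxerr)
    integer, intent(in) :: nx, ny, nrep
    real(real64) :: maxerr
    complex(real64), allocatable :: a(:,:), b(:,:), a0(:,:)
    integer(int64) :: fwd, bwd
    integer :: i, j, k

    allocate(a(nx, ny), b(nx, ny), a0(nx, ny))

    ! Plan BEFORE filling the data: FFTW_MEASURE overwrites a and b
    ! while it times candidate algorithms.
    call dfftw_plan_dft_2d(fwd, nx, ny, a, b, FFTW_FORWARD,  FFTW_MEASURE)
    call dfftw_plan_dft_2d(bwd, nx, ny, b, a, FFTW_BACKWARD, FFTW_MEASURE)

    do j = 1, ny
      do i = 1, nx
        a0(i, j) = cmplx(real(i, real64), real(j, real64), kind=real64)
      end do
    end do
    a = a0

    do k = 1, nrep
      call dfftw_execute_dft(fwd, a, b)    ! b = forward FFT of a
      call dfftw_execute_dft(bwd, b, a)    ! a = unnormalized backward FFT of b
      a = a / real(nx * ny, real64)        ! FFTW does not normalize
    end do
    maxerr = maxval(abs(a - a0))

    call dfftw_destroy_plan(fwd)
    call dfftw_destroy_plan(bwd)
  end function roundtrip_error
end module fft_plan_reuse
```

The planning cost is paid once, so it only pays off when the loop runs many times, which matches the advice above.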
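I would generally advise recompiling FFTW on the HPC. Even if it is the same architecture, `libfftw3_omp.a` has to work with the OpenMP runtime on the HPC, which may be a different version. Also note that with static libraries the link order matters: put your own object files first, then `libfftw3_omp.a`, then `libfftw3.a`.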
Hi
As I understand it, the only change needed to use the OpenMP version of FFTW is to add

```fortran
! omp_get_max_threads() needs "use omp_lib"
call dfftw_init_threads(ierr)
if (ierr == 0) then
  write(*,*) "Error in Parallel FFT Initialization!"
  stop
end if

nthreads = omp_get_max_threads()
call dfftw_plan_with_nthreads(nthreads)
```
to the original code, while the remaining calls

```fortran
call dfftw_plan_dft_2d(fft_plan_forward, fft_nx_extent, fft_ny_extent, &
                       fft_cval, fft_kval, FFTW_FORWARD, FFTW_ESTIMATE)
```

and

```fortran
call dfftw_execute(fft_plan_forward)
```

stay the same. Is this change correct, and is it enough to run FFTW in parallel?
Another question: to use FFTW on an HPC cluster, I copied my desktop-compiled libfftw3.a and libfftw3_omp.a to the remote cluster and linked them with

```shell
gfortran -fopenmp FFT/libfftw3.a FFT/libfftw3_omp.a ....
```

Is that enough, or must I recompile FFTW on the HPC?
Thank you for your help!
Longfei