FFTW / fftw3

DO NOT CHECK OUT THESE FILES FROM GITHUB UNLESS YOU KNOW WHAT YOU ARE DOING. (See below.)
GNU General Public License v2.0

Performance issues with AVX512 on Intel Cascade Lake #220

Open RemiLacroix-IDRIS opened 3 years ago

RemiLacroix-IDRIS commented 3 years ago

Hello,

We are seeing some performance issues with AVX512 on our Intel Cascade Lake-based machine.

I am attaching a simplified test case to this issue: testfft.zip.

Compile with: ifort -O3 -qopenmp -I/.../fftw/include -L.../fftw/lib -lfftw3 -lfftw3_omp -o testfft testfft.f90.

The full logs are available for FFTW_ESTIMATE (res_estimate.txt) and FFTW_MEASURE (res_measure.txt).

There are two different issues:

The first issue might just be due to the hardware, but the second one really feels like a bug.

Best regards, Rémi

RemiLacroix-IDRIS commented 3 years ago

Any ideas? It really feels like there must be a bug with fftw_plan_dft_r2c_3d (and maybe other fftw_plan_* functions) when using AVX512.

Lqlsoftware commented 3 years ago

Can you provide your code and detailed information about your platform? I've done some work porting FFTW to RISC SIMD instructions, and I also see low performance there when using multiple threads. It seems that the cost calculation function sometimes does not work correctly when using SIMD instructions. But false sharing of cache lines may also cause the problem.
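To make the false-sharing point concrete, here is a minimal C sketch. It is not taken from the FFTW sources and says nothing about how FFTW lays out its per-thread data. It assumes 64-byte cache lines (typical for x86-64), and the names `slot_packed`, `slot_padded`, `same_cache_line` and `shared_neighbours` are hypothetical, made up for this illustration. It only shows the layout side of the problem: when per-thread slots are packed, neighbouring threads write to the same cache line, and padding each slot to a full line avoids that.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas (C11) */
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, which is typical for x86-64 CPUs. */
#define CACHE_LINE 64

/* Packed per-thread slot: 8 bytes, so up to 8 slots fit in one cache line. */
typedef struct { double sum; } slot_packed;

/* Padded per-thread slot: aligned so that each slot owns a whole cache line. */
typedef struct { alignas(CACHE_LINE) double sum; } slot_padded;

static_assert(sizeof(slot_padded) == CACHE_LINE,
              "padded slot should occupy exactly one cache line");

/* Returns 1 if both addresses fall into the same cache line, 0 otherwise. */
static int same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHE_LINE) == ((uintptr_t)b / CACHE_LINE);
}

/* Counts adjacent slot pairs (i, i+1) that start in the same cache line.
 * Every such pair is a candidate for false sharing if thread i and
 * thread i+1 each write to their own slot. */
static size_t shared_neighbours(const void *base, size_t stride, size_t n)
{
    const unsigned char *p = base;
    size_t shared = 0;
    for (size_t i = 0; i + 1 < n; ++i)
        shared += (size_t)same_cache_line(p + i * stride, p + (i + 1) * stride);
    return shared;
}
```

Calling `shared_neighbours(arr, sizeof arr[0], n)` on an array of `slot_padded` should return 0, while an array of `slot_packed` reports that most neighbouring slots share a line. Whether this actually explains a slowdown in a given program still has to be checked with a profiler or hardware counters.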

RemiLacroix-IDRIS commented 3 years ago

> Can you provide your code and detailed information about your platform?

The code is attached to this issue. The platform has two Intel Cascade Lake 6248 CPUs per node (so 2x20 cores @ 2.5 GHz).

> It seems that the cost calculation function sometimes does not work correctly when using SIMD instructions. But false sharing of cache lines may also cause the problem.

The problem is mostly with fftw_plan_dft_r2c_3d and only happens with threads and AVX512 (AVX2 works fine).

Lqlsoftware commented 3 years ago

OK, I will see what I can do.