DTolm / VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
MIT License

R2C roundtrip failed with HIP backend in `develop` #112

al42and closed this issue 1 year ago

al42and commented 1 year ago

When doing an R2C/C2R roundtrip for a 5x5x10 3D transform, the last XY-row is not reconstructed correctly:

$ hipcc -Wno-switch --offload-arch=gfx906 -I$HOME/VkFFT/vkFFT/ -DVKFFT_BACKEND=2 vkfft-5x5x10.cpp -o vkfft-5x5x10 -O1 && ./vkfft-5x5x10
Fail at index {24, 0}: got -4.89702, expected -4.9
Fail at index {24, 2}: got 7.70298, expected 7.7
Fail at index {24, 4}: got -9.69702, expected -9.7
Fail at index {24, 6}: got 8.10298, expected 8.1
Fail at index {24, 8}: got 3.40298, expected 3.4

All other values are within an absolute error of 1e-5.
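For readers without access to the gist, the comparison step of such a roundtrip test can be sketched roughly as below. This is only an illustration, not the code from the gist: `Mismatch`, `findMismatches` and `report` are hypothetical names. It assumes the round-tripped buffer has been copied back to the host, is already normalised, and is laid out as 25 XY-rows of 10 contiguous values, which is how the `{24, k}` indices above are read here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper types; not part of VkFFT or of the linked gist.
struct Mismatch {
    std::size_t row;  // flattened XY index (0..24 for a 5x5 plane)
    std::size_t col;  // index along the contiguous dimension (0..9)
    float got;
    float expected;
};

// Compare a round-tripped host buffer against the original input and
// collect every element whose absolute error exceeds absTol.
// Assumption: `got` is already normalised (scaled by 1/N if needed).
std::vector<Mismatch> findMismatches(const std::vector<float>& got,
                                     const std::vector<float>& expected,
                                     std::size_t cols,
                                     float absTol = 1e-5f) {
    assert(got.size() == expected.size());
    assert(cols > 0 && got.size() % cols == 0);
    std::vector<Mismatch> mismatches;
    for (std::size_t i = 0; i < got.size(); ++i) {
        // The negated <= also flags NaN results as failures.
        if (!(std::fabs(got[i] - expected[i]) <= absTol)) {
            mismatches.push_back({i / cols, i % cols, got[i], expected[i]});
        }
    }
    return mismatches;
}

// Print failures in the same style as the output quoted above.
void report(const std::vector<Mismatch>& mismatches) {
    for (const Mismatch& m : mismatches) {
        std::printf("Fail at index {%zu, %zu}: got %g, expected %g\n",
                    m.row, m.col, m.got, m.expected);
    }
}
```

Calling `report(findMismatches(result, input, 10))` on the host copies of the two buffers would then list only the elements outside the tolerance.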

Code: https://gist.github.com/al42and/c5b1cf3afe261585102971579c851e42

Tested with: ROCm 5.3.3 on MI250X (gfx90a), ROCm 5.4.1 on MI50 (gfx906), ROCm 5.4.2 on RX 6400 (gfx1034).

The same code works well with the master (13005671b20956983128003d3747b0529f4ded9a) version of VkFFT.

Bisection leads to 8ec6867504f1e9a3e87db58f2d0c6bc512ad11fc.

DTolm commented 1 year ago

Hello!

Thanks for pointing this out. I was exploring addressing optimisations that could be made for various compilers and missed the R2C/C2R case, where they are not applicable. It should be fixed now.

Best regards, Dmitrii

al42and commented 1 year ago

Thanks for the quick fix!