Closed t1nux closed 8 months ago
Hi @t1nux , thanks for the bug report. Thanks especially for providing a standalone test program, which is very helpful.
Your test procedure appears to be going through a plotting step and comparing the plots for accuracy. But are you able to instead compare data directly in your reproducer? That would help confirm that the problem is indeed in the rocFFT output as opposed to at some later step.
I've modified your reproducer to compare the z3d and z21d output directly: https://github.com/evetsso/roc_fft_bug/commit/be19acf2afd6666a0beea72d17b17b0fa8dbfe56
This compares the two results in a similar way to what our rocfft-test accuracy test does when we check against FFTW as a reference implementation. I'm computing and printing the L2 and L-infinity norms of the difference between the two buffers. I've also removed the rocThrust dependency for simplicity, since this small reproducer does not really need it.
My observed results:
l2 difference: 6.952651e-12
l-inf difference: 8.747490e-13
Which I think is a tolerable difference - the two FFTs are doing operations in a different order so we wouldn't expect exactly the same results between the two.
Can you confirm that you observe similar results when directly comparing the data in the reproducer?
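For readers following along, here is a minimal sketch of the kind of comparison described above: the L2 and L-infinity norms of the element-wise difference between two complex buffers. The names diff_norms and DiffNorms are made up for this illustration, and the sketch assumes both results have already been copied to host memory. The linked commit and rocfft-test may scale or normalize the norms differently.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper type: holds both norms of the difference (a - b).
struct DiffNorms {
    double l2;   // sqrt of the sum of squared magnitudes of (a[i] - b[i])
    double linf; // largest magnitude of (a[i] - b[i])
};

// Computes the L2 and L-infinity norms of the element-wise difference.
// Assumes both buffers hold the same number of elements; if they do not,
// only the common prefix is compared.
DiffNorms diff_norms(const std::vector<std::complex<double>>& a,
                     const std::vector<std::complex<double>>& b) {
    double sum_sq = 0.0;
    double linf = 0.0;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double mag = std::abs(a[i] - b[i]);
        sum_sq += mag * mag;
        linf = std::max(linf, mag);
    }
    return {std::sqrt(sum_sq), linf};
}
```

Differences many orders of magnitude below the size of the data, like the values quoted above, point to rounding; differences comparable to the data itself point to a wrong result.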
Hi @evetsso, thank you for your reply and for the direct comparison code.
I should have been clearer about when things actually fail; that was only described in the .ipynb notebook. Sorry for that! Here is a case that also shows the issue when I run your code. If you change int Nt = 1<<4; to int Nt = 1<<8; in line 55, you get a case where the 2D+1D FFT gives the correct result, but the 3D FFT does not. For that case, your code gives me:
268435456 values
4 GB
l2 difference: 8.384832e+01
l-inf difference: 1.597270e+00
Also, here is a quick list of what happens for rocFFT in comparison to NumPy's 3D FFT:
# not working for 3d FFT
Nt, Ny, Nx = 2**4, 2**9, 2**13
Nt, Ny, Nx = 2**4, 2**13, 2**9
Nt, Ny, Nx = 2**8, 2**8, 2**12
# not working for both 3d and 2D+1D
Nt, Ny, Nx = 2**8, 2**10, 2**10
Nt, Ny, Nx = 2**8, 2**12, 2**8
# working
Nt, Ny, Nx = 2**4, 2**9, 2**9
Nt, Ny, Nx = 2**4, 2**9, 2**10
Nt, Ny, Nx = 2**4, 2**9, 2**11
Nt, Ny, Nx = 2**4, 2**9, 2**12
Nt, Ny, Nx = 2**4, 2**12, 2**9
Nt, Ny, Nx = 2**4, 2**10, 2**10
Nt, Ny, Nx = 2**8, 2**9, 2**9
Nt, Ny, Nx = 2**8, 2**9, 2**10
Nt, Ny, Nx = 2**8, 2**8, 2**11
Nt, Ny, Nx = 2**4, 2**12, 2**8
I do not think this list is exhaustive. Also, in case of both not working, we have to compare with FFTW, NumPy, or something else. That's what I will do now, write something that directly compares with FFTW. I should have done this in the first place, but I'm really quite slow in C, so don't hold your breath. I'll be back...
EDIT: Corrected typos.
OK, as promised, here is the direct comparison with FFTW. It should only be necessary to change Nt, Ny, and Nx to observe the issue.
Nt = 128, Ny = 4096, Nx = 256
--- 3D ---
l2 difference: 4.078788e-11
l-inf difference: 1.477354e-11
--- 2D+1D ---
l2 difference: 3.908065e-11
l-inf difference: 1.613426e-11
Nt = 256, Ny = 256, Nx = 4096
--- 3D ---
l2 difference: 8.669607e+04
l-inf difference: 2.701379e+04
--- 2D+1D ---
l2 difference: 5.941162e-11
l-inf difference: 2.961650e-11
Nt = 256, Ny = 4096, Nx = 256
--- 3D ---
l2 difference: 8.669607e+04
l-inf difference: 2.701379e+04
--- 2D+1D ---
l2 difference: 8.669857e+04
l-inf difference: 2.701220e+04
Please also look at the table in my previous post with more failure examples. I am sure there are different cases, too.
@evetsso Thanks for the code to compare the arrays.
EDIT: I tested those 3 cases above on the systems with the RX 6900 XT and the Radeon VII Pro, with the same behavior. I don't have access to the system with the RX 7900 XTX from home, but I'll test tomorrow.
EDIT2: Same on the RX 7900 XTX.
For the case Nt = 2^8, Ny = 2^12, Nx = 2^8, I've added a plot to illustrate what is happening qualitatively.
What you see in the image below is the following:
f = 0, i.e. a kx-ky map
ky = 0, i.e. a kx-f map
kx = 0, i.e. a ky-f map
Since I'm starting with a 3D step function, I should get a 3D sinc (sin(x)/x) in the FFT, and the images should show 2D sinc functions. This seems correct in the case of FFTW (bottom row). Because of the large Ny, the left and right images in the bottom row look weird, but they're fine when zooming in.
Ok, I see what's going on. When you're computing the maximal work buffer size for all of the plans, you're casting the inputs to int. But the number of bytes required is > 4 GiB, and the value is truncated before it reaches std::max.
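To make the truncation concrete, here is a small standalone sketch. The original reproducer code is not quoted in this thread, so max_size_truncated is only a guess at the faulty pattern based on the description above, and both function names are invented for the illustration. The 4 GiB figure is the array size from the failing case (268435456 complex double values at 16 bytes each), used as a stand-in because the thread does not state the work buffer size rocFFT actually reported.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical reconstruction of the faulty pattern: every size is narrowed
// to int before the comparison. With a 32-bit int, a value of exactly 2^32
// wraps to 0 (guaranteed since C++20, and what mainstream compilers did
// before that), so the largest buffer loses the comparison.
std::uint64_t max_size_truncated(const std::vector<std::uint64_t>& sizes) {
    int best = 0;
    for (std::uint64_t s : sizes) {
        best = std::max(best, static_cast<int>(s));
    }
    return static_cast<std::uint64_t>(best);
}

// Same loop with the comparison done in a 64-bit unsigned type throughout,
// so no byte count is narrowed.
std::uint64_t max_size_correct(const std::vector<std::uint64_t>& sizes) {
    std::uint64_t best = 0;
    for (std::uint64_t s : sizes) {
        best = std::max<std::uint64_t>(best, s);
    }
    return best;
}
```

Given one 64 MiB request and one 4 GiB request, the truncating version returns 64 MiB while the 64-bit version returns 4 GiB. A buffer allocated from the truncated value would be far too small for the larger plan.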
Redoing the size computation like this fixes the problem:
rocfft_plan_get_work_buffer_size(plan3d3d_ip_f, &work_buffer_size);
rocfft_buffer_size = std::max<size_t>(rocfft_buffer_size, work_buffer_size);
rocfft_plan_get_work_buffer_size(plan3d3d_ip_b, &work_buffer_size);
rocfft_buffer_size = std::max<size_t>(rocfft_buffer_size, work_buffer_size);
rocfft_plan_get_work_buffer_size(plan3d2d_ip_f, &work_buffer_size);
rocfft_buffer_size = std::max<size_t>(rocfft_buffer_size, work_buffer_size);
rocfft_plan_get_work_buffer_size(plan3d2d_ip_b, &work_buffer_size);
rocfft_buffer_size = std::max<size_t>(rocfft_buffer_size, work_buffer_size);
rocfft_plan_get_work_buffer_size(plan3d1d_ip_f, &work_buffer_size);
rocfft_buffer_size = std::max<size_t>(rocfft_buffer_size, work_buffer_size);
rocfft_plan_get_work_buffer_size(plan3d1d_ip_b, &work_buffer_size);
rocfft_buffer_size = std::max<size_t>(rocfft_buffer_size, work_buffer_size);
Note that rocfft_execute is actually failing in your example, but your example code is not checking it.
I believe this resolves your problem, so I'm closing this issue. Please feel free to comment and/or reopen if you need anything else.
@evetsso Thank you so much! This indeed fixes the issue, and I see why this is failing now. Thank you for your explanations.
So, in addition to your changed code for rocfft_buffer_size, I added
#define ROCFFT_ASSERT(x) (assert((x) == rocfft_status_success))
and I'm using it whenever I call rocfft_execute() like this:
ROCFFT_ASSERT(rocfft_execute(plan3d3d_ip_f, (void**) &z_d, nullptr, rocfft_info));
My apologies for posting this as a bug while it was actually an error in my code!
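One caveat with an assert-based wrapper: when NDEBUG is defined (typical for release builds), assert(expr) expands to nothing, so the wrapped call is not evaluated at all and the FFT would be skipped silently. A minimal sketch of a check that runs in every build type is below. To keep it compilable without ROCm, it uses a stand-in enum and a stand-in function; demo_status, demo_execute, and DEMO_CHECK are hypothetical names. In real code the comparison would presumably be against rocfft_status_success, as in the macro above.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for rocfft_status so this sketch builds without ROCm installed.
enum demo_status { demo_status_success = 0, demo_status_failure = 1 };

// Evaluates the call exactly once, regardless of NDEBUG, and throws if the
// returned status is not the success value.
#define DEMO_CHECK(call)                                              \
    do {                                                              \
        const demo_status status_ = (call);                           \
        if (status_ != demo_status_success) {                         \
            throw std::runtime_error(                                 \
                std::string(#call) + " failed with status " +         \
                std::to_string(static_cast<int>(status_)));           \
        }                                                             \
    } while (0)

// Hypothetical function standing in for rocfft_execute(); it counts how
// often it was called so the "evaluated exactly once" claim can be checked.
demo_status demo_execute(bool succeed, int& calls) {
    ++calls;
    return succeed ? demo_status_success : demo_status_failure;
}
```

Throwing is one design choice among several; printing the failing call and calling std::abort() would work just as well for a small reproducer.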
Problem Description
Hello everyone
First of all, I am using Arch Linux, and I know it is not an officially supported distro. But I cannot just take down one of the 3 Arch systems I tested this with. I have example code that can easily be executed, so please do not simply discard this for that reason.
I am working with a potentially large complex 3D array (time t, space y, space x), and I need to apply different operations to it. These operations are mostly applied in frequency / 2D reciprocal space, which means I need to transform accordingly, depending on which space I am currently in. Therefore, I use either a 3D, 2D, or 1D FFT to get to the right space.
As it turns out, the 3D FFT, but also a 2D+1D FFT, does not produce the correct result for all sizes of the complex 3D array.
Example code to demo the bug can be found here.
I have already posted this on community.amd.com, but there was no answer. There is more info though.
Next to the specified system, I also tried this on 2 other systems with a Radeon VII Pro and an RX 6900 XT, respectively. The bug was also reproducible there.
Please let me know if there is anything else you need. Thanks!
Operating System
Arch Linux
CPU
AMD Ryzen 5 3600X 6-Core Processor
GPU
AMD Radeon Pro VII, AMD Radeon RX 7900 XTX
ROCm Version
ROCm 6.0.0
ROCm Component
rocFFT
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
No response