Open nschaeff opened 3 days ago
I suspect it happens when all three of the following conditions hold together: padding + multiple uploads + R2C
Hello,
Yes, there is an issue: multiple-upload R2C behaves incorrectly with zeropadding when it uses the even decomposition algorithm, which represents R2C/C2R as a half-length C2C with some post-processing. Because that algorithm reads data as complex values, it has issues with odd zeropadding sizes. There is another algorithm implemented that does real transforms as internal callbacks, and I have enabled it to substitute for the even R2C/C2R algorithm whenever zeropadding is requested. It uses 2x more temporary memory in multiple uploads, but the amount of memory transferred and the performance should be similar to the even R2C algorithm.
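For readers unfamiliar with the even decomposition, here is a minimal pure-Python sketch of the general technique (an R2C of even length n computed from one C2C of length n/2 plus post-processing). It is not VkFFT's code: the names `dft` and `r2c_even` are made up for illustration, and a naive O(n^2) DFT stands in for the real C2C kernel. Note how the input is packed as pairs `x[2j] + i*x[2j+1]`; a zero range that starts or ends on an odd real index would cut through one of these packed complex values, which is presumably why odd zeropadding sizes are a problem for this approach.

```python
import cmath

def dft(x):
    """Naive O(n^2) complex DFT, used only as a stand-in for a C2C FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def r2c_even(x):
    """R2C of even length n via one C2C of length n/2 plus post-processing.

    Returns the n/2 + 1 non-redundant complex bins.
    """
    n = len(x)
    assert n % 2 == 0, "the even decomposition needs an even length"
    h = n // 2
    # Pack pairs of reals into complex values: z[j] = x[2j] + i*x[2j+1].
    z = [complex(x[2 * j], x[2 * j + 1]) for j in range(h)]
    Z = dft(z)  # the half-length C2C
    out = []
    for k in range(h + 1):
        zk = Z[k % h]
        zc = Z[(h - k) % h].conjugate()
        even = (zk + zc) / 2      # DFT of the even-indexed samples
        odd = (zk - zc) / (2j)    # DFT of the odd-indexed samples
        # Recombine with the usual radix-2 twiddle factor.
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

Comparing `r2c_even(x)` against `dft(x)[:len(x)//2 + 1]` for a small real input is a quick way to check the post-processing step.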
As for fft_zeropad_right[0]: the complex output of R2C (or the input of C2R) is assumed to be conjugate-symmetric, so it should be <= nfft/2+1; any larger value gives the same result as nfft/2+1. If a value < nfft/2+1 is chosen, the system will have this format: [1,1, 0 ... 0, 1,1] for the [0, nfft/2+1) elements, and then the complex conjugate will be read in reverse as: [1, 0 ... 0, 1,1].
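To illustrate why only the first nfft/2+1 bins matter, here is a small pure-Python sketch of the conjugate symmetry involved. The helper names `zero_range` and `expand_half_spectrum` and the parameters `zero_left`/`zero_right` are hypothetical, and the half-open zero range [zero_left, zero_right) is an assumption made for the example; nothing here calls VkFFT.

```python
def zero_range(half, zero_left, zero_right):
    """Zero the bins in [zero_left, zero_right) of a stored half spectrum.

    Hypothetical stand-in for frequency-domain zeropadding; an upper bound
    beyond the stored length behaves like the stored length itself.
    """
    right = min(zero_right, len(half))
    return [0j if zero_left <= k < right else v for k, v in enumerate(half)]

def expand_half_spectrum(half, n):
    """Rebuild all n bins from the n//2 + 1 stored bins of a real signal."""
    assert n % 2 == 0 and len(half) == n // 2 + 1
    # Bins above n/2 are never stored: X[n - k] = conj(X[k]).
    mirrored = [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return list(half) + mirrored

# Usage: nfft = 16, so 9 bins are stored; zero bins 2..6 of the stored half.
half = [1 + 0j] * 9
full = expand_half_spectrum(zero_range(half, 2, 7), 16)
mask = [int(v != 0) for v in full]  # zeros reappear mirrored in the upper half
```

In this sketch, zeroing a range in the stored half automatically zeroes the mirrored bins of the implied upper half, so a bound above nfft/2+1 carries no extra information.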
Best regards, Dmitrii
In some cases, for large sizes (more below), when using an R2C and C2R transform, I obtain wrong FFT results with the zeroPadding feature in the frequency domain. If I remove the zeroPadding feature, everything works well. Maybe I'm not doing things correctly; the padding for the R2C format is not clearly described.
On an AMD MI210, it fails for nfft > 4096 (not all values, though); nfft=6144 fails, for instance. On an NVIDIA A30, it fails for larger values, nfft >= 12288 (some smaller values fail too, like 11264).
Steps to reproduce:
Performing the FFT in both directions gives wrong results.