KAdamek / GPU_Overlap-and-save_convolution

Shared memory overlap-and-save method for NVIDIA GPUs using CUDA
MIT License
16 stars, 6 forks

cuFFT and k_customFFT_GPU_forward results not the same #1

Closed: garko1 closed this issue 4 years ago

garko1 commented 4 years ago

You can reproduce this with the following 65 causal filter coefficients (time domain; zero-pad them to a length of 4096): 0.000190, -0.000280, 0.000364, -0.000425, 0.000441, -0.000384, 0.000228, 0.000058, -0.000498, 0.001114, -0.001918, 0.002908, -0.004071, 0.005373, -0.006762, 0.008164, -0.009485, 0.010611, -0.011408, 0.011724, -0.011395, 0.010238, -0.008055, 0.004624, 0.000320, -0.007119, 0.016273, -0.028599, 0.045628, -0.070668, 0.112425, -0.203098, 0.633555, 0.633555, -0.203098, 0.112425, -0.070668, 0.045628, -0.028599, 0.016273, -0.007119, 0.000320, 0.004624, -0.008055, 0.010238, -0.011395, 0.011724, -0.011408, 0.010611, -0.009485, 0.008164, -0.006762, 0.005373, -0.004071, 0.002908, -0.001918, 0.001114, -0.000498, 0.000058, 0.000228, -0.000384, 0.000441, -0.000425, 0.000364, -0.000280

cuFFT and CT_DIF_FFT_4way<4096> do not yield the same results. I built without defining TESTING, so the code inside the #ifdef TESTING blocks was not compiled.

KAdamek commented 4 years ago

Hi Garko, this is correct behaviour: CT_DIF_FFT_4way does not reorder the elements after calculating the Fourier transform, a step that is normally required when the Cooley-Tukey FFT algorithm is used. However, for convolution through the frequency domain (using the convolution theorem) we can leave out the reordering step, because the wrong element order after the forward FFT is cancelled out by the inverse FFT (the element order produced by the forward decimation-in-frequency FFT is undone by the inverse decimation-in-time FFT). Defining TESTING enables the reordering step for all FFT calculations, which degrades performance; that is why we have left it out. Thanks for your interest in this code. Karel
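
For anyone who wants to check the cancellation described above on a small case, here is a minimal sketch in plain Python. It is not the repository's code. It assumes a radix-2 transform of length 16 rather than the 4-way, 4096-point kernel, and every function name and test value in it is made up for illustration. It shows that the unreordered forward output matches a reference DFT only after a bit-reversal permutation, and that the convolution result is still correct without any reordering.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def dif_fft_no_reorder(x):
    """Forward radix-2 decimation-in-frequency FFT.
    Natural-order input; the output is left in bit-reversed order."""
    a = list(x)
    n = len(a)
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                u, v = a[start + k], a[start + k + half]
                a[start + k] = u + v
                a[start + k + half] = (u - v) * w
        half //= 2
    return a

def dit_ifft_bitrev_input(spectrum):
    """Inverse radix-2 decimation-in-time FFT.
    Expects bit-reversed input; the output is in natural order."""
    a = list(spectrum)
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(2j * cmath.pi * k / (2 * half))
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        half *= 2
    return [v / n for v in a]

def naive_dft(x):
    """O(n^2) reference DFT in natural order."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolution(x, h):
    """O(n^2) reference circular convolution in the time domain."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]

# Illustrative data only (not the coefficients from this issue).
n, bits = 16, 4
signal = [float((3 * i) % 7) for i in range(n)]
taps = [0.25, 0.5, 0.25] + [0.0] * (n - 3)  # short filter, zero-padded to n

S = dif_fft_no_reorder(signal)
H = dif_fft_no_reorder(taps)
reference = naive_dft(signal)

# Element-by-element comparison fails; comparison after bit reversal succeeds.
matches_directly = all(abs(S[k] - reference[k]) < 1e-9 for k in range(n))
matches_after_reorder = all(
    abs(S[bit_reverse(k, bits)] - reference[k]) < 1e-9 for k in range(n))

# Both spectra share the same permutation, so the pointwise product is valid.
product = [s * h for s, h in zip(S, H)]
result = dit_ifft_bitrev_input(product)
expected = circular_convolution(signal, taps)
max_error = max(abs(r - e) for r, e in zip(result, expected))

print(matches_directly, matches_after_reorder, max_error < 1e-9)
```

The same reasoning suggests that a direct comparison against cuFFT would need either TESTING defined or the custom FFT output permuted into natural order first. For a 4-way kernel the permutation may differ from the plain bit reversal used in this radix-2 sketch.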