The FFT-based version of the matrix multiplier also needs to be implemented for CUDA.
Steps:
-Implement FFT and IFFT on CUDA (in C).
-Define an interface for passing single-precision (32-bit) complex arrays between Fortran and C (CUDA).
-Verify the results against the existing CPU implementation in Fortran.
Math:
Working in Fourier space lets us discard a portion of the matrix entries, which can improve performance:
H = K M =>
H = IFFT( FFT(K) FFT(M) ),
where FFT(K) is computed only once and then stored as a sparse matrix by dropping entries whose magnitude falls below a chosen threshold. Here FFT(K) is to be read as K expressed in the Fourier basis (F K F^-1, with F the DFT matrix), so the identity is exact before any entries are dropped. Each computational step then Fourier-transforms M, performs the matrix multiplication in Fourier space, and inverse-transforms the product to obtain H. The expectation is that K is close to diagonal in Fourier space, so the sparse product is much cheaper than the dense one.