Closed WeiqunZhang closed 3 weeks ago
Notes on the implementation of cosine and sine transform are available at https://www.overleaf.com/read/krjbcfhfgvmj#f7c9e1.
I once made an optimized FFT Poisson solver for GPU in HiPACE++: https://github.com/Hi-PACE/hipace/blob/development/src/fields/fft_poisson_solver/FFTPoissonSolverDirichletFast.cpp. It does a single-rank 2D DST-I using the Fast Sine Transform algorithm from page 238 of Computational Frameworks for the Fast Fourier Transform by Charles Van Loan. This does not require expanding the domain by 2x or 4x, as is currently done in this PR for the R2R FFTs. The pages that follow have similar algorithms for DST-II, DST-III, DCT-II and DCT-III.
I also found that it was better to implement the R2R FFT directly in the Poisson solver instead of in the FFT wrapper so that the pre- and post-processing GPU kernels can be combined with the transposes (here ParallelCopy).
That's good to know. What we need is batched 1D DST and DCT. I guess that might be even easier than the 2D FFT you have implemented.
Ready for review, but let's not merge it until after the monthly release.
Add support for Neumann and Dirichlet boundaries in the FFT-based Poisson solver. This requires cosine and sine transforms. For CPU builds, we use FFTW for these transforms. For GPU builds, we have implemented cosine and sine transforms using the real-to-complex transforms provided by cuFFT, rocFFT and oneMKL.
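For readers unfamiliar with the extension approach, here is a minimal sketch of how a DST-I can be obtained from a real-to-complex transform by odd extension of the input to length 2(N+1). It is not the code in this PR: the function names (`naive_r2c`, `dst1_via_r2c`, `dst1_direct`) are hypothetical, a naive O(M^2) DFT stands in for the cuFFT/rocFFT/oneMKL call, and the normalization is assumed to follow FFTW's RODFT00 convention, Y_k = 2 * sum_j X_j * sin(pi*(j+1)*(k+1)/(N+1)).

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Stand-in for a library real-to-complex FFT (assumption: a real build would
// call cuFFT, rocFFT, oneMKL or FFTW here). Returns bins 0..M/2 of the
// unnormalized forward DFT with the exp(-2*pi*i*n*k/M) sign convention.
std::vector<std::complex<double>> naive_r2c(const std::vector<double>& x) {
    const std::size_t m = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> f(m / 2 + 1);
    for (std::size_t k = 0; k < f.size(); ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t n = 0; n < m; ++n) {
            const double angle = -2.0 * pi * double(n) * double(k) / double(m);
            sum += x[n] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        f[k] = sum;
    }
    return f;
}

// DST-I of length N via an odd extension of length M = 2*(N+1):
//   ext = [0, X_0, ..., X_{N-1}, 0, -X_{N-1}, ..., -X_0]
// The DFT of an odd real sequence is purely imaginary, and Y_k = -Im(F_{k+1}).
std::vector<double> dst1_via_r2c(const std::vector<double>& in) {
    const std::size_t n = in.size();
    const std::size_t m = 2 * (n + 1);
    std::vector<double> ext(m, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        ext[j + 1] = in[j];        // original samples, shifted by one
        ext[m - 1 - j] = -in[j];   // mirrored samples with flipped sign
    }
    const auto f = naive_r2c(ext);
    std::vector<double> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        out[k] = -f[k + 1].imag();
    }
    return out;
}

// Reference: direct evaluation of the DST-I definition (RODFT00 scaling).
std::vector<double> dst1_direct(const std::vector<double>& in) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<double> out(n, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < n; ++j) {
            out[k] += 2.0 * in[j] *
                      std::sin(pi * double(j + 1) * double(k + 1) / double(n + 1));
        }
    }
    return out;
}
```

Calling `dst1_via_r2c` and `dst1_direct` on the same input should agree to round-off. The extended array is roughly twice the size of the input, which is the overhead that the Van Loan-style algorithms mentioned in the comments avoid.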