Hi-PACE / hipace

Highly efficient Plasma Accelerator Emulation, quasistatic particle-in-cell code
https://hipace.readthedocs.io

Improve performance of FFTDirichletExpanded #1111

AlexanderSinn closed this 3 months ago

AlexanderSinn commented 4 months ago

This PR improves the performance of FFTDirichletExpanded by combining the GPU kernels that run between the FFTs into a single kernel. This reduces memory usage by eliminating the temporary field. It also increases performance by removing the memory traffic to and from that field and by reducing kernel launch overhead at small resolutions.
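
To illustrate the general idea of kernel fusion described above, here is a minimal CPU-only sketch. It is not the code in this PR: the function names `two_pass` and `fused`, the per-mode `factor` array, and the choice of an odd-symmetric expansion as the second step are all assumptions made for illustration, and plain loops stand in for GPU kernel launches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unfused version: two passes with a temporary field in between.
// Each loop stands in for one GPU kernel launch.
std::vector<double> two_pass(const std::vector<double>& fft_out,
                             const std::vector<double>& factor) {
    assert(fft_out.size() == factor.size());
    const std::size_t n = fft_out.size();

    std::vector<double> tmp(n);  // temporary field: written once, read once
    for (std::size_t i = 0; i < n; ++i) {  // "kernel 1": scale each mode
        tmp[i] = fft_out[i] * factor[i];
    }

    // "kernel 2": odd-symmetric expansion to length 2n + 2:
    // [0, x_0, ..., x_{n-1}, 0, -x_{n-1}, ..., -x_0]
    std::vector<double> fft_in(2 * n + 2, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        fft_in[i + 1] = tmp[i];
        fft_in[2 * n + 1 - i] = -tmp[i];
    }
    return fft_in;
}

// Hypothetical fused version: one pass, no temporary field.
// The scaled value stays in a local variable instead of going through memory.
std::vector<double> fused(const std::vector<double>& fft_out,
                          const std::vector<double>& factor) {
    assert(fft_out.size() == factor.size());
    const std::size_t n = fft_out.size();

    std::vector<double> fft_in(2 * n + 2, 0.0);
    for (std::size_t i = 0; i < n; ++i) {  // single "kernel"
        const double v = fft_out[i] * factor[i];
        fft_in[i + 1] = v;
        fft_in[2 * n + 1 - i] = -v;
    }
    return fft_in;
}
```

Both functions produce the same output. The fused one skips the allocation of `tmp` along with one write and one read per element, and on a GPU it would need one launch instead of two.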

MaxThevenet commented 3 months ago

Thanks for this PR! Could you add a short description?