Hi-PACE / hipace

Highly efficient Plasma Accelerator Emulation, quasistatic particle-in-cell code
https://hipace.readthedocs.io

Improve performance of FFTDirichletFast #1112

Closed AlexanderSinn closed 3 months ago

AlexanderSinn commented 4 months ago

This PR improves the performance of FFTDirichletFast by combining the GPU kernels that run between the FFTs into a single kernel. This reduces memory usage by getting rid of the temporary field, and it increases performance in two ways: the reads and writes to the temporary field no longer consume memory bandwidth, and fewer kernel launches mean less launch overhead at small resolutions.
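As a rough illustration of the idea (not the actual hipace code), here is a minimal CPU sketch in C++ of fusing two element-wise passes into one. The functions `step_a`, `step_b`, `two_kernels`, and `fused_kernel` are hypothetical names made up for this example, and the arithmetic inside them is a placeholder for whatever per-cell work happens between the FFTs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the per-element work done between two FFTs.
// The real operations in FFTDirichletFast are not shown here.
inline double step_a(double x) { return 2.0 * x; }  // placeholder: a scaling
inline double step_b(double x) { return x + 1.0; }  // placeholder: an offset

// Unfused version: two passes (think "two kernels") with a temporary
// buffer in between. Every element is written to and read back from tmp.
std::vector<double> two_kernels(const std::vector<double>& in) {
    std::vector<double> tmp(in.size());  // extra allocation, same size as the field
    for (std::size_t i = 0; i < in.size(); ++i) {
        tmp[i] = step_a(in[i]);
    }
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = step_b(tmp[i]);
    }
    return out;
}

// Fused version: one pass, no temporary. The intermediate value stays in
// a register instead of making a round trip through memory.
std::vector<double> fused_kernel(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = step_b(step_a(in[i]));
    }
    return out;
}
```

Both functions return the same values; the fused one skips the `tmp` allocation and one full write and read of the field. On a GPU, each loop would correspond to a separate kernel launch, so fusing also removes one launch.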

On MI250X with 4095^2 cells, this gives a speed-up of 40%.

MaxThevenet commented 3 months ago

Thanks for this PR! Could you add a short description?

AlexanderSinn commented 3 months ago

It is tested by the GPU CI. I also ran a previous commit through the CPU CI, which passed (only the doc didn't work).