Hi-PACE / hipace

Highly efficient Plasma Accelerator Emulation, quasistatic particle-in-cell code
https://hipace.readthedocs.io

Simplify tiling in plasma deposition #1093

Closed AlexanderSinn closed 7 months ago

AlexanderSinn commented 7 months ago

This PR removes the temporary density arrays that were used for the plasma current deposition. Instead, thread safety is ensured by splitting the tiles into four groups such that tiles within a group don’t overlap. The groups are deposited one after another, and the tiles within each group are deposited in parallel. Shown below is the chi array after the first group of tiles was deposited. This cleans up the code, since the temporary densities no longer have to be allocated and managed, and gives a small performance improvement because the lockAdd to the main array is no longer necessary.

    for (int tile_perm_x=0; tile_perm_x<2; ++tile_perm_x) {
    for (int tile_perm_y=0; tile_perm_y<2; ++tile_perm_y) {
    #pragma omp parallel for collapse(2) if(do_tiling)
    for (int itilex=tile_perm_x; itilex<ntilex; itilex+=2) {
    for (int itiley=tile_perm_y; itiley<ntiley; itiley+=2) {

    // the index is transposed to be the same as in amrex::DenseBins::build
    const int tile_index = itilex * ntiley + itiley;

    // Deposit one tile at tile_index 

    }}}}
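
To illustrate why the four-group ordering makes plain, non-atomic writes safe, here is a minimal standalone sketch. It is not the HiPACE++ implementation: `deposit_tile`, `deposit_all`, `tile_size` and `halo` are hypothetical names, each tile simply adds 1 to every cell it touches, and the sketch assumes that a tile writes at most `halo` cells beyond its own boundary with `2 * halo <= tile_size`, so that two tiles in the same group never write to the same cell.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-tile deposition: every cell of the tile,
// extended by `halo` cells on each side and clipped to the grid, gets +1.
// `grid` holds nx * ny cells. The writes are deliberately non-atomic.
void deposit_tile(std::vector<int>& grid, int nx, int ny,
                  int itilex, int itiley, int tile_size, int halo)
{
    const int x_lo = std::max(itilex * tile_size - halo, 0);
    const int x_hi = std::min((itilex + 1) * tile_size + halo, nx);
    const int y_lo = std::max(itiley * tile_size - halo, 0);
    const int y_hi = std::min((itiley + 1) * tile_size + halo, ny);
    for (int ix = x_lo; ix < x_hi; ++ix) {
        for (int iy = y_lo; iy < y_hi; ++iy) {
            grid[ix * ny + iy] += 1;
        }
    }
}

// Deposit all tiles in four passes, one per (tile_perm_x, tile_perm_y) pair.
// Tiles within a pass are two tile widths apart in each direction, so their
// extended write regions are disjoint as long as 2 * halo <= tile_size.
std::vector<int> deposit_all(int ntilex, int ntiley, int tile_size, int halo)
{
    assert(2 * halo <= tile_size); // assumption of this sketch
    const int nx = ntilex * tile_size;
    const int ny = ntiley * tile_size;
    std::vector<int> grid(nx * ny, 0);
    for (int tile_perm_x = 0; tile_perm_x < 2; ++tile_perm_x) {
    for (int tile_perm_y = 0; tile_perm_y < 2; ++tile_perm_y) {
        // Parallel over the tiles of one group; ignored if OpenMP is disabled.
#pragma omp parallel for collapse(2)
        for (int itilex = tile_perm_x; itilex < ntilex; itilex += 2) {
        for (int itiley = tile_perm_y; itiley < ntiley; itiley += 2) {
            deposit_tile(grid, nx, ny, itilex, itiley, tile_size, halo);
        }}
    }}
    return grid;
}
```

Calling `deposit_all(4, 4, 4, 1)` fills a 16 x 16 grid in which cells covered by the halos of neighboring tiles end up with a count greater than 1. Those overlapping contributions always come from different passes, which is why no temporary array or locked add is needed in this sketch.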

[Image: chi array after the first group of tiles was deposited]

Performance for a 2047*2047*300 grid, with exactly one tile per thread on dual 48-core CPUs:

[Image: perf_no_add]