Hi-PACE / hipace

Highly efficient Plasma Accelerator Emulation, quasistatic particle-in-cell code
https://hipace.readthedocs.io

Shared memory deposition #1158

Closed AlexanderSinn closed 1 month ago

AlexanderSinn commented 2 months ago

To enable shared memory deposition on GPU, add:

hipace.do_shared_depos = true

The parameter plasmas.sort_bin_size was replaced with the parameter below, as it now also affects beams:

hipace.tile_size = 32
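
For readers unfamiliar with the idea, here is a rough CPU-side illustration of tile-based deposition. It is not the code in this PR. The function names, the nearest-grid-point weighting, and the way the tile size is used are all assumptions made for the sketch. Particles are grouped by tile, each tile accumulates into a small local buffer (standing in for GPU shared memory), and the buffer is then added to the global grid once.

```python
import random
from collections import defaultdict

def deposit_direct(particles, nx, ny):
    """Reference: every particle adds its weight straight into the global grid."""
    grid = [[0.0] * ny for _ in range(nx)]
    for ix, iy, w in particles:
        grid[ix][iy] += w
    return grid

def deposit_tiled(particles, nx, ny, tile_size=32):
    """Hypothetical tiled variant: sort by tile, accumulate locally, flush once."""
    # 1) Group particles by the tile that contains their cell.
    tiles = defaultdict(list)
    for ix, iy, w in particles:
        tiles[(ix // tile_size, iy // tile_size)].append((ix, iy, w))

    grid = [[0.0] * ny for _ in range(nx)]
    for (tx, ty), plist in tiles.items():
        x0, y0 = tx * tile_size, ty * tile_size
        # 2) Tile-local buffer; on a GPU this role would be played by shared memory.
        local = [[0.0] * tile_size for _ in range(tile_size)]
        for ix, iy, w in plist:
            local[ix - x0][iy - y0] += w
        # 3) Flush the tile into the global grid, skipping cells outside the domain.
        for i in range(tile_size):
            for j in range(tile_size):
                if x0 + i < nx and y0 + j < ny:
                    grid[x0 + i][y0 + j] += local[i][j]
    return grid

def make_particles(n, nx, ny, seed=0):
    """Random test particles as (cell_x, cell_y, weight) tuples."""
    rng = random.Random(seed)
    return [(rng.randrange(nx), rng.randrange(ny), rng.random()) for _ in range(n)]
```

A real deposition kernel with higher-order particle shapes would also need halo cells around each tile, which this sketch leaves out. Both functions should produce the same grid for the same particles; only the memory access pattern differs.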

MR with shared memory:

Finished Evolve after 324.2 seconds using 1 rank
Total time per particle push: 1.544 nanoseconds (1.769 plasma, 12.1 beam)
Total time per cell update: 13.83 nanoseconds

TinyProfiler total time across processes [min...avg...max]: 324.3 ... 324.3 ... 324.3

------------------------------------------------------------------------------------------------------
Name                                                   NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
------------------------------------------------------------------------------------------------------
ExplicitDeposition()                                    44800      69.86      69.86      69.86  21.54%
AdvancePlasmaParticles()                                44800      65.78      65.78      65.78  20.28%
DepositCurrent_PlasmaParticleContainer()                44804       44.4       44.4       44.4  13.69%
DepositCurrentSlice_BeamParticleContainer()             44800       36.1       36.1       36.1  11.13%
hpmg::MultiGrid::solve1()                               22400      30.59      30.59      30.59   9.43%
AdvanceBeamParticlesSlice()                             11200      17.05      17.05      17.05   5.26%
BeamParticleContainer::ReorderParticles()               11200      9.576      9.576      9.576   2.95%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200      9.248      9.248      9.248   2.85%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200      8.062      8.062      8.062   2.49%
ParticleContainer::SortParticlesForDeposition()          5600      8.012      8.012      8.012   2.47%
PermutationForDeposition()                              16800       7.55       7.55       7.55   2.33%
PlasmaParticleContainer::TagByLevel()                   22402      3.609      3.609      3.609   1.11%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200      2.479      2.479      2.479   0.76%
Fields::LevelUpBoundary()                              123200      1.986      1.986      1.986   0.61%
shiftSlippedParticles()                                 11200       1.47       1.47       1.47   0.45%
Fields::ShiftSlices()                                   22400      1.369      1.369      1.369   0.42%
Fields::InitializeSlices()                              22400      1.257      1.257      1.257   0.39%
Hipace::InitializeSxSyWithBeam()                        22400       1.14       1.14       1.14   0.35%
Hipace::SolveOneSlice()                                 11200     0.7064     0.7064     0.7064   0.22%
Hipace::Evolve()                                            1     0.6222     0.6222     0.6222   0.19%
Hipace::ExplicitMGSolveBxBy()                           22400     0.3252     0.3252     0.3252   0.10%
MultiBuffer::get_data()                                 11200    0.04189    0.04189    0.04189   0.01%
PlasmaParticleContainer::ReorderParticles()              5600    0.00954    0.00954    0.00954   0.00%
main()                                                      1   0.001976   0.001976   0.001976   0.00%
Other                                                  178374      3.083      3.083      3.083   0.95%
------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------
Name                                                   NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
------------------------------------------------------------------------------------------------------
main()                                                      1      324.3      324.3      324.3 100.00%
Hipace::Evolve()                                            1      324.2      324.2      324.2  99.97%
Hipace::SolveOneSlice()                                 11200      323.2      323.2      323.2  99.67%
ExplicitDeposition()                                    44800      69.86      69.86      69.86  21.54%
AdvancePlasmaParticles()                                44800      65.78      65.78      65.78  20.28%
DepositCurrent_PlasmaParticleContainer()                44804       44.4       44.4       44.4  13.69%
DepositCurrentSlice_BeamParticleContainer()             44800       36.1       36.1       36.1  11.13%
Hipace::ExplicitMGSolveBxBy()                           22400      32.08      32.08      32.08   9.89%
hpmg::MultiGrid::solve1()                               22400      30.59      30.59      30.59   9.43%
AdvanceBeamParticlesSlice()                             11200      17.05      17.05      17.05   5.26%
BeamParticleContainer::ReorderParticles()               11200      14.74      14.74      14.74   4.54%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200      12.04      12.04      12.04   3.71%
PlasmaParticleContainer::ReorderParticles()              5600      10.41      10.41      10.41   3.21%
ParticleContainer::SortParticlesForDeposition()          5600       10.4       10.4       10.4   3.21%
MultiBuffer::get_data()                                 11200      9.332      9.332      9.332   2.88%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200       9.29       9.29       9.29   2.86%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200      8.062      8.062      8.062   2.49%
PermutationForDeposition()                              16800       7.55       7.55       7.55   2.33%
PlasmaParticleContainer::TagByLevel()                   22402      3.609      3.609      3.609   1.11%
Fields::LevelUpBoundary()                              123200      1.986      1.986      1.986   0.61%
shiftSlippedParticles()                                 11200      1.788      1.788      1.788   0.55%
Fields::ShiftSlices()                                   22400      1.369      1.369      1.369   0.42%
Fields::InitializeSlices()                              22400      1.257      1.257      1.257   0.39%
Hipace::InitializeSxSyWithBeam()                        22400       1.14       1.14       1.14   0.35%
Other                                                  178374      3.198      3.198      3.198   0.99%
------------------------------------------------------------------------------------------------------

Device Memory Usage:
-----------------------------------------------------------------------------------
Name                                             Nalloc   Nfree    AvgMem    MaxMem
-----------------------------------------------------------------------------------
The_Arena::Initialize()                               1       1   835 KiB    59 GiB
ParticleContainer::SortParticlesForDeposition()   16800   16800  1558 MiB  1622 MiB
PlasmaParticleContainer::InitParticles()            126     126  3120 KiB  1576 MiB
BeamParticleContainer::resize()                  110721  110721   389 MiB  1107 MiB
Fields::AllocData()                                   2       2   337 MiB   337 MiB
BeamParticleContainer::ReorderParticles()         33600   33600    38 MiB   145 MiB
Hipace::ExplicitMGSolveBxBy()                        72      72   117 MiB   117 MiB
FFTPoissonSolverDirichletFast::define()              14      14    79 MiB    79 MiB
PermutationForDeposition()                        67200   67200  2570 KiB    66 MiB
ResizeRandomSeed                                      1       1    40 MiB    40 MiB
DepositCurrent_PlasmaParticleContainer()         179216  179216  4859 KiB    36 MiB
ExplicitDeposition()                             134400  134400  7886 KiB    36 MiB
DepositCurrentSlice_BeamParticleContainer()      134398  134398  2903 KiB    34 MiB
hpmg::MultiGrid::solve1()                         88139   88139    21 KiB   432 KiB
shiftSlippedParticles()                           53444   53444   259   B   108 KiB
Hipace::InitData()                                   13      13   495   B   496   B
main()                                               11      11   431   B   432   B
Fields::Copy()                                        1       1    15   B    16   B
-----------------------------------------------------------------------------------

Managed Memory Usage:
----------------------------------------------------------------
Name                             Nalloc  Nfree  AvgMem    MaxMem
----------------------------------------------------------------
The_Managed_Arena::Initialize()       1      1   2   B  8192 KiB
----------------------------------------------------------------

Pinned Memory Usage:
---------------------------------------------------------------------------
Name                                      Nalloc  Nfree    AvgMem    MaxMem
---------------------------------------------------------------------------
Diagnostic::ResizeFDiagFAB()                   2      2   139 MiB   139 MiB
The_Pinned_Arena::Initialize()                 1      1   146   B  8192 KiB
Hipace::InitData()                            55     55   175 KiB   175 KiB
Hipace::ExplicitMGSolveBxBy()                  2      2  2046   B  2048   B
main()                                        98     98   431   B   464   B
Fields::Copy()                                 1      1    15   B    16   B
PlasmaParticleContainer::InitParticles()       2      2     0   B    16   B
hpmg::MultiGrid::solve1()                  88139  88139     1   B    16   B
shiftSlippedParticles()                    22400  22400     0   B    16   B
---------------------------------------------------------------------------

dev:

Finished Evolve after 361 seconds using 1 rank
Total time per particle push: 1.718 nanoseconds (1.97 plasma, 13.47 beam)
Total time per cell update: 15.4 nanoseconds

TinyProfiler total time across processes [min...avg...max]: 361 ... 361 ... 361

------------------------------------------------------------------------------------------------------
Name                                                   NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
------------------------------------------------------------------------------------------------------
ExplicitDeposition()                                    44800      95.92      95.92      95.92  26.57%
DepositCurrent_PlasmaParticleContainer()                44804      72.11      72.11      72.11  19.97%
AdvancePlasmaParticles()                                44800      65.87      65.87      65.87  18.24%
hpmg::MultiGrid::solve1()                               22400      30.63      30.63      30.63   8.48%
DepositCurrentSlice_BeamParticleContainer()             44800      18.85      18.85      18.85   5.22%
AdvanceBeamParticlesSlice()                             11200      17.05      17.05      17.05   4.72%
BeamParticleContainer::ReorderParticles()               11200      9.584      9.584      9.584   2.65%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200      9.276      9.276      9.276   2.57%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200      8.115      8.115      8.115   2.25%
ParticleContainer::SortParticlesForDeposition()          5600       8.02       8.02       8.02   2.22%
PermutationForDeposition()                              16800      7.551      7.551      7.551   2.09%
PlasmaParticleContainer::TagByLevel()                   22402      3.609      3.609      3.609   1.00%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200      2.476      2.476      2.476   0.69%
Fields::LevelUpBoundary()                              123200      1.973      1.973      1.973   0.55%
shiftSlippedParticles()                                 11200      1.464      1.464      1.464   0.41%
Fields::ShiftSlices()                                   22400      1.373      1.373      1.373   0.38%
Fields::InitializeSlices()                              22400      1.257      1.257      1.257   0.35%
Hipace::InitializeSxSyWithBeam()                        22400      1.136      1.136      1.136   0.31%
Hipace::SolveOneSlice()                                 11200     0.7106     0.7106     0.7106   0.20%
Hipace::Evolve()                                            1     0.6849     0.6849     0.6849   0.19%
Hipace::ExplicitMGSolveBxBy()                           22400     0.3109     0.3109     0.3109   0.09%
MultiBuffer::get_data()                                 11200    0.03909    0.03909    0.03909   0.01%
PlasmaParticleContainer::ReorderParticles()              5600   0.009305   0.009305   0.009305   0.00%
main()                                                      1   0.002326   0.002326   0.002326   0.00%
Other                                                  178374       3.03       3.03       3.03   0.84%
------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------
Name                                                   NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
------------------------------------------------------------------------------------------------------
main()                                                      1        361        361        361 100.00%
Hipace::Evolve()                                            1        361        361        361  99.97%
Hipace::SolveOneSlice()                                 11200      359.9      359.9      359.9  99.69%
ExplicitDeposition()                                    44800      95.92      95.92      95.92  26.57%
DepositCurrent_PlasmaParticleContainer()                44804      72.11      72.11      72.11  19.97%
AdvancePlasmaParticles()                                44800      65.87      65.87      65.87  18.24%
Hipace::ExplicitMGSolveBxBy()                           22400       32.1       32.1       32.1   8.89%
hpmg::MultiGrid::solve1()                               22400      30.63      30.63      30.63   8.48%
DepositCurrentSlice_BeamParticleContainer()             44800      18.85      18.85      18.85   5.22%
AdvanceBeamParticlesSlice()                             11200      17.05      17.05      17.05   4.72%
BeamParticleContainer::ReorderParticles()               11200      14.74      14.74      14.74   4.08%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200      12.08      12.08      12.08   3.35%
PlasmaParticleContainer::ReorderParticles()              5600      10.42      10.42      10.42   2.89%
ParticleContainer::SortParticlesForDeposition()          5600      10.41      10.41      10.41   2.88%
MultiBuffer::get_data()                                 11200      9.356      9.356      9.356   2.59%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200      9.317      9.317      9.317   2.58%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200      8.115      8.115      8.115   2.25%
PermutationForDeposition()                              16800      7.551      7.551      7.551   2.09%
PlasmaParticleContainer::TagByLevel()                   22402      3.609      3.609      3.609   1.00%
Fields::LevelUpBoundary()                              123200      1.973      1.973      1.973   0.55%
shiftSlippedParticles()                                 11200      1.776      1.776      1.776   0.49%
Fields::ShiftSlices()                                   22400      1.373      1.373      1.373   0.38%
Fields::InitializeSlices()                              22400      1.257      1.257      1.257   0.35%
Hipace::InitializeSxSyWithBeam()                        22400      1.136      1.136      1.136   0.31%
Other                                                  178374      3.136      3.136      3.136   0.87%
------------------------------------------------------------------------------------------------------

Unused ParmParse Variables:
  [TOP]::hipace.do_shared_depos(nvals = 1)  :: [true]

Device Memory Usage:
-----------------------------------------------------------------------------------
Name                                             Nalloc   Nfree    AvgMem    MaxMem
-----------------------------------------------------------------------------------
The_Arena::Initialize()                               1       1   791 KiB    59 GiB
ParticleContainer::SortParticlesForDeposition()   16800   16800  1558 MiB  1622 MiB
PlasmaParticleContainer::InitParticles()            126     126  3194 KiB  1576 MiB
BeamParticleContainer::resize()                  110721  110721   360 MiB  1107 MiB
Fields::AllocData()                                   2       2   337 MiB   337 MiB
BeamParticleContainer::ReorderParticles()         33600   33600    35 MiB   145 MiB
Hipace::ExplicitMGSolveBxBy()                        72      72   117 MiB   117 MiB
FFTPoissonSolverDirichletFast::define()              14      14    79 MiB    79 MiB
PermutationForDeposition()                        67200   67200  2310 KiB    66 MiB
ResizeRandomSeed                                      1       1    40 MiB    40 MiB
hpmg::MultiGrid::solve1()                         88139   88139    19 KiB   432 KiB
shiftSlippedParticles()                           53444   53444   233   B   108 KiB
Hipace::InitData()                                   13      13   495   B   496   B
main()                                               11      11   431   B   432   B
DepositCurrent_PlasmaParticleContainer()          44804   44804     3   B    16   B
Fields::Copy()                                        1       1    15   B    16   B
-----------------------------------------------------------------------------------

Managed Memory Usage:
----------------------------------------------------------------
Name                             Nalloc  Nfree  AvgMem    MaxMem
----------------------------------------------------------------
The_Managed_Arena::Initialize()       1      1   1   B  8192 KiB
----------------------------------------------------------------

Pinned Memory Usage:
---------------------------------------------------------------------------
Name                                      Nalloc  Nfree    AvgMem    MaxMem
---------------------------------------------------------------------------
Diagnostic::ResizeFDiagFAB()                   2      2   139 MiB   139 MiB
The_Pinned_Arena::Initialize()                 1      1   130   B  8192 KiB
Hipace::InitData()                            55     55   175 KiB   175 KiB
Hipace::ExplicitMGSolveBxBy()                  2      2  2046   B  2048   B
main()                                        96     96   431   B   464   B
Fields::Copy()                                 1      1    15   B    16   B
PlasmaParticleContainer::InitParticles()       2      2     0   B    16   B
hpmg::MultiGrid::solve1()                  88139  88139     1   B    16   B
shiftSlippedParticles()                    22400  22400     0   B    16   B
---------------------------------------------------------------------------
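
To make the two profiles easier to compare, the short script below computes the dev / MR time ratio for the total run and for the kernels whose timings change. The numbers are copied from the exclusive-time tables above; the dictionary layout and the `speedup` helper are only for illustration.

```python
# Times in seconds, copied from the two TinyProfiler outputs above.
shared = {  # this MR, hipace.do_shared_depos = true
    "Total (Evolve)": 324.2,
    "ExplicitDeposition()": 69.86,
    "DepositCurrent_PlasmaParticleContainer()": 44.4,
    "DepositCurrentSlice_BeamParticleContainer()": 36.1,
    "AdvancePlasmaParticles()": 65.78,
}
dev = {  # development branch
    "Total (Evolve)": 361.0,
    "ExplicitDeposition()": 95.92,
    "DepositCurrent_PlasmaParticleContainer()": 72.11,
    "DepositCurrentSlice_BeamParticleContainer()": 18.85,
    "AdvancePlasmaParticles()": 65.87,
}

def speedup(name):
    """Ratio dev / MR; values above 1 mean the MR is faster."""
    return dev[name] / shared[name]

if __name__ == "__main__":
    for name in shared:
        print(f"{name:<46} {dev[name]:7.2f} -> {shared[name]:7.2f}  x{speedup(name):.2f}")
```

With these numbers, the two plasma deposition kernels get faster (roughly 1.37x and 1.62x), the beam deposition gets slower (roughly 0.52x), and the whole run is about 1.11x faster.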