Closed AlexanderSinn closed 1 month ago
To enable shared-memory deposition on GPU, add:
hipace.do_shared_depos = true
The parameter plasmas.sort_bin_size was replaced with hipace.tile_size, as it now also affects beams:
hipace.tile_size = 32
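For readers unfamiliar with the idea, here is a minimal conceptual sketch of tile-based deposition in plain Python. It is not the code from this MR: the function name `deposit_tiled`, the 1D grid, and the one-cell-per-particle weighting are assumptions made purely for illustration, and the small per-tile list only stands in for GPU shared memory. How the tile size is actually used inside HiPACE++ is not shown here.

```python
from collections import defaultdict

def deposit_tiled(positions, weights, n_cells, tile_size):
    """Deposit particle weights onto a 1D grid, one tile of cells at a time."""
    # Bin particles by the tile that contains their cell (hypothetical binning).
    bins = defaultdict(list)
    for x, w in zip(positions, weights):
        # Cell index (truncation, i.e. floor for x >= 0), clamped to the grid.
        cell = min(max(int(x), 0), n_cells - 1)
        bins[cell // tile_size].append((cell, w))

    grid = [0.0] * n_cells
    for tile, entries in bins.items():
        start = tile * tile_size
        # Small per-tile buffer: stands in for fast on-chip (shared) memory.
        local = [0.0] * min(tile_size, n_cells - start)
        for cell, w in entries:
            local[cell - start] += w
        # Write each tile cell back to the global grid once.
        for j, value in enumerate(local):
            grid[start + j] += value
    return grid

grid = deposit_tiled([0.5, 1.2, 1.9, 33.0], [1.0, 2.0, 3.0, 4.0],
                     n_cells=64, tile_size=32)
print(grid[0], grid[1], grid[33])
```

The point of the sketch is only that particles sorted into the same tile accumulate into a small local buffer before the global grid is touched, which is why a sort bin size and a tile size are naturally the same parameter.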
MR with shared memory:
Finished Evolve after 324.2 seconds using 1 rank
Total time per particle push: 1.544 nanoseconds (1.769 plasma, 12.1 beam)
Total time per cell update: 13.83 nanoseconds

TinyProfiler total time across processes [min...avg...max]: 324.3 ... 324.3 ... 324.3

----------------------------------------------------------------------------------------------------
Name                                                   NCalls Excl. Min Excl. Avg Excl. Max    Max %
----------------------------------------------------------------------------------------------------
ExplicitDeposition()                                    44800     69.86     69.86     69.86   21.54%
AdvancePlasmaParticles()                                44800     65.78     65.78     65.78   20.28%
DepositCurrent_PlasmaParticleContainer()                44804      44.4      44.4      44.4   13.69%
DepositCurrentSlice_BeamParticleContainer()             44800      36.1      36.1      36.1   11.13%
hpmg::MultiGrid::solve1()                               22400     30.59     30.59     30.59    9.43%
AdvanceBeamParticlesSlice()                             11200     17.05     17.05     17.05    5.26%
BeamParticleContainer::ReorderParticles()               11200     9.576     9.576     9.576    2.95%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200     9.248     9.248     9.248    2.85%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200     8.062     8.062     8.062    2.49%
ParticleContainer::SortParticlesForDeposition()          5600     8.012     8.012     8.012    2.47%
PermutationForDeposition()                              16800      7.55      7.55      7.55    2.33%
PlasmaParticleContainer::TagByLevel()                   22402     3.609     3.609     3.609    1.11%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200     2.479     2.479     2.479    0.76%
Fields::LevelUpBoundary()                              123200     1.986     1.986     1.986    0.61%
shiftSlippedParticles()                                 11200      1.47      1.47      1.47    0.45%
Fields::ShiftSlices()                                   22400     1.369     1.369     1.369    0.42%
Fields::InitializeSlices()                              22400     1.257     1.257     1.257    0.39%
Hipace::InitializeSxSyWithBeam()                        22400      1.14      1.14      1.14    0.35%
Hipace::SolveOneSlice()                                 11200    0.7064    0.7064    0.7064    0.22%
Hipace::Evolve()                                            1    0.6222    0.6222    0.6222    0.19%
Hipace::ExplicitMGSolveBxBy()                           22400    0.3252    0.3252    0.3252    0.10%
MultiBuffer::get_data()                                 11200   0.04189   0.04189   0.04189    0.01%
PlasmaParticleContainer::ReorderParticles()              5600   0.00954   0.00954   0.00954    0.00%
main()                                                      1  0.001976  0.001976  0.001976    0.00%
Other                                                  178374     3.083     3.083     3.083    0.95%
----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------
Name                                                   NCalls Incl. Min Incl. Avg Incl. Max    Max %
----------------------------------------------------------------------------------------------------
main()                                                      1     324.3     324.3     324.3  100.00%
Hipace::Evolve()                                            1     324.2     324.2     324.2   99.97%
Hipace::SolveOneSlice()                                 11200     323.2     323.2     323.2   99.67%
ExplicitDeposition()                                    44800     69.86     69.86     69.86   21.54%
AdvancePlasmaParticles()                                44800     65.78     65.78     65.78   20.28%
DepositCurrent_PlasmaParticleContainer()                44804      44.4      44.4      44.4   13.69%
DepositCurrentSlice_BeamParticleContainer()             44800      36.1      36.1      36.1   11.13%
Hipace::ExplicitMGSolveBxBy()                           22400     32.08     32.08     32.08    9.89%
hpmg::MultiGrid::solve1()                               22400     30.59     30.59     30.59    9.43%
AdvanceBeamParticlesSlice()                             11200     17.05     17.05     17.05    5.26%
BeamParticleContainer::ReorderParticles()               11200     14.74     14.74     14.74    4.54%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200     12.04     12.04     12.04    3.71%
PlasmaParticleContainer::ReorderParticles()              5600     10.41     10.41     10.41    3.21%
ParticleContainer::SortParticlesForDeposition()          5600      10.4      10.4      10.4    3.21%
MultiBuffer::get_data()                                 11200     9.332     9.332     9.332    2.88%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200      9.29      9.29      9.29    2.86%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200     8.062     8.062     8.062    2.49%
PermutationForDeposition()                              16800      7.55      7.55      7.55    2.33%
PlasmaParticleContainer::TagByLevel()                   22402     3.609     3.609     3.609    1.11%
Fields::LevelUpBoundary()                              123200     1.986     1.986     1.986    0.61%
shiftSlippedParticles()                                 11200     1.788     1.788     1.788    0.55%
Fields::ShiftSlices()                                   22400     1.369     1.369     1.369    0.42%
Fields::InitializeSlices()                              22400     1.257     1.257     1.257    0.39%
Hipace::InitializeSxSyWithBeam()                        22400      1.14      1.14      1.14    0.35%
Other                                                  178374     3.198     3.198     3.198    0.99%
----------------------------------------------------------------------------------------------------

Device Memory Usage:
--------------------------------------------------------------------------------------------
Name                                                    Nalloc   Nfree     AvgMem     MaxMem
--------------------------------------------------------------------------------------------
The_Arena::Initialize()                                      1       1    835 KiB     59 GiB
ParticleContainer::SortParticlesForDeposition()          16800   16800   1558 MiB   1622 MiB
PlasmaParticleContainer::InitParticles()                   126     126   3120 KiB   1576 MiB
BeamParticleContainer::resize()                         110721  110721    389 MiB   1107 MiB
Fields::AllocData()                                          2       2    337 MiB    337 MiB
BeamParticleContainer::ReorderParticles()                33600   33600     38 MiB    145 MiB
Hipace::ExplicitMGSolveBxBy()                               72      72    117 MiB    117 MiB
FFTPoissonSolverDirichletFast::define()                     14      14     79 MiB     79 MiB
PermutationForDeposition()                               67200   67200   2570 KiB     66 MiB
ResizeRandomSeed                                             1       1     40 MiB     40 MiB
DepositCurrent_PlasmaParticleContainer()                179216  179216   4859 KiB     36 MiB
ExplicitDeposition()                                    134400  134400   7886 KiB     36 MiB
DepositCurrentSlice_BeamParticleContainer()             134398  134398   2903 KiB     34 MiB
hpmg::MultiGrid::solve1()                                88139   88139     21 KiB    432 KiB
shiftSlippedParticles()                                  53444   53444      259 B    108 KiB
Hipace::InitData()                                          13      13      495 B      496 B
main()                                                      11      11      431 B      432 B
Fields::Copy()                                               1       1       15 B       16 B
--------------------------------------------------------------------------------------------

Managed Memory Usage:
--------------------------------------------------------------------------------------------
Name                                                    Nalloc   Nfree     AvgMem     MaxMem
--------------------------------------------------------------------------------------------
The_Managed_Arena::Initialize()                              1       1        2 B   8192 KiB
--------------------------------------------------------------------------------------------

Pinned Memory Usage:
--------------------------------------------------------------------------------------------
Name                                                    Nalloc   Nfree     AvgMem     MaxMem
--------------------------------------------------------------------------------------------
Diagnostic::ResizeFDiagFAB()                                 2       2    139 MiB    139 MiB
The_Pinned_Arena::Initialize()                               1       1      146 B   8192 KiB
Hipace::InitData()                                          55      55    175 KiB    175 KiB
Hipace::ExplicitMGSolveBxBy()                                2       2     2046 B     2048 B
main()                                                      98      98      431 B      464 B
Fields::Copy()                                               1       1       15 B       16 B
PlasmaParticleContainer::InitParticles()                     2       2        0 B       16 B
hpmg::MultiGrid::solve1()                                88139   88139        1 B       16 B
shiftSlippedParticles()                                  22400   22400        0 B       16 B
--------------------------------------------------------------------------------------------
dev:
Finished Evolve after 361 seconds using 1 rank
Total time per particle push: 1.718 nanoseconds (1.97 plasma, 13.47 beam)
Total time per cell update: 15.4 nanoseconds

TinyProfiler total time across processes [min...avg...max]: 361 ... 361 ... 361

----------------------------------------------------------------------------------------------------
Name                                                   NCalls Excl. Min Excl. Avg Excl. Max    Max %
----------------------------------------------------------------------------------------------------
ExplicitDeposition()                                    44800     95.92     95.92     95.92   26.57%
DepositCurrent_PlasmaParticleContainer()                44804     72.11     72.11     72.11   19.97%
AdvancePlasmaParticles()                                44800     65.87     65.87     65.87   18.24%
hpmg::MultiGrid::solve1()                               22400     30.63     30.63     30.63    8.48%
DepositCurrentSlice_BeamParticleContainer()             44800     18.85     18.85     18.85    5.22%
AdvanceBeamParticlesSlice()                             11200     17.05     17.05     17.05    4.72%
BeamParticleContainer::ReorderParticles()               11200     9.584     9.584     9.584    2.65%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200     9.276     9.276     9.276    2.57%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200     8.115     8.115     8.115    2.25%
ParticleContainer::SortParticlesForDeposition()          5600      8.02      8.02      8.02    2.22%
PermutationForDeposition()                              16800     7.551     7.551     7.551    2.09%
PlasmaParticleContainer::TagByLevel()                   22402     3.609     3.609     3.609    1.00%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200     2.476     2.476     2.476    0.69%
Fields::LevelUpBoundary()                              123200     1.973     1.973     1.973    0.55%
shiftSlippedParticles()                                 11200     1.464     1.464     1.464    0.41%
Fields::ShiftSlices()                                   22400     1.373     1.373     1.373    0.38%
Fields::InitializeSlices()                              22400     1.257     1.257     1.257    0.35%
Hipace::InitializeSxSyWithBeam()                        22400     1.136     1.136     1.136    0.31%
Hipace::SolveOneSlice()                                 11200    0.7106    0.7106    0.7106    0.20%
Hipace::Evolve()                                            1    0.6849    0.6849    0.6849    0.19%
Hipace::ExplicitMGSolveBxBy()                           22400    0.3109    0.3109    0.3109    0.09%
MultiBuffer::get_data()                                 11200   0.03909   0.03909   0.03909    0.01%
PlasmaParticleContainer::ReorderParticles()              5600  0.009305  0.009305  0.009305    0.00%
main()                                                      1  0.002326  0.002326  0.002326    0.00%
Other                                                  178374      3.03      3.03      3.03    0.84%
----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------
Name                                                   NCalls Incl. Min Incl. Avg Incl. Max    Max %
----------------------------------------------------------------------------------------------------
main()                                                      1       361       361       361  100.00%
Hipace::Evolve()                                            1       361       361       361   99.97%
Hipace::SolveOneSlice()                                 11200     359.9     359.9     359.9   99.69%
ExplicitDeposition()                                    44800     95.92     95.92     95.92   26.57%
DepositCurrent_PlasmaParticleContainer()                44804     72.11     72.11     72.11   19.97%
AdvancePlasmaParticles()                                44800     65.87     65.87     65.87   18.24%
Hipace::ExplicitMGSolveBxBy()                           22400      32.1      32.1      32.1    8.89%
hpmg::MultiGrid::solve1()                               22400     30.63     30.63     30.63    8.48%
DepositCurrentSlice_BeamParticleContainer()             44800     18.85     18.85     18.85    5.22%
AdvanceBeamParticlesSlice()                             11200     17.05     17.05     17.05    4.72%
BeamParticleContainer::ReorderParticles()               11200     14.74     14.74     14.74    4.08%
Fields::SolvePoissonPsiExmByEypBxEzBz()                 11200     12.08     12.08     12.08    3.35%
PlasmaParticleContainer::ReorderParticles()              5600     10.42     10.42     10.42    2.89%
ParticleContainer::SortParticlesForDeposition()          5600     10.41     10.41     10.41    2.88%
MultiBuffer::get_data()                                 11200     9.356     9.356     9.356    2.59%
BeamParticleContainer::InitBeamFixedWeightPDFSlice()    11200     9.317     9.317     9.317    2.58%
FFTPoissonSolverDirichletFast::SolvePoissonEquation()   67200     8.115     8.115     8.115    2.25%
PermutationForDeposition()                              16800     7.551     7.551     7.551    2.09%
PlasmaParticleContainer::TagByLevel()                   22402     3.609     3.609     3.609    1.00%
Fields::LevelUpBoundary()                              123200     1.973     1.973     1.973    0.55%
shiftSlippedParticles()                                 11200     1.776     1.776     1.776    0.49%
Fields::ShiftSlices()                                   22400     1.373     1.373     1.373    0.38%
Fields::InitializeSlices()                              22400     1.257     1.257     1.257    0.35%
Hipace::InitializeSxSyWithBeam()                        22400     1.136     1.136     1.136    0.31%
Other                                                  178374     3.136     3.136     3.136    0.87%
----------------------------------------------------------------------------------------------------

Unused ParmParse Variables:
  [TOP]::hipace.do_shared_depos(nvals = 1) :: [true]

Device Memory Usage:
--------------------------------------------------------------------------------------------
Name                                                    Nalloc   Nfree     AvgMem     MaxMem
--------------------------------------------------------------------------------------------
The_Arena::Initialize()                                      1       1    791 KiB     59 GiB
ParticleContainer::SortParticlesForDeposition()          16800   16800   1558 MiB   1622 MiB
PlasmaParticleContainer::InitParticles()                   126     126   3194 KiB   1576 MiB
BeamParticleContainer::resize()                         110721  110721    360 MiB   1107 MiB
Fields::AllocData()                                          2       2    337 MiB    337 MiB
BeamParticleContainer::ReorderParticles()                33600   33600     35 MiB    145 MiB
Hipace::ExplicitMGSolveBxBy()                               72      72    117 MiB    117 MiB
FFTPoissonSolverDirichletFast::define()                     14      14     79 MiB     79 MiB
PermutationForDeposition()                               67200   67200   2310 KiB     66 MiB
ResizeRandomSeed                                             1       1     40 MiB     40 MiB
hpmg::MultiGrid::solve1()                                88139   88139     19 KiB    432 KiB
shiftSlippedParticles()                                  53444   53444      233 B    108 KiB
Hipace::InitData()                                          13      13      495 B      496 B
main()                                                      11      11      431 B      432 B
DepositCurrent_PlasmaParticleContainer()                 44804   44804        3 B       16 B
Fields::Copy()                                               1       1       15 B       16 B
--------------------------------------------------------------------------------------------

Managed Memory Usage:
--------------------------------------------------------------------------------------------
Name                                                    Nalloc   Nfree     AvgMem     MaxMem
--------------------------------------------------------------------------------------------
The_Managed_Arena::Initialize()                              1       1        1 B   8192 KiB
--------------------------------------------------------------------------------------------

Pinned Memory Usage:
--------------------------------------------------------------------------------------------
Name                                                    Nalloc   Nfree     AvgMem     MaxMem
--------------------------------------------------------------------------------------------
Diagnostic::ResizeFDiagFAB()                                 2       2    139 MiB    139 MiB
The_Pinned_Arena::Initialize()                               1       1      130 B   8192 KiB
Hipace::InitData()                                          55      55    175 KiB    175 KiB
Hipace::ExplicitMGSolveBxBy()                                2       2     2046 B     2048 B
main()                                                      96      96      431 B      464 B
Fields::Copy()                                               1       1       15 B       16 B
PlasmaParticleContainer::InitParticles()                     2       2        0 B       16 B
hpmg::MultiGrid::solve1()                                88139   88139        1 B       16 B
shiftSlippedParticles()                                  22400   22400        0 B       16 B
--------------------------------------------------------------------------------------------
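To make the two profiles easier to compare, the short script below computes the relative change between dev and this MR for the total run time and the deposition-related routines. The numbers are the exclusive times in seconds copied from the two TinyProfiler outputs above; the helper name `relative_change` and the choice of rows are only for illustration.

```python
# (dev seconds, MR seconds), copied from the TinyProfiler outputs above.
timings = {
    "Finished Evolve (total)": (361.0, 324.2),
    "ExplicitDeposition()": (95.92, 69.86),
    "DepositCurrent_PlasmaParticleContainer()": (72.11, 44.4),
    "DepositCurrentSlice_BeamParticleContainer()": (18.85, 36.1),
    "AdvancePlasmaParticles()": (65.87, 65.78),
}

def relative_change(dev, mr):
    """Percentage change from dev to MR; negative means the MR is faster."""
    return 100.0 * (mr - dev) / dev

for name, (dev, mr) in timings.items():
    change = relative_change(dev, mr)
    print(f"{name:45s} {dev:8.2f} {mr:8.2f} {change:+7.1f}%")
```

In this single run the total time drops by about 10%, with the plasma deposition routines accounting for the gain, while beam deposition takes roughly twice as long as on dev.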