AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

TinyProfiler: shorten output into "Other" section #3885

Closed AlexanderSinn closed 3 months ago

AlexanderSinn commented 3 months ago

Summary

This PR adds an option to shorten the TinyProfiler output printed at the end of a simulation by merging the least significant entries into a single "Other" row.

tiny_profiler.print_threshold = 1.

With the current approach, tiny_profiler.print_threshold specifies the maximum share of the total runtime, in percent, that the inclusive runtime of the "Other" section may take up. The default value is 1 (= 1%), so at least 99% of the total inclusive and exclusive time is still reported outside "Other". The exclusive section merges the same functions into "Other" as the inclusive section, so a given function shows up either in both sections or in neither. A side effect is that functions such as "main()", which have a large inclusive but short exclusive runtime, still show up in the exclusive section even though functions with a longer exclusive runtime may have been put into "Other".
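
To make the selection rule concrete, here is a minimal Python sketch of the behaviour described above. It is not the AMReX implementation: `Entry`, `fold_into_other`, and the sample timings are hypothetical, and the sketch assumes that functions are added to "Other" in order of increasing inclusive time until the next one would exceed the threshold. It also sums a single time per function, whereas TinyProfiler reports min/avg/max across processes.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    ncalls: int
    excl: float  # exclusive runtime in seconds
    incl: float  # inclusive runtime in seconds


def fold_into_other(entries, total_time, print_threshold=1.0):
    """Split entries into (kept, other) following the rule described above.

    Hypothetical helper, not AMReX code. print_threshold is given in percent
    of total_time; a value of 0 disables folding and returns other=None.
    """
    if print_threshold <= 0:
        return list(entries), None

    budget = total_time * print_threshold / 100.0
    other = Entry("Other", 0, 0.0, 0.0)
    folded = set()

    # Visit the cheapest functions (by inclusive time) first and stop as soon
    # as adding the next one would push "Other" over the budget.
    for e in sorted(entries, key=lambda e: e.incl):
        if other.incl + e.incl > budget:
            break
        other.ncalls += e.ncalls
        other.excl += e.excl
        other.incl += e.incl
        folded.add(e.name)

    # The same set of names is dropped from both the inclusive and the
    # exclusive listing, so a function shows up in both or in neither.
    kept = [e for e in entries if e.name not in folded]
    return kept, other


# Made-up timings, chosen only to show the effect on main().
sample = [
    Entry("main()", 1, 0.001, 20.0),
    Entry("solve()", 1000, 19.5, 19.5),
    Entry("mid()", 1000, 0.25, 0.25),
    Entry("tiny_a()", 4000, 0.10, 0.10),
    Entry("tiny_b()", 10, 0.05, 0.05),
]
kept, other = fold_into_other(sample, total_time=20.0, print_threshold=1.0)
print([e.name for e in kept])
print(other)
```

In this made-up sample, "main()" is kept because of its large inclusive time, although its exclusive time is smaller than that of "tiny_a()", which ends up in "Other". This is the side effect mentioned in the summary.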

Additional background

HiPACE++ TinyProfiler output with tiny_profiler.print_threshold = 1.:

TinyProfiler total time across processes [min...avg...max]: 19.95 ... 20.37 ... 20.7

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
--------------------------------------------------------------------------------------------------
hpmg::MultiGrid::solve1()                            1000      6.532      6.542       6.57  31.73%
AnyDST::Execute()                                    6000      3.853      3.869      3.892  18.80%
AdvanceBeamParticlesSlice()                          1000        2.7      2.716      2.736  13.21%
ExplicitDeposition()                                 1000      2.246      2.261      2.277  11.00%
AdvancePlasmaParticles()                             1000      1.249      1.255      1.271   6.14%
DepositCurrent_PlasmaParticleContainer()             1001     0.9996      1.006      1.014   4.90%
MultiBuffer::get_data()                              1000  0.0008554     0.3973     0.9006   4.35%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    3000     0.4731     0.4752     0.4797   2.32%
Fields::InitializeSlices()                           1000     0.4426     0.4504     0.4597   2.22%
Fields::ShiftSlices()                                1000     0.2872     0.3431     0.4128   1.99%
Fields::SolvePoissonPsiExmByEypBxEzBz()              1000     0.3722     0.3751     0.3783   1.83%
Hipace::InitializeSxSyWithBeam()                     1000     0.2212     0.2224     0.2236   1.08%
FillBoundary_nowait()                                4000     0.1168     0.1202      0.124   0.60%
Fields::AddRhoIons()                                 1000    0.09308    0.09395    0.09525   0.46%
MultiBuffer::put_data()                              1000   0.004975    0.05095    0.06177   0.30%
AdaptiveTimeStep::GatherMinUzSlice()                 1000    0.02971    0.03274    0.05097   0.25%
DepositCurrentSlice_BeamParticleContainer()          2000    0.03985    0.04177    0.04291   0.21%
Hipace::SolveOneSlice()                              1000   0.007764   0.008161   0.008495   0.04%
Hipace::ExplicitMGSolveBxBy()                        1000   0.003842   0.003985   0.004245   0.02%
Hipace::Evolve()                                        1  0.0008552   0.001817    0.00225   0.01%
FabArray::FillBoundary()                             4000   0.001446   0.001505   0.001561   0.01%
main()                                                  1   0.001071   0.001176   0.001244   0.01%
Other                                               11832    0.08133     0.1007     0.1464   0.71%
--------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
--------------------------------------------------------------------------------------------------
main()                                                  1      19.95      20.37       20.7 100.00%
Hipace::Evolve()                                        1      19.91      20.33      20.66  99.80%
Hipace::SolveOneSlice()                              1000      19.88      20.31      20.64  99.69%
Hipace::ExplicitMGSolveBxBy()                        1000      6.536      6.546      6.574  31.75%
hpmg::MultiGrid::solve1()                            1000      6.532      6.542       6.57  31.73%
Fields::SolvePoissonPsiExmByEypBxEzBz()              1000      4.766      4.784      4.815  23.25%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    3000      4.326      4.344      4.372  21.11%
AnyDST::Execute()                                    6000      3.853      3.869      3.892  18.80%
AdvanceBeamParticlesSlice()                          1000        2.7      2.716      2.736  13.21%
ExplicitDeposition()                                 1000      2.246      2.261      2.277  11.00%
AdvancePlasmaParticles()                             1000      1.249      1.255      1.271   6.14%
DepositCurrent_PlasmaParticleContainer()             1001     0.9996      1.006      1.014   4.90%
MultiBuffer::get_data()                              1000    0.03689     0.4028     0.9019   4.36%
Fields::InitializeSlices()                           1000     0.4426     0.4504     0.4597   2.22%
Fields::ShiftSlices()                                1000     0.2872     0.3431     0.4128   1.99%
Hipace::InitializeSxSyWithBeam()                     1000     0.2782      0.281     0.2849   1.38%
FabArray::FillBoundary()                             4000     0.1199     0.1233     0.1271   0.61%
FillBoundary_nowait()                                4000     0.1176      0.121     0.1248   0.60%
Fields::AddRhoIons()                                 1000    0.09308    0.09395    0.09525   0.46%
MultiBuffer::put_data()                              1000   0.004975    0.05116    0.06203   0.30%
AdaptiveTimeStep::GatherMinUzSlice()                 1000    0.02971    0.03274    0.05097   0.25%
DepositCurrentSlice_BeamParticleContainer()          2000    0.03985    0.04177    0.04291   0.21%
Other                                               11832     0.1359     0.1483     0.1925   0.93%
--------------------------------------------------------------------------------------------------

HiPACE++ TinyProfiler output with tiny_profiler.print_threshold = 0 (off):

TinyProfiler total time across processes [min...avg...max]: 20.17 ... 20.66 ... 20.98

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
--------------------------------------------------------------------------------------------------
hpmg::MultiGrid::solve1()                            1000      6.531      6.548       6.58  31.37%
AnyDST::Execute()                                    6000      3.855      3.881      3.942  18.79%
AdvanceBeamParticlesSlice()                          1000      2.702      2.718      2.747  13.09%
ExplicitDeposition()                                 1000      2.244      2.262      2.282  10.88%
AdvancePlasmaParticles()                             1000      1.251      1.269      1.371   6.53%
MultiBuffer::get_data()                              1000  0.0008761     0.6194       1.14   5.43%
DepositCurrent_PlasmaParticleContainer()             1001      1.004       1.01       1.02   4.86%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    3000     0.4723      0.474      0.477   2.27%
Fields::InitializeSlices()                           1000     0.4447     0.4534     0.4741   2.26%
Fields::ShiftSlices()                                1000      0.288     0.3436     0.4096   1.95%
Fields::SolvePoissonPsiExmByEypBxEzBz()              1000     0.3727     0.3747     0.3758   1.79%
Hipace::InitializeSxSyWithBeam()                     1000     0.2198     0.2219     0.2233   1.06%
FillBoundary_nowait()                                4000     0.1168     0.1207     0.1259   0.60%
Fields::AddRhoIons()                                 1000    0.09291    0.09412    0.09528   0.45%
MultiBuffer::put_data()                              1000   0.005048    0.05104    0.06124   0.29%
AdaptiveTimeStep::GatherMinUzSlice()                 1000    0.02938    0.03298    0.04933   0.24%
DepositCurrentSlice_BeamParticleContainer()          2000    0.04035    0.04223    0.04532   0.22%
shiftSlippedParticles()                               678    0.03493    0.03694    0.03845   0.18%
BeamParticleContainer::InitBeamFixedWeightSlice()     125          0   0.004304    0.03443   0.16%
Hipace::InitData()                                      1   0.006485    0.02631     0.0304   0.14%
PlasmaParticleContainer::InitParticles()                1    0.02805    0.02865    0.02917   0.14%
BeamParticleContainer::InitBeamFixedWeight3D()          1   9.41e-07   0.002142    0.01713   0.08%
Hipace::SolveOneSlice()                              1000   0.007763    0.00821    0.00885   0.04%
FabArray::setVal()                                      4   0.007318   0.007621   0.008111   0.04%
AnyDST::CreatePlan()                                    1   0.005389   0.006022   0.006552   0.03%
Fields::AllocData()                                     1   0.005303   0.005756    0.00611   0.03%
sortBeamParticlesByBox()                                0          0  0.0005877   0.004702   0.02%
Hipace::ExplicitMGSolveBxBy()                        1000   0.003881   0.003983   0.004134   0.02%
BeamParticleContainer::resize()                      3014   0.002165   0.002377     0.0025   0.01%
Hipace::Evolve()                                        1  0.0008157   0.001788   0.002294   0.01%
FabArray::FillBoundary()                             4000   0.001378   0.001443   0.001546   0.01%
main()                                                  1    0.00106   0.001194   0.001266   0.01%
FillBoundary_finish()                                4000  0.0007684  0.0008594  0.0009376   0.00%
FabArrayBase::getFB()                                4000   0.000709  0.0007543  0.0008294   0.00%
AdaptiveTimeStep::CalculateFromDensity()                1  6.339e-05   7.22e-05  0.0001312   0.00%
FabArrayBase::FB::FB()                                  1  3.079e-05  3.577e-05  3.732e-05   0.00%
AdaptiveTimeStep::CalculateFromMinUz()                  1  2.495e-06   3.89e-06  1.072e-05   0.00%
ParticleContainer::clearParticles()                     1    3.4e-07  4.009e-07   4.81e-07   0.00%
--------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------
Name                                               NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
--------------------------------------------------------------------------------------------------
main()                                                  1      20.17      20.66      20.98 100.00%
Hipace::Evolve()                                        1      20.12      20.61      20.93  99.77%
Hipace::SolveOneSlice()                              1000      20.07      20.56      20.89  99.57%
Hipace::ExplicitMGSolveBxBy()                        1000      6.535      6.552      6.584  31.38%
hpmg::MultiGrid::solve1()                            1000      6.531      6.548       6.58  31.37%
Fields::SolvePoissonPsiExmByEypBxEzBz()              1000      4.768      4.794       4.86  23.17%
FFTPoissonSolverDirichlet::SolvePoissonEquation()    3000      4.327      4.355      4.419  21.06%
AnyDST::Execute()                                    6000      3.855      3.881      3.942  18.79%
AdvanceBeamParticlesSlice()                          1000      2.702      2.718      2.747  13.09%
ExplicitDeposition()                                 1000      2.244      2.262      2.282  10.88%
AdvancePlasmaParticles()                             1000      1.251      1.269      1.371   6.53%
MultiBuffer::get_data()                              1000     0.0366     0.6249      1.141   5.44%
DepositCurrent_PlasmaParticleContainer()             1001      1.004       1.01       1.02   4.86%
Fields::InitializeSlices()                           1000     0.4447     0.4534     0.4741   2.26%
Fields::ShiftSlices()                                1000      0.288     0.3436     0.4096   1.95%
Hipace::InitializeSxSyWithBeam()                     1000     0.2765      0.281     0.2851   1.36%
FabArray::FillBoundary()                             4000     0.1198     0.1238     0.1291   0.62%
FillBoundary_nowait()                                4000     0.1175     0.1215     0.1266   0.60%
Fields::AddRhoIons()                                 1000    0.09291    0.09412    0.09528   0.45%
MultiBuffer::put_data()                              1000   0.005048    0.05126    0.06151   0.29%
AdaptiveTimeStep::GatherMinUzSlice()                 1000    0.02938    0.03298    0.04933   0.24%
Hipace::InitData()                                      1    0.04745    0.04759     0.0477   0.23%
DepositCurrentSlice_BeamParticleContainer()          2000    0.04035    0.04223    0.04532   0.22%
shiftSlippedParticles()                               678    0.03558    0.03747    0.03907   0.19%
BeamParticleContainer::InitBeamFixedWeightSlice()     125          0   0.004465    0.03572   0.17%
PlasmaParticleContainer::InitParticles()                1    0.02805    0.02866    0.02918   0.14%
Fields::AllocData()                                     1    0.01716    0.01854    0.01907   0.09%
BeamParticleContainer::InitBeamFixedWeight3D()          1   9.41e-07   0.002142    0.01713   0.08%
FabArray::setVal()                                      4   0.007318   0.007621   0.008111   0.04%
AnyDST::CreatePlan()                                    1   0.005389   0.006022   0.006552   0.03%
sortBeamParticlesByBox()                                0          0  0.0005877   0.004702   0.02%
BeamParticleContainer::resize()                      3014   0.002165   0.002377     0.0025   0.01%
FillBoundary_finish()                                4000  0.0007684  0.0008594  0.0009376   0.00%
FabArrayBase::getFB()                                4000  0.0007451    0.00079  0.0008653   0.00%
AdaptiveTimeStep::CalculateFromDensity()                1  6.339e-05   7.22e-05  0.0001312   0.00%
FabArrayBase::FB::FB()                                  1  3.079e-05  3.577e-05  3.732e-05   0.00%
AdaptiveTimeStep::CalculateFromMinUz()                  1  2.495e-06   3.89e-06  1.072e-05   0.00%
ParticleContainer::clearParticles()                     1    3.4e-07  4.009e-07   4.81e-07   0.00%
--------------------------------------------------------------------------------------------------

WeiqunZhang commented 3 months ago

LGTM. But there are a few clang-tidy warnings that need to be fixed.

ax3l commented 3 months ago

@AlexanderSinn This is great!

I was wondering why tiny_profiler.print_threshold = 1 is just a boolean switch and not an actual threshold value (e.g., 0.95 would be: the sum of all earlier values is 95% of all runtime)? Would this be a good update?

WeiqunZhang commented 3 months ago

It's not a bool; it's a double.

ax3l commented 3 months ago

Ah, makes sense. Read the PR description details now.