Closed: ilectra closed this issue 1 week ago.
Initial simple profiling was done with `nsys profile -t cuda --stats=true -o test01 ./Test_hmc_Sp_WilsonFundFermionGauge --grid 8.8.8.8 --mpi 1.1.1.1 --Thermalizations 0 --Trajectories 1`.
The result is in /home/dp208/dp208/shared/RAC16/ed_test/grid_test_202410/Grid/build/tests/sp2n/test01.nsys-rep
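To repeat the nsys run on other lattice sizes, the command above can be assembled from a script instead of retyped. This is a minimal sketch, assuming Python is used for job scripting. The helper name `nsys_command` is hypothetical, and it uses only the flags that already appear in the command above.

```python
import shlex


# Hypothetical helper: rebuilds the nsys invocation used in this issue, with
# the lattice size and report name as parameters. No flags beyond the
# original command are introduced.
def nsys_command(grid: str = "8.8.8.8", mpi: str = "1.1.1.1",
                 output: str = "test01") -> list[str]:
    return [
        "nsys", "profile",
        "-t", "cuda",            # trace CUDA API activity
        "--stats=true",          # print summary statistics after the run
        "-o", output,            # report is written as <output>.nsys-rep
        "./Test_hmc_Sp_WilsonFundFermionGauge",
        "--grid", grid,
        "--mpi", mpi,
        "--Thermalizations", "0",
        "--Trajectories", "1",
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; the list itself can be passed to
    # subprocess.run() to launch the profile.
    print(shlex.join(nsys_command("16.16.16.16", output="test02")))
```

Keeping the command as a list (rather than one string) avoids shell-quoting issues if it is later passed to `subprocess.run`.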
Still to do before closing: find a good way of profiling with ncu, and run a CPU-profiled case.
@asifsamiarain to produce a VTune profile with CPU compilers for `--grid=8.8.8.8`, and for a somewhat larger system (maybe `16.16.16.16` or similar).
Profiles available for our perusal:
/home/dp208/dp208/shared/RAC16/pgda001/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_ivcha01/build/tests/sp2n/ivcha01
/home/dp208/dp208/shared/RAC16/pgda001/gd_tursa-gnu_seq-001_sp2n-16.16.16.16-1.1.1.1_ivchb01/build/tests/sp2n/ivchb01
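With several batches of profiles spread across the shared folder, it can help to pull the run parameters back out of a path. This is a minimal sketch that assumes every run directory follows the naming pattern seen in this thread; the field names and the helper `parse_run` are hypothetical, and reading `seq-NNN` as the thread count is an assumption based on the "threading options 1 & 8" noted in this issue.

```python
import re
from pathlib import PurePosixPath

# Assumed layout of the run-directory names in this issue:
#   [<batch>_]gd_<machine>-<compiler>_seq-<NNN>_sp2n-<grid>-<mpi>_<tag>
# Treating seq-NNN as the thread count is an assumption.
RUN_NAME = re.compile(
    r"^(?:(?P<batch>[a-z]+\d+)_)?"
    r"gd_(?P<machine>[a-z]+)-(?P<compiler>[a-z]+)"
    r"_seq-(?P<seq>\d{3})"
    r"_sp2n-(?P<grid>\d+(?:\.\d+){3})-(?P<mpi>\d+(?:\.\d+){3})"
    r"_(?P<tag>\w+)$"
)


def parse_run(path: str) -> dict:
    """Return the labelled fields of the innermost run directory in `path`."""
    # Walk from the deepest component upwards, so both
    # .../<run>/build/tests/sp2n/<tag> and .../<build run>/.../<profile run>
    # resolve to the most specific run name.
    for part in reversed(PurePosixPath(path).parts):
        m = RUN_NAME.match(part)
        if m:
            fields = m.groupdict()
            fields["seq"] = int(fields["seq"])  # e.g. "008" -> 8
            return fields
    raise ValueError(f"no run-directory name found in {path!r}")


if __name__ == "__main__":
    p = ("/home/dp208/dp208/shared/RAC16/pgda001/"
         "gd_tursa-gnu_seq-001_sp2n-16.16.16.16-1.1.1.1_ivchb01/"
         "build/tests/sp2n/ivchb01")
    print(parse_run(p))
```

Because the search walks the path from the end, the same helper also works for the pgda002 paths, where the profile directory sits inside a differently named build directory.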
Got an initial AMD uProf case with `AMDuProfCLI profile`.
It created a report, which you can find in /home/dp208/dp208/shared/RAC16/AMDuProf-Test_hmc_Sp_WilsonFundFermionGauge-TBP_Oct-30-2024_16-10-52; however, it provided quite limited information. I think I need to pass more flags to the profiler, as the run took only about 5 s, but it's a start. I will read up on uProf in more detail to learn how to use it effectively.
Reference code: gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01, with the default parameter values MDsteps=36 and trajL=1.0; profiled on two grids (8.8.8.8 and 16.16.16.16) with two threading options (1 and 8).
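The two grids and two threading options give a 2 × 2 matrix of runs. As a sketch of how that matrix maps onto directory names, the snippet below regenerates names in the pattern of the profile directories listed in this thread. The helpers `run_name` and `run_matrix` are hypothetical, and so is the tag convention it encodes (the letter before the run number standing for the grid, `a` for 8.8.8.8 and `b` for 16.16.16.16, inferred from `ivcha04`/`ivchb04`).

```python
from itertools import product

GRIDS = ("8.8.8.8", "16.16.16.16")  # lattice sizes profiled in this issue
THREADS = (1, 8)                    # threading options profiled in this issue
# Assumed tag convention: the letter before the run number encodes the grid,
# as in ivcha04 (8.8.8.8) vs ivchb04 (16.16.16.16).
GRID_LETTER = {"8.8.8.8": "a", "16.16.16.16": "b"}


def run_name(batch: str, prefix: str, number: int,
             grid: str, threads: int, mpi: str = "1.1.1.1") -> str:
    """Hypothetical helper: build one run name in the pattern used here."""
    tag = f"{prefix}{GRID_LETTER[grid]}{number:02d}"
    return f"{batch}_gd_tursa-gnu_seq-{threads:03d}_sp2n-{grid}-{mpi}_{tag}"


def run_matrix(batch: str = "pgda002", prefix: str = "ivch",
               number: int = 4) -> list[str]:
    """All grid x thread combinations, grid-major (same order as the listings)."""
    return [run_name(batch, prefix, number, g, t)
            for g, t in product(GRIDS, THREADS)]


if __name__ == "__main__":
    for name in run_matrix():
        print(name)
```

With the defaults this reproduces the four VTune directory names listed in this thread; `run_matrix(prefix="aup", number=21)` gives the matching uProf names.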
Intel VTune profiles:
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_ivcha04
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-008_sp2n-8.8.8.8-1.1.1.1_ivcha04
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-001_sp2n-16.16.16.16-1.1.1.1_ivchb04
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-008_sp2n-16.16.16.16-1.1.1.1_ivchb04
AMD uProf:
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_aupa21
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-008_sp2n-8.8.8.8-1.1.1.1_aupa21
The following runs are still failing, apparently because of the wall-clock limit (they failed at 30 min, 1 h and 2 h); they are being redone with a 3 h limit and I will update this note:
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-001_sp2n-16.16.16.16-1.1.1.1_aupb21
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-008_sp2n-16.16.16.16-1.1.1.1_aupb21
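One rough way to see why only the 16.16.16.16 uProf runs hit the wall-clock limit: the larger lattice has 16 times as many sites as 8.8.8.8. This is a minimal sketch of that arithmetic; the helper `volume` is hypothetical, and treating runtime as at least proportional to volume is an assumption (profiler overhead may add considerably on top).

```python
from math import prod


def volume(grid: str) -> int:
    """Number of lattice sites for a size string such as '8.8.8.8'."""
    return prod(int(n) for n in grid.split("."))


small, large = volume("8.8.8.8"), volume("16.16.16.16")
print(small, large, large // small)  # 4096 65536 16
```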
@ilectra: link the profile locations to the wiki and close the issue.
DONE:
- tests/sp2n/Test_hmc_Sp_WilsonFundFermionGauge.cc: done and stored in shared folder
- nsys: GPU
- ncu: GPU, will look when we move to GPU