UCL-ARC / Grid

Data parallel C++ mathematical object library
GNU General Public License v2.0
Profile test case #4

Closed ilectra closed 1 week ago

ilectra commented 1 month ago

DONE:

qiUip commented 1 month ago

Initial simple profiling done with `nsys profile -t cuda --stats=true -o test01 ./Test_hmc_Sp_WilsonFundFermionGauge --grid 8.8.8.8 --mpi 1.1.1.1 --Thermalizations 0 --Trajectories 1`

Result is in /home/dp208/dp208/dc-gree5/RAC16/ed_test/grid_test_202410/Grid/build/tests/sp2n/test01.nsys-rep
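
For repeating this run on other lattice sizes, the command above could be assembled programmatically. This is only a sketch: the helper name `build_nsys_command` and the second output name `test02` are hypothetical, and only the flags already used in the command above are assumed.

```python
import shlex


def build_nsys_command(grid, mpi, output,
                       binary="./Test_hmc_Sp_WilsonFundFermionGauge",
                       thermalizations=0, trajectories=1):
    """Return the argv list for an nsys CUDA-trace run of the HMC test.

    Only the options from the original command in this thread are used.
    """
    return [
        "nsys", "profile", "-t", "cuda", "--stats=true", "-o", output,
        binary,
        "--grid", grid,
        "--mpi", mpi,
        "--Thermalizations", str(thermalizations),
        "--Trajectories", str(trajectories),
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # "test02" is a hypothetical output name for the larger grid.
    for grid, output in [("8.8.8.8", "test01"), ("16.16.16.16", "test02")]:
        print(shlex.join(build_nsys_command(grid, "1.1.1.1", output)))
```

Printing the commands first makes it easy to check them before submitting a job; the list can be passed to `subprocess.run` once it looks right.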

Before closing, I still want to find a good way of profiling with ncu and to run a CPU-profiled case.

ilectra commented 1 month ago

@asifsamiarain to produce a VTune profile with CPU compilers for `--grid=8.8.8.8` and for a slightly larger system (maybe `16.16.16.16` or similar)

asifsamiarain commented 1 month ago

Profiles available for our perusal:

/home/dp208/dp208/shared/RAC16/pgda001/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_ivcha01/build/tests/sp2n/ivcha01
/home/dp208/dp208/shared/RAC16/pgda001/gd_tursa-gnu_seq-001_sp2n-16.16.16.16-1.1.1.1_ivchb01/build/tests/sp2n/ivchb01

qiUip commented 4 weeks ago

Got an initial AMD uProf case profiled with AMDuProfCLI.

It created a report, which you can find in /home/dp208/dp208/shared/RAC16/AMDuProf-Test_hmc_Sp_WilsonFundFermionGauge-TBP_Oct-30-2024_16-10-52. However, it provides quite limited information. I think I need to enable more profiler flags, as the run took only about 5 s, but it's a start. I will read up on uProf in more detail to learn how to use it effectively.

asifsamiarain commented 2 weeks ago

Reference code: gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01, with the default parameter values MDsteps=36 and trajL=1.0. Profiles cover two grids (8.8.8.8 and 16.16.16.16) and two threading options (1 and 8).

Intel VTune profiles:

/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_ivcha04
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-008_sp2n-8.8.8.8-1.1.1.1_ivcha04
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-001_sp2n-16.16.16.16-1.1.1.1_ivchb04
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-008_sp2n-16.16.16.16-1.1.1.1_ivchb04

AMD uProf:

/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_aupa21
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-008_sp2n-8.8.8.8-1.1.1.1_aupa21
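
To tell the profile directories apart when scripting over them, the names listed in this comment can be split into their parts. The sketch below rests on assumptions: the function `parse_profile_name` is hypothetical, and reading `seq-NNN` as the thread count is inferred from the two threading options (1 and 8) mentioned above, not from any documented naming scheme. The trailing tag (e.g. `ivcha04`, `aupa21`) is returned uninterpreted.

```python
import re
from pathlib import PurePosixPath

# Assumed layout of the final path component:
#   ..._seq-<threads>_sp2n-<grid>-<mpi>_<tag>
_NAME_RE = re.compile(
    r"seq-(?P<threads>\d+)_sp2n-"
    r"(?P<grid>\d+(?:\.\d+){3})-"
    r"(?P<mpi>\d+(?:\.\d+){3})_"
    r"(?P<tag>\w+)$"
)


def parse_profile_name(path):
    """Split a profile directory name into threads, grid, mpi and tag.

    Returns None when the last path component does not follow the
    assumed naming pattern.
    """
    match = _NAME_RE.search(PurePosixPath(path).name)
    if match is None:
        return None
    return {
        "threads": int(match["threads"]),
        "grid": tuple(int(n) for n in match["grid"].split(".")),
        "mpi": tuple(int(n) for n in match["mpi"].split(".")),
        "tag": match["tag"],
    }
```

Called on the last component of one of the paths above, e.g. `parse_profile_name("pgda002_gd_tursa-gnu_seq-008_sp2n-8.8.8.8-1.1.1.1_aupa21")`, this gives the thread count, the grid and MPI layouts as integer tuples, and the raw tag.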

The following are still failing, apparently due to the wall-clock limit (failed with 30 min, 1 h and 2 h limits). Resubmitted with a 3 h limit; will update this note:

/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-001_sp2n-16.16.16.16-1.1.1.1_aupb21
/home/dp208/dp208/shared/RAC16/pgda002/gd_tursa-gnu_seq-001_sp2n-8.8.8.8-1.1.1.1_def01/build/tests/sp2n/pgda002_gd_tursa-gnu_seq-008_sp2n-16.16.16.16-1.1.1.1_aupb21

ilectra commented 1 week ago

@ilectra : link profiles location to wiki and close issue