Closed penguian closed 7 years ago
@martin.dix@anu.edu.au commented
Tested in suite u-am567 (copied from Roger's u-al776).
Default drhook module is 1.1_ompi185
so explicitly load drhook/1.1_ompi.1.10.2
.
FCM doesn't handle switching from the dummy drhook routines to the library properly, so remove the fcm_make_um directory on raijin and do a complete rebuild.
In fcm_make_um/rose-app.conf
set
DR_HOOK=true
In um/rose-app.conf
set
DR_HOOK=true
DR_HOOK_CATCH_SIGNALS=0
DR_HOOK_IGNORE_SIGNALS=8
DR_HOOK_OPT=wallprof,self
DR_HOOK_PROFILE=$CYLC_TASK_WORK_DIR/ATM_RUNDIR/drhook.prof.%d
DR_HOOK_PROFILE_LIMIT=-10.0
DR_HOOK_PROFILE_PROC=-1
DRHOOK output files are created in u-am567/work/00010101T0000Z/coupled/ATM_RUNDIR
.
drhook.prof.1 has
Wall-time is 2710.08 sec on proc#1 (480 procs, 1 threads)
# % Time Cumul Self Total # of calls Self Total Routine@<thread-id>
(Size; Size/sec; Size/call; MinSize; MaxSize)
(self) (sec) (sec) (sec) ms/call ms/call
1 18.17 492.516 492.516 498.288 319990 1.54 1.56 TIMER@1
2 6.37 665.257 172.741 198.543 652879008 0.00 0.00 UKCA_RADAER_LUT_READ_IN:UKCA_RADAER_GET_LUT_INDEX@1
3 6.29 835.813 170.556 1030.855 2232 76.41 461.85 ATMOS_PHYSICS1@1
4 5.72 990.862 155.050 155.083 190719 0.81 0.81 IO:BUFFIN64_R@1
5 4.24 1105.676 114.813 301.014 5952 19.29 50.57 UKCA_RADAER_BAND_AVERAGE_MOD:UKCA_RADAER_BAND_AVERAGE@1
6 3.00 1187.087 81.411 353.477 31 2626.16 11402.48 ACUMPS@1
7 2.82 1263.548 76.462 76.504 210743 0.36 0.36 GATHER_FIELD_MPL@1
8 2.80 1339.372 75.824 145.276 89280 0.85 1.63 EG_INTERPOLATION_ETA_MOD:EG_INTERPOLATION_ETA@1
9 2.39 1404.095 64.723 64.764 606643 0.11 0.11 HALO_EXCHANGE:SWAP_BOUNDS_NS_DDT_DP@1
10 2.20 1463.695 59.600 59.624 108787 0.55 0.55 SCATTER_FIELD_MPL@1
11 1.98 1517.366 53.671 53.692 235569 0.23 0.23 IO:BUFFOUT64_R@1
12 1.90 1568.991 51.625 51.666 583518 0.09 0.09 HALO_EXCHANGE:SWAP_BOUNDS_EW_DDT_DP@1
13 1.72 1615.723 46.732 46.812 935724 0.05 0.05 HALO_EXCHANGE:SWAP_BOUNDS_CORNERLESS_DP@1
14 1.66 1660.714 44.991 50.051 126315672 0.00 0.00 VECTLIB_MOD:EXP_V@1
15 1.61 1704.430 43.716 43.728 111404 0.39 0.39 EG_CUBIC_LAGRANGE@1
16 1.58 1747.370 42.939 55.540 1071360 0.04 0.05 UKCA_COAGWITHNUCL_MOD:UKCA_COAGWITHNUCL@1
17 1.50 1788.057 40.688 40.722 609509 0.07 0.07 GLOBAL_2D_SUMS_MOD:GLOBAL_2D_SUMS@1
18 1.44 1827.016 38.959 41.874 73103705 0.00 0.00 VECTLIB_MOD:POWR_V@1
...
440 0.00 2707.323 0.030 2760.746 1 30.49 2760746.28 U_MODEL_4A@1
...
510 0.00 2708.872 0.015 358.901 31 0.50 11577.46 MEANCTL@1
...
Suggests some strange interaction with the timer, so reran with that off. Routines with very large call counts will create a substantial overhead that's likely larger than their true cost. For this reason, DrHOOK calls were removed from VECTLIB in vn10.7, https://code.metoffice.gov.uk/trac/um/ticket/2578. Also removed from in UKCA_RADAER_GET_LUT_INDEX along with other optimisations in https://code.metoffice.gov.uk/trac/um/ticket/2787.
Rerun with timer off, drhok calls removed from vectlib and UKCA_RADAER_GET_LUT_INDEX and Gregorian calendar optimisation.
Wall-time is 1851.38 sec on proc#1 (480 procs, 1 threads)
# % Time Cumul Self Total # of calls Self Total Routine@<thread-id>
(Size; Size/sec; Size/call; MinSize; MaxSize)
(self) (sec) (sec) (sec) ms/call ms/call
1 9.21 170.502 170.502 170.544 606643 0.28 0.28 HALO_EXCHANGE:SWAP_BOUNDS_NS_DDT_DP@1
2 8.83 333.908 163.406 507.026 2232 73.21 227.16 ATMOS_PHYSICS1@1
3 5.20 430.130 96.221 96.264 583518 0.16 0.16 HALO_EXCHANGE:SWAP_BOUNDS_EW_DDT_DP@1
4 4.92 521.253 91.124 91.128 5952 15.31 15.31 UKCA_RADAER_BAND_AVERAGE_MOD:UKCA_RADAER_BAND_AVERAGE@1
5 3.72 590.058 68.804 138.302 89280 0.77 1.55 EG_INTERPOLATION_ETA_MOD:EG_INTERPOLATION_ETA@1
6 3.52 655.135 65.077 65.111 609509 0.11 0.11 GLOBAL_2D_SUMS_MOD:GLOBAL_2D_SUMS@1
7 3.04 711.492 56.357 56.374 102728 0.55 0.55 SCATTER_FIELD_MPL@1
8 3.04 767.840 56.348 56.352 22353 2.52 2.52 SWAP_BOUNDS_MV_MOD:SWAP_BOUNDS_MV@1
9 2.59 815.849 48.009 49.368 34165434 0.00 0.00 LSP_MOMENTS_MOD:LSP_MOMENTS@1
10 2.55 863.040 47.191 47.273 935724 0.05 0.05 HALO_EXCHANGE:SWAP_BOUNDS_CORNERLESS_DP@1
11 2.36 906.755 43.715 43.726 111404 0.39 0.39 EG_CUBIC_LAGRANGE@1
12 2.31 949.607 42.852 55.158 1071360 0.04 0.05 UKCA_COAGWITHNUCL_MOD:UKCA_COAGWITHNUCL@1
13 1.78 982.629 33.022 74.333 2232 14.80 33.30 FOR_PATTERN_MOD:FOR_PATTERN@1
14 1.67 1013.592 30.963 84.817 2232 13.87 38.00 EG_CORRECT_TRACERS_PRIESTLEY_MOD:EG_CORRECT_TRACERS_PRIESTLEY@1
15 1.65 1044.162 30.569 70.162 154466 0.20 0.45 TRI_SOR_MOD:TRI_SOR_DP_DP@1
16 1.46 1071.212 27.050 163.832 11904 2.27 13.76 UKCA_AERO_STEP_MOD:UKCA_AERO_STEP@1
17 1.37 1096.617 25.406 25.407 15625 1.63 1.63 SWAP_BOUNDS_2D_MV_MOD:SWAP_BOUNDS_2D_MV@1
18 1.21 1118.992 22.375 30.038 35712 0.63 0.84 GLUE_CONV_6A_MOD:GLUE_CONV_6A@1
19 1.13 1139.943 20.951 40.747 2232 9.39 18.26 STPH_SKEB2_MOD:STPH_SKEB2@1
20 0.98 1158.060 18.117 22.951 121943534 0.00 0.00 PPXLOOK_MOD:EXPPXI@1
...
419 0.00 1848.079 0.036 1850.138 1 36.29 1850137.51 U_MODEL_4A@1
...
521 0.00 1850.384 0.012 11.611 31 0.40 374.55 MEANCTL@1
Note the reduction in overall time, partly from removing some of the DrHook overhead and partly from the Gregorian calendar optimisation.
Backporting the radaer optimisations from vn10.7 to vn10.6 (~mrd599/mosrs/um/v10.6_radaer_vectorisation
reduces the time in UKCA_RADAER_BAND_AVERAGE_MOD from 91 to 62 s.
Wall-time is 1831.27 sec on proc#1 (480 procs, 1 threads)
# % Time Cumul Self Total # of calls Self Total Routine@<thread-id>
(Size; Size/sec; Size/call; MinSize; MaxSize)
(self) (sec) (sec) (sec) ms/call ms/call
1 9.00 164.736 164.736 478.868 2232 73.81 214.55 ATMOS_PHYSICS1@1
2 8.35 317.732 152.996 153.044 606643 0.25 0.25 HALO_EXCHANGE:SWAP_BOUNDS_NS_DDT_DP@1
3 4.70 403.828 86.096 86.147 583518 0.15 0.15 HALO_EXCHANGE:SWAP_BOUNDS_EW_DDT_DP@1
4 4.02 477.427 73.598 142.955 89280 0.82 1.60 EG_INTERPOLATION_ETA_MOD:EG_INTERPOLATION_ETA@1
5 3.51 541.654 64.228 64.270 609509 0.11 0.11 GLOBAL_2D_SUMS_MOD:GLOBAL_2D_SUMS@1
6 3.40 603.980 62.326 62.330 5952 10.47 10.47 UKCA_RADAER_BAND_AVERAGE_MOD:UKCA_RADAER_BAND_AVERAGE@1
7 3.16 661.764 57.783 57.787 22353 2.59 2.59 SWAP_BOUNDS_MV_MOD:SWAP_BOUNDS_MV@1
8 3.06 717.875 56.111 56.130 102728 0.55 0.55 SCATTER_FIELD_MPL@1
9 2.75 768.179 50.305 51.818 34165434 0.00 0.00 LSP_MOMENTS_MOD:LSP_MOMENTS@1
10 2.62 816.212 48.033 48.132 935724 0.05 0.05 HALO_EXCHANGE:SWAP_BOUNDS_CORNERLESS_DP@1
11 2.38 859.839 43.627 43.638 111404 0.39 0.39 EG_CUBIC_LAGRANGE@1
12 2.38 903.416 43.576 56.591 1071360 0.04 0.05 UKCA_COAGWITHNUCL_MOD:UKCA_COAGWITHNUCL@1
13 2.01 940.305 36.889 78.004 2232 16.53 34.95 FOR_PATTERN_MOD:FOR_PATTERN@1
14 1.68 971.107 30.802 71.230 154466 0.20 0.46 TRI_SOR_MOD:TRI_SOR_DP_DP@1
15 1.67 1001.665 30.558 84.264 2232 13.69 37.75 EG_CORRECT_TRACERS_PRIESTLEY_MOD:EG_CORRECT_TRACERS_PRIESTLEY@1
16 1.49 1028.907 27.242 32.392 121943534 0.00 0.00 PPXLOOK_MOD:EXPPXI@1
17 1.48 1055.952 27.046 166.755 11904 2.27 14.01 UKCA_AERO_STEP_MOD:UKCA_AERO_STEP@1
18 1.34 1080.550 24.598 43.479 2232 11.02 19.48 STPH_SKEB2_MOD:STPH_SKEB2@1
19 1.31 1104.606 24.056 24.057 15625 1.54 1.54 SWAP_BOUNDS_2D_MV_MOD:SWAP_BOUNDS_2D_MV@1
20 1.24 1127.387 22.782 31.542 35712 0.64 0.88 GLUE_CONV_6A_MOD:GLUE_CONV_6A@1
A run with the same executable with DrHook off took
PE 0 Elapsed Wallclock Time: 1694.23
suggesting about an 8% overhead from DrHook itself.
@martin.dix@anu.edu.au changed _comment0 which not transferred by tractive
@martin.dix@anu.edu.au changed _comment1 which not transferred by tractive
@martin.dix@anu.edu.au changed _comment2 which not transferred by tractive
@martin.dix@anu.edu.au changed _comment3 which not transferred by tractive
@martin.dix@anu.edu.au changed status from new
to accepted
@martin.dix@anu.edu.au set owner to mrd599
rb4844 changed status from accepted
to closed
rb4844 set resolution to fixed
rb4844 commented
Accepted as resolved. I haven't actually tested it any further, but its not a priority.
IPM works for coupled suite and more detail can be obtained using Score-P if needed.
resolution_fixed
| by rb4844I haven't been able to get this to work. Can it be setup correctly through the rose edit GUI?
Issue migrated from trac:316 at 2024-01-31 18:28:58 +1100