ACCESS-NRI / accessdev-Trac-archive

Archive accessdev Trac contents as issues
Apache License 2.0
0 stars 0 forks source link

Dr Hook #316

Closed penguian closed 7 years ago

penguian commented 7 years ago

resolution_fixed | by rb4844


I haven't been able to get this to work. Can it be setup correctly through the rose edit GUI?


Issue migrated from trac:316 at 2024-01-31 18:28:58 +1100

penguian commented 7 years ago

@martin.dix@anu.edu.au commented


Tested in suite u-am567 (copied from Roger's u-al776). Default drhook module is 1.1_ompi185 so explicitly load drhook/1.1_ompi.1.10.2.

FCM doesn't handle switching from the dummy drhook routines to the library properly, so remove the fcm_make_um directory on raijin and do a complete rebuild.

In fcm_make_um/rose-app.conf set

DR_HOOK=true

In um/rose-app.conf set

DR_HOOK=true
DR_HOOK_CATCH_SIGNALS=0
DR_HOOK_IGNORE_SIGNALS=8
DR_HOOK_OPT=wallprof,self
DR_HOOK_PROFILE=$CYLC_TASK_WORK_DIR/ATM_RUNDIR/drhook.prof.%d
DR_HOOK_PROFILE_LIMIT=-10.0
DR_HOOK_PROFILE_PROC=-1

DRHOOK output files are created in u-am567/work/00010101T0000Z/coupled/ATM_RUNDIR.

drhook.prof.1 has

        Wall-time is 2710.08 sec on proc#1 (480 procs, 1 threads)

    #  % Time         Cumul         Self        Total     # of calls        Self       Total    Routine@<thread-id>
                                                                             (Size; Size/sec; Size/call; MinSize; MaxSize)
        (self)        (sec)        (sec)        (sec)                    ms/call     ms/call

    1    18.17      492.516      492.516      498.288         319990        1.54        1.56    TIMER@1
    2     6.37      665.257      172.741      198.543      652879008        0.00        0.00    UKCA_RADAER_LUT_READ_IN:UKCA_RADAER_GET_LUT_INDEX@1
    3     6.29      835.813      170.556     1030.855           2232       76.41      461.85    ATMOS_PHYSICS1@1
    4     5.72      990.862      155.050      155.083         190719        0.81        0.81    IO:BUFFIN64_R@1
    5     4.24     1105.676      114.813      301.014           5952       19.29       50.57    UKCA_RADAER_BAND_AVERAGE_MOD:UKCA_RADAER_BAND_AVERAGE@1
    6     3.00     1187.087       81.411      353.477             31     2626.16    11402.48    ACUMPS@1
    7     2.82     1263.548       76.462       76.504         210743        0.36        0.36    GATHER_FIELD_MPL@1
    8     2.80     1339.372       75.824      145.276          89280        0.85        1.63    EG_INTERPOLATION_ETA_MOD:EG_INTERPOLATION_ETA@1
    9     2.39     1404.095       64.723       64.764         606643        0.11        0.11    HALO_EXCHANGE:SWAP_BOUNDS_NS_DDT_DP@1
   10     2.20     1463.695       59.600       59.624         108787        0.55        0.55    SCATTER_FIELD_MPL@1
   11     1.98     1517.366       53.671       53.692         235569        0.23        0.23    IO:BUFFOUT64_R@1
   12     1.90     1568.991       51.625       51.666         583518        0.09        0.09    HALO_EXCHANGE:SWAP_BOUNDS_EW_DDT_DP@1
   13     1.72     1615.723       46.732       46.812         935724        0.05        0.05    HALO_EXCHANGE:SWAP_BOUNDS_CORNERLESS_DP@1
   14     1.66     1660.714       44.991       50.051      126315672        0.00        0.00    VECTLIB_MOD:EXP_V@1
   15     1.61     1704.430       43.716       43.728         111404        0.39        0.39    EG_CUBIC_LAGRANGE@1
   16     1.58     1747.370       42.939       55.540        1071360        0.04        0.05    UKCA_COAGWITHNUCL_MOD:UKCA_COAGWITHNUCL@1
   17     1.50     1788.057       40.688       40.722         609509        0.07        0.07    GLOBAL_2D_SUMS_MOD:GLOBAL_2D_SUMS@1
   18     1.44     1827.016       38.959       41.874       73103705        0.00        0.00    VECTLIB_MOD:POWR_V@1
...
  440     0.00     2707.323        0.030     2760.746              1       30.49  2760746.28    U_MODEL_4A@1
...
  510     0.00     2708.872        0.015      358.901             31        0.50    11577.46    MEANCTL@1
...

Suggests some strange interaction with the timer, so reran with that off. Routines with very large call counts will create a substantial overhead that's likely larger than their true cost. For this reason, DrHOOK calls were removed from VECTLIB in vn10.7, https://code.metoffice.gov.uk/trac/um/ticket/2578. Also removed from in UKCA_RADAER_GET_LUT_INDEX along with other optimisations in https://code.metoffice.gov.uk/trac/um/ticket/2787.

Rerun with timer off, drhok calls removed from vectlib and UKCA_RADAER_GET_LUT_INDEX and Gregorian calendar optimisation.

        Wall-time is 1851.38 sec on proc#1 (480 procs, 1 threads)

    #  % Time         Cumul         Self        Total     # of calls        Self       Total    Routine@<thread-id>
                                                                             (Size; Size/sec; Size/call; MinSize; MaxSize)
        (self)        (sec)        (sec)        (sec)                    ms/call     ms/call

    1     9.21      170.502      170.502      170.544         606643        0.28        0.28    HALO_EXCHANGE:SWAP_BOUNDS_NS_DDT_DP@1
    2     8.83      333.908      163.406      507.026           2232       73.21      227.16    ATMOS_PHYSICS1@1
    3     5.20      430.130       96.221       96.264         583518        0.16        0.16    HALO_EXCHANGE:SWAP_BOUNDS_EW_DDT_DP@1
    4     4.92      521.253       91.124       91.128           5952       15.31       15.31    UKCA_RADAER_BAND_AVERAGE_MOD:UKCA_RADAER_BAND_AVERAGE@1
    5     3.72      590.058       68.804      138.302          89280        0.77        1.55    EG_INTERPOLATION_ETA_MOD:EG_INTERPOLATION_ETA@1
    6     3.52      655.135       65.077       65.111         609509        0.11        0.11    GLOBAL_2D_SUMS_MOD:GLOBAL_2D_SUMS@1
    7     3.04      711.492       56.357       56.374         102728        0.55        0.55    SCATTER_FIELD_MPL@1
    8     3.04      767.840       56.348       56.352          22353        2.52        2.52    SWAP_BOUNDS_MV_MOD:SWAP_BOUNDS_MV@1
    9     2.59      815.849       48.009       49.368       34165434        0.00        0.00    LSP_MOMENTS_MOD:LSP_MOMENTS@1
   10     2.55      863.040       47.191       47.273         935724        0.05        0.05    HALO_EXCHANGE:SWAP_BOUNDS_CORNERLESS_DP@1
   11     2.36      906.755       43.715       43.726         111404        0.39        0.39    EG_CUBIC_LAGRANGE@1
   12     2.31      949.607       42.852       55.158        1071360        0.04        0.05    UKCA_COAGWITHNUCL_MOD:UKCA_COAGWITHNUCL@1
   13     1.78      982.629       33.022       74.333           2232       14.80       33.30    FOR_PATTERN_MOD:FOR_PATTERN@1
   14     1.67     1013.592       30.963       84.817           2232       13.87       38.00    EG_CORRECT_TRACERS_PRIESTLEY_MOD:EG_CORRECT_TRACERS_PRIESTLEY@1
   15     1.65     1044.162       30.569       70.162         154466        0.20        0.45    TRI_SOR_MOD:TRI_SOR_DP_DP@1
   16     1.46     1071.212       27.050      163.832          11904        2.27       13.76    UKCA_AERO_STEP_MOD:UKCA_AERO_STEP@1
   17     1.37     1096.617       25.406       25.407          15625        1.63        1.63    SWAP_BOUNDS_2D_MV_MOD:SWAP_BOUNDS_2D_MV@1
   18     1.21     1118.992       22.375       30.038          35712        0.63        0.84    GLUE_CONV_6A_MOD:GLUE_CONV_6A@1
   19     1.13     1139.943       20.951       40.747           2232        9.39       18.26    STPH_SKEB2_MOD:STPH_SKEB2@1
   20     0.98     1158.060       18.117       22.951      121943534        0.00        0.00    PPXLOOK_MOD:EXPPXI@1
...
  419     0.00     1848.079        0.036     1850.138              1       36.29  1850137.51    U_MODEL_4A@1
...
  521     0.00     1850.384        0.012       11.611             31        0.40      374.55    MEANCTL@1

Note the reduction in overall time, partly from removing some of the DrHook overhead and partly from the Gregorian calendar optimisation.

Backporting the radaer optimisations from vn10.7 to vn10.6 (~mrd599/mosrs/um/v10.6_radaer_vectorisation reduces the time in UKCA_RADAER_BAND_AVERAGE_MOD from 91 to 62 s.

        Wall-time is 1831.27 sec on proc#1 (480 procs, 1 threads)

    #  % Time         Cumul         Self        Total     # of calls        Self       Total    Routine@<thread-id>
                                                                             (Size; Size/sec; Size/call; MinSize; MaxSize)
        (self)        (sec)        (sec)        (sec)                    ms/call     ms/call

    1     9.00      164.736      164.736      478.868           2232       73.81      214.55    ATMOS_PHYSICS1@1
    2     8.35      317.732      152.996      153.044         606643        0.25        0.25    HALO_EXCHANGE:SWAP_BOUNDS_NS_DDT_DP@1
    3     4.70      403.828       86.096       86.147         583518        0.15        0.15    HALO_EXCHANGE:SWAP_BOUNDS_EW_DDT_DP@1
    4     4.02      477.427       73.598      142.955          89280        0.82        1.60    EG_INTERPOLATION_ETA_MOD:EG_INTERPOLATION_ETA@1
    5     3.51      541.654       64.228       64.270         609509        0.11        0.11    GLOBAL_2D_SUMS_MOD:GLOBAL_2D_SUMS@1
    6     3.40      603.980       62.326       62.330           5952       10.47       10.47    UKCA_RADAER_BAND_AVERAGE_MOD:UKCA_RADAER_BAND_AVERAGE@1
    7     3.16      661.764       57.783       57.787          22353        2.59        2.59    SWAP_BOUNDS_MV_MOD:SWAP_BOUNDS_MV@1
    8     3.06      717.875       56.111       56.130         102728        0.55        0.55    SCATTER_FIELD_MPL@1
    9     2.75      768.179       50.305       51.818       34165434        0.00        0.00    LSP_MOMENTS_MOD:LSP_MOMENTS@1
   10     2.62      816.212       48.033       48.132         935724        0.05        0.05    HALO_EXCHANGE:SWAP_BOUNDS_CORNERLESS_DP@1
   11     2.38      859.839       43.627       43.638         111404        0.39        0.39    EG_CUBIC_LAGRANGE@1
   12     2.38      903.416       43.576       56.591        1071360        0.04        0.05    UKCA_COAGWITHNUCL_MOD:UKCA_COAGWITHNUCL@1
   13     2.01      940.305       36.889       78.004           2232       16.53       34.95    FOR_PATTERN_MOD:FOR_PATTERN@1
   14     1.68      971.107       30.802       71.230         154466        0.20        0.46    TRI_SOR_MOD:TRI_SOR_DP_DP@1
   15     1.67     1001.665       30.558       84.264           2232       13.69       37.75    EG_CORRECT_TRACERS_PRIESTLEY_MOD:EG_CORRECT_TRACERS_PRIESTLEY@1
   16     1.49     1028.907       27.242       32.392      121943534        0.00        0.00    PPXLOOK_MOD:EXPPXI@1
   17     1.48     1055.952       27.046      166.755          11904        2.27       14.01    UKCA_AERO_STEP_MOD:UKCA_AERO_STEP@1
   18     1.34     1080.550       24.598       43.479           2232       11.02       19.48    STPH_SKEB2_MOD:STPH_SKEB2@1
   19     1.31     1104.606       24.056       24.057          15625        1.54        1.54    SWAP_BOUNDS_2D_MV_MOD:SWAP_BOUNDS_2D_MV@1
   20     1.24     1127.387       22.782       31.542          35712        0.64        0.88    GLUE_CONV_6A_MOD:GLUE_CONV_6A@1

A run with the same executable with DrHook off took

PE      0  Elapsed Wallclock Time:      1694.23

suggesting about an 8% overhead from DrHook itself.

penguian commented 7 years ago

@martin.dix@anu.edu.au changed _comment0 which not transferred by tractive

penguian commented 7 years ago

@martin.dix@anu.edu.au changed _comment1 which not transferred by tractive

penguian commented 7 years ago

@martin.dix@anu.edu.au changed _comment2 which not transferred by tractive

penguian commented 7 years ago

@martin.dix@anu.edu.au changed _comment3 which not transferred by tractive

penguian commented 7 years ago

@martin.dix@anu.edu.au changed status from new to accepted

penguian commented 7 years ago

@martin.dix@anu.edu.au set owner to mrd599

penguian commented 7 years ago

rb4844 changed status from accepted to closed

penguian commented 7 years ago

rb4844 set resolution to fixed

penguian commented 7 years ago

rb4844 commented


Accepted as resolved. I haven't actually tested it any further, but its not a priority.

IPM works for coupled suite and more detail can be obtained using Score-P if needed.