E3SM-Project / HOMMEXX

Clone of ACME for CMDV-SE project to convert HOMME to C++

[WIP] Optimize SphOps some more. #262

Closed by ambrad 6 years ago

ambrad commented 6 years ago

Focus is on things that will speed up both ESF and HVF.

ambrad commented 6 years ago

Tests pass. Speedups:

HSW:

>> masterfeb27
prim_main_loop                            32       32 3.200000e+01   2.215424e+03    69.232 (     0      0)    69.232 (    19      0)
tl-ae U3-5stage_timestep                  32       32 1.603200e+04   1.610637e+02     5.317 (    22      0)     4.808 (     6      0)
tl-ae advance_hypervis_dp                 32       32 1.603200e+04   6.990127e+01     2.200 (    19      0)     2.163 (     0      0)
tl-at prim_advec_tracers_remap_RK2        32       32 1.603200e+04   1.820010e+03    57.118 (     0      0)    56.519 (    22      0)
tl-sc vertical_remap                      32       32 5.344000e+03   1.370877e+02     4.329 (    30      0)     4.242 (    13      0)
>> issue258
prim_main_loop                            32       32 3.200000e+01   1.985149e+03    62.036 (    21      0)    62.036 (    16      0)
tl-ae U3-5stage_timestep                  32       32 1.603200e+04   1.619076e+02     5.227 (     2      0)     4.905 (     0      0)
tl-ae advance_hypervis_dp                 32       32 1.603200e+04   7.109792e+01     2.272 (     5      0)     2.193 (     0      0)
tl-at prim_advec_tracers_remap_RK2        32       32 1.603200e+04   1.587663e+03    49.798 (     0      0)    49.409 (     2      0)
tl-sc vertical_remap                      32       32 5.344000e+03   1.370966e+02     4.339 (    28      0)     4.246 (    16      0)
>> simpsphops
prim_main_loop                            32       32 3.200000e+01   1.946724e+03    60.835 (     0      0)    60.835 (    25      0)
tl-ae U3-5stage_timestep                  32       32 1.603200e+04   1.606352e+02     5.358 (    15      0)     4.731 (     0      0)
tl-ae advance_hypervis_dp                 32       32 1.603200e+04   6.757350e+01     2.157 (    24      0)     2.081 (    14      0)
tl-at prim_advec_tracers_remap_RK2        32       32 1.603200e+04   1.554790e+03    48.893 (     0      0)    48.191 (    14      0)
tl-sc vertical_remap                      32       32 5.344000e+03   1.362042e+02     4.328 (    14      0)     4.216 (    19      0)

P100:

>> pre-issue258
tl-at prim_advec_tracers_remap_RK2         1        1 3.000000e+02   1.155884e+01    11.559 (     0      0)    11.559 (     0      0)
tl-ae advance_hypervis_dp                  1        1 3.000000e+02   2.553223e+00     2.553 (     0      0)     2.553 (     0      0)
>> issue258
tl-at prim_advec_tracers_remap_RK2         1        1 3.000000e+02   1.028752e+01    10.288 (     0      0)    10.288 (     0      0)
tl-ae advance_hypervis_dp                  1        1 3.000000e+02   2.544365e+00     2.544 (     0      0)     2.544 (     0      0)
>> simpsphops
tl-at prim_advec_tracers_remap_RK2         1        1 3.000000e+02   9.288903e+00     9.289 (     0      0)     9.289 (     0      0)
tl-ae advance_hypervis_dp                  1        1 3.000000e+02   2.392004e+00     2.392 (     0      0)     2.392 (     0      0)

KNL:

>> xx master
prim_main_loop                            64       64 6.400000e+01   1.964455e+03    30.695 (    52      0)    30.694 (    10      0)
tl-ae U3-5stage_timestep                  64       64 3.206400e+04   2.062910e+02     3.429 (     6      0)     2.615 (     0      0)
tl-ae advance_hypervis_dp                 64       64 3.206400e+04   1.631242e+02     2.958 (    54      0)     2.520 (    29      0)
tl-at prim_advec_tracers_remap_RK2        64       64 3.206400e+04   1.314289e+03    20.976 (     0      0)    20.379 (    22      0)
tl-sc vertical_remap                      64       64 1.068800e+04   2.536668e+02     4.126 (     0      0)     3.904 (     5      0)
>> simpsphops
prim_main_loop                            64       64 6.400000e+01   1.870767e+03    29.231 (     2      0)    29.230 (    33      0)
tl-ae U3-5stage_timestep                  64       64 3.206400e+04   1.947615e+02     3.204 (    22      0)     2.485 (     0      0)
tl-ae advance_hypervis_dp                 64       64 3.206400e+04   1.547584e+02     2.825 (    54      0)     2.386 (    29      0)
tl-at prim_advec_tracers_remap_RK2        64       64 3.206400e+04   1.242608e+03    19.791 (     0      0)    19.255 (    22      0)
tl-sc vertical_remap                      64       64 1.068800e+04   2.513592e+02     4.120 (     0      0)     3.878 (    53      0)
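For reference, the speedup on the tracer advection timer can be worked out from the logs above with a short script. This is only a sketch: it assumes the fourth numeric column is the total wall-clock time summed over ranks, which the logs themselves do not label, and the helper names (`total_time`, `speedup`) are made up for illustration. The timer lines are copied from the logs with runs of spaces condensed. Note that the baseline label differs per platform (`masterfeb27`, `pre-issue258`, `xx master`).

```python
import re

# prim_advec_tracers_remap_RK2 lines from the logs above (whitespace condensed).
# Each entry is (baseline line, simpsphops line).
LOGS = {
    "HSW": (
        "tl-at prim_advec_tracers_remap_RK2 32 32 1.603200e+04 1.820010e+03 57.118 ( 0 0) 56.519 ( 22 0)",
        "tl-at prim_advec_tracers_remap_RK2 32 32 1.603200e+04 1.554790e+03 48.893 ( 0 0) 48.191 ( 14 0)",
    ),
    "P100": (
        "tl-at prim_advec_tracers_remap_RK2 1 1 3.000000e+02 1.155884e+01 11.559 ( 0 0) 11.559 ( 0 0)",
        "tl-at prim_advec_tracers_remap_RK2 1 1 3.000000e+02 9.288903e+00 9.289 ( 0 0) 9.289 ( 0 0)",
    ),
    "KNL": (
        "tl-at prim_advec_tracers_remap_RK2 64 64 3.206400e+04 1.314289e+03 20.976 ( 0 0) 20.379 ( 22 0)",
        "tl-at prim_advec_tracers_remap_RK2 64 64 3.206400e+04 1.242608e+03 19.791 ( 0 0) 19.255 ( 22 0)",
    ),
}


def total_time(line):
    """Return the fourth numeric column (ASSUMED to be total wall time)."""
    # Drop the parenthesised pairs, then index from the end so that
    # timer names containing spaces do not shift the column position.
    fields = re.sub(r"\([^)]*\)", " ", line).split()
    return float(fields[-3])


def speedup(before, after):
    """Ratio of baseline time to optimized time; > 1 means faster."""
    return total_time(before) / total_time(after)


speedups = {arch: speedup(base, new) for arch, (base, new) in LOGS.items()}

for arch, s in speedups.items():
    print(f"{arch}: {s:.3f}x")
```

Indexing columns from the end of the line keeps the parser working for timers such as `tl-ae U3-5stage_timestep`, where the name itself contains a space.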