ambrad closed this 6 years ago.
Tests pass. Timer summaries showing the speedups (wall-clock times in seconds):
HSW:
```
name procs threads count walltotal wallmax (proc thrd) wallmin (proc thrd)
>> masterfeb27
prim_main_loop 32 32 3.200000e+01 2.215424e+03 69.232 ( 0 0) 69.232 ( 19 0)
tl-ae U3-5stage_timestep 32 32 1.603200e+04 1.610637e+02 5.317 ( 22 0) 4.808 ( 6 0)
tl-ae advance_hypervis_dp 32 32 1.603200e+04 6.990127e+01 2.200 ( 19 0) 2.163 ( 0 0)
tl-at prim_advec_tracers_remap_RK2 32 32 1.603200e+04 1.820010e+03 57.118 ( 0 0) 56.519 ( 22 0)
tl-sc vertical_remap 32 32 5.344000e+03 1.370877e+02 4.329 ( 30 0) 4.242 ( 13 0)
>> issue258
prim_main_loop 32 32 3.200000e+01 1.985149e+03 62.036 ( 21 0) 62.036 ( 16 0)
tl-ae U3-5stage_timestep 32 32 1.603200e+04 1.619076e+02 5.227 ( 2 0) 4.905 ( 0 0)
tl-ae advance_hypervis_dp 32 32 1.603200e+04 7.109792e+01 2.272 ( 5 0) 2.193 ( 0 0)
tl-at prim_advec_tracers_remap_RK2 32 32 1.603200e+04 1.587663e+03 49.798 ( 0 0) 49.409 ( 2 0)
tl-sc vertical_remap 32 32 5.344000e+03 1.370966e+02 4.339 ( 28 0) 4.246 ( 16 0)
>> simpsphops
prim_main_loop 32 32 3.200000e+01 1.946724e+03 60.835 ( 0 0) 60.835 ( 25 0)
tl-ae U3-5stage_timestep 32 32 1.603200e+04 1.606352e+02 5.358 ( 15 0) 4.731 ( 0 0)
tl-ae advance_hypervis_dp 32 32 1.603200e+04 6.757350e+01 2.157 ( 24 0) 2.081 ( 14 0)
tl-at prim_advec_tracers_remap_RK2 32 32 1.603200e+04 1.554790e+03 48.893 ( 0 0) 48.191 ( 14 0)
tl-sc vertical_remap 32 32 5.344000e+03 1.362042e+02 4.328 ( 14 0) 4.216 ( 19 0)
```
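To recompute the HSW speedups directly from these lines, here is a minimal Python sketch. It assumes each line reads `name procs threads count walltotal wallmax (proc thrd) wallmin (proc thrd)`, with `walltotal` summed over all threads (consistent with 2215.424 / 32 = 69.232 for `prim_main_loop`). `TimerStat`, `parse_timer_line`, and `speedups` are hypothetical helpers, not part of HOMME or its timing library.

```python
import re
from dataclasses import dataclass


@dataclass
class TimerStat:
    """One line of the timer summary (hypothetical helper type)."""
    name: str
    procs: int
    threads: int
    count: float
    walltotal: float  # assumed: summed over all threads
    wallmax: float
    wallmin: float

    @property
    def wallmean(self) -> float:
        # Mean per-thread time, assuming walltotal is a sum over threads.
        return self.walltotal / self.threads


# Numeric tail: procs threads count walltotal wallmax (proc thrd) wallmin (proc thrd)
_TAIL = re.compile(
    r"\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+\(\s*\d+\s+\d+\s*\)"
    r"\s+(\S+)\s+\(\s*\d+\s+\d+\s*\)\s*$"
)


def parse_timer_line(line: str) -> TimerStat:
    """Split a line into its (possibly multi-word) timer name and its numbers."""
    m = _TAIL.search(line)
    if m is None:
        raise ValueError(f"not a timer line: {line!r}")
    procs, threads, count, total, wmax, wmin = m.groups()
    return TimerStat(line[: m.start()].strip(), int(procs), int(threads),
                     float(count), float(total), float(wmax), float(wmin))


def speedups(baseline: list[str], candidate: list[str]) -> dict[str, float]:
    """Ratio baseline/candidate of walltotal for timers present in both runs."""
    base = {s.name: s for s in map(parse_timer_line, baseline)}
    new = {s.name: s for s in map(parse_timer_line, candidate)}
    return {k: base[k].walltotal / new[k].walltotal for k in base if k in new}


MASTER = [
    "prim_main_loop 32 32 3.200000e+01 2.215424e+03 69.232 ( 0 0) 69.232 ( 19 0)",
    "tl-at prim_advec_tracers_remap_RK2 32 32 1.603200e+04 1.820010e+03 57.118 ( 0 0) 56.519 ( 22 0)",
]
SIMPSPHOPS = [
    "prim_main_loop 32 32 3.200000e+01 1.946724e+03 60.835 ( 0 0) 60.835 ( 25 0)",
    "tl-at prim_advec_tracers_remap_RK2 32 32 1.603200e+04 1.554790e+03 48.893 ( 0 0) 48.191 ( 14 0)",
]

for name, ratio in speedups(MASTER, SIMPSPHOPS).items():
    print(f"{name}: {ratio:.3f}x")
```

Using `walltotal` rather than `wallmax` averages over all threads; swapping in `wallmax` would instead track the slowest thread, which is what bounds the time to solution.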
P100:
```
name procs threads count walltotal wallmax (proc thrd) wallmin (proc thrd)
>> pre-issue258
tl-at prim_advec_tracers_remap_RK2 1 1 3.000000e+02 1.155884e+01 11.559 ( 0 0) 11.559 ( 0 0)
tl-ae advance_hypervis_dp 1 1 3.000000e+02 2.553223e+00 2.553 ( 0 0) 2.553 ( 0 0)
>> issue258
tl-at prim_advec_tracers_remap_RK2 1 1 3.000000e+02 1.028752e+01 10.288 ( 0 0) 10.288 ( 0 0)
tl-ae advance_hypervis_dp 1 1 3.000000e+02 2.544365e+00 2.544 ( 0 0) 2.544 ( 0 0)
>> simpsphops
tl-at prim_advec_tracers_remap_RK2 1 1 3.000000e+02 9.288903e+00 9.289 ( 0 0) 9.289 ( 0 0)
tl-ae advance_hypervis_dp 1 1 3.000000e+02 2.392004e+00 2.392 ( 0 0) 2.392 ( 0 0)
```
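Because the P100 runs are listed as pre-issue258, issue258, then simpsphops, the end-to-end speedup can be split into per-branch steps. The sketch below assumes each branch builds on the previous one (an assumption based on the branch names, not stated here); `P100`, `STAGES`, and `step_speedups` are hypothetical names, and the times are the `walltotal` values above.

```python
import math

# walltotal (seconds) for two timers on P100 (1 process), copied from the output above.
P100 = {
    "prim_advec_tracers_remap_RK2": {
        "pre-issue258": 11.55884, "issue258": 10.28752, "simpsphops": 9.288903},
    "advance_hypervis_dp": {
        "pre-issue258": 2.553223, "issue258": 2.544365, "simpsphops": 2.392004},
}
# Assumed order: each branch builds on the one before it.
STAGES = ["pre-issue258", "issue258", "simpsphops"]


def step_speedups(times: dict[str, float], stages: list[str]) -> list[float]:
    """Speedup of each stage relative to the stage before it."""
    return [times[prev] / times[cur] for prev, cur in zip(stages, stages[1:])]


for timer, times in P100.items():
    steps = step_speedups(times, STAGES)
    overall = times[STAGES[0]] / times[STAGES[-1]]
    # The ratios telescope, so their product equals the end-to-end speedup.
    assert math.isclose(math.prod(steps), overall)
    print(timer, [f"{s:.3f}" for s in steps], f"overall {overall:.3f}")
```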
KNL:
```
name procs threads count walltotal wallmax (proc thrd) wallmin (proc thrd)
>> xx master
prim_main_loop 64 64 6.400000e+01 1.964455e+03 30.695 ( 52 0) 30.694 ( 10 0)
tl-ae U3-5stage_timestep 64 64 3.206400e+04 2.062910e+02 3.429 ( 6 0) 2.615 ( 0 0)
tl-ae advance_hypervis_dp 64 64 3.206400e+04 1.631242e+02 2.958 ( 54 0) 2.520 ( 29 0)
tl-at prim_advec_tracers_remap_RK2 64 64 3.206400e+04 1.314289e+03 20.976 ( 0 0) 20.379 ( 22 0)
tl-sc vertical_remap 64 64 1.068800e+04 2.536668e+02 4.126 ( 0 0) 3.904 ( 5 0)
>> simpsphops
prim_main_loop 64 64 6.400000e+01 1.870767e+03 29.231 ( 2 0) 29.230 ( 33 0)
tl-ae U3-5stage_timestep 64 64 3.206400e+04 1.947615e+02 3.204 ( 22 0) 2.485 ( 0 0)
tl-ae advance_hypervis_dp 64 64 3.206400e+04 1.547584e+02 2.825 ( 54 0) 2.386 ( 29 0)
tl-at prim_advec_tracers_remap_RK2 64 64 3.206400e+04 1.242608e+03 19.791 ( 0 0) 19.255 ( 22 0)
tl-sc vertical_remap 64 64 1.068800e+04 2.513592e+02 4.120 ( 0 0) 3.878 ( 53 0)
```
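The KNL gains are smaller, so expressing them as a percent reduction in wall time may be easier to read than a ratio. A minimal sketch using the `walltotal` values above; `MASTER`, `SIMPSPHOPS`, and `percent_reduction` are hypothetical names, and the timer names drop their `tl-*` prefixes for brevity.

```python
# walltotal (seconds) on KNL, copied from the "xx master" and "simpsphops" runs above.
MASTER = {
    "prim_main_loop": 1964.455,
    "U3-5stage_timestep": 206.2910,
    "advance_hypervis_dp": 163.1242,
    "prim_advec_tracers_remap_RK2": 1314.289,
    "vertical_remap": 253.6668,
}
SIMPSPHOPS = {
    "prim_main_loop": 1870.767,
    "U3-5stage_timestep": 194.7615,
    "advance_hypervis_dp": 154.7584,
    "prim_advec_tracers_remap_RK2": 1242.608,
    "vertical_remap": 251.3592,
}


def percent_reduction(old: float, new: float) -> float:
    """Wall-time reduction of new relative to old, in percent."""
    return 100.0 * (old - new) / old


for name, old in MASTER.items():
    print(f"{name:30s} {percent_reduction(old, SIMPSPHOPS[name]):5.2f}%")
```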
The focus is on changes that speed up both ESF and HVF.