mfdeakin-sandia closed 6 years ago
@ambrad suggested changing max_num_warps in ExecSpaceDefs.cpp to 8; doing so resolved the GPU issue.
Performance on KNL is unchanged, as expected:
master:
prim_main_loop 64 64 6.400000e+01 1.727577e+03 26.994 ( 0 0) 26.993 ( 7 0)
tl-ae U3-5stage_timestep 64 64 1.920000e+04 1.683358e+02 3.318 ( 22 0) 2.048 ( 61 0)
tl-ae advance_hypervis_dp 64 64 1.920000e+04 1.278141e+02 2.012 ( 20 0) 1.982 ( 37 0)
tl-at prim_advec_tracers_remap_RK2 64 64 1.920000e+04 1.071100e+03 17.330 ( 61 0) 16.009 ( 22 0)
tl-sc vertical_remap 64 64 6.400000e+03 2.202355e+02 3.485 ( 31 0) 3.420 ( 35 0)
cam_fixes:
prim_main_loop 64 64 6.400000e+01 1.727733e+03 26.997 ( 38 0) 26.995 ( 62 0)
tl-ae U3-5stage_timestep 64 64 1.920000e+04 1.679026e+02 3.330 ( 22 0) 2.068 ( 61 0)
tl-ae advance_hypervis_dp 64 64 1.920000e+04 1.282612e+02 2.026 ( 8 0) 1.991 ( 37 0)
tl-at prim_advec_tracers_remap_RK2 64 64 1.920000e+04 1.070237e+03 17.305 ( 61 0) 15.987 ( 22 0)
tl-sc vertical_remap 64 64 6.400000e+03 2.211031e+02 3.491 ( 43 0) 3.429 ( 58 0)
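To put a number on "unchanged", here is a rough sketch that compares the two timer listings above. It assumes the fourth numeric column of each line is the accumulated wall time in seconds (the listings carry no column headers, so that reading is an assumption), and the helper name `relative_change` is just for illustration.

```python
# Fourth numeric column of each timer line, copied from the listings above.
# Assumption: this column is the accumulated wall time in seconds.
master = {
    "prim_main_loop": 1.727577e+03,
    "U3-5stage_timestep": 1.683358e+02,
    "advance_hypervis_dp": 1.278141e+02,
    "prim_advec_tracers_remap_RK2": 1.071100e+03,
    "vertical_remap": 2.202355e+02,
}
cam_fixes = {
    "prim_main_loop": 1.727733e+03,
    "U3-5stage_timestep": 1.679026e+02,
    "advance_hypervis_dp": 1.282612e+02,
    "prim_advec_tracers_remap_RK2": 1.070237e+03,
    "vertical_remap": 2.211031e+02,
}


def relative_change(before, after):
    """Signed relative change from `before` to `after`."""
    return (after - before) / before


# Relative change per timer: positive means cam_fixes is slower than master.
changes = {
    name: relative_change(master[name], cam_fixes[name]) for name in master
}

for name, change in changes.items():
    print(f"{name:30s} {change:+.3%}")
```

Under that reading of the columns, every timer differs by less than half a percent between master and cam_fixes, with some slightly faster and some slightly slower.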
Would you comment on the reason that the time level can be removed from derived%F*()?
The time level for the forcing variables was removed in HOMME master, and the time level used always seems to be the same. EDIT: To be clear, CAM fails to compile if the forcing variables have a time level index.
This implements several changes needed for integration with CAM. Most changes are done for consistency with HOMME master in E3SM. In particular, the non-BFB changes are caused by taking parallel_mod.F90 and metis_mod.F90 wholesale from HOMME master. I verified this by restoring the versions of those two files from HOMMEXX master and checking for BFB with HOMMEXX master.
I'm still investigating why team sizes changed on GPU machines (they're too large now), and will update when I've fixed that issue.