E3SM-Project / HOMMEXX

Clone of ACME for CMDV-SE project to convert HOMME to C++

Cam fixes #344

Closed mfdeakin-sandia closed 6 years ago

mfdeakin-sandia commented 6 years ago

This implements several changes needed for integration with CAM. Most changes are made for consistency with HOMME master in E3SM. In particular, the non-BFB (bit-for-bit) changes are caused by taking parallel_mod.F90 and metis_mod.F90 wholesale from HOMME master. I verified this by restoring the versions of those two files from HOMMEXX master and checking that the results are BFB with HOMMEXX master.
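
For readers unfamiliar with the term, a BFB check compares the raw bit patterns of two outputs rather than testing agreement within a tolerance. The following is only a minimal sketch of that idea, not the comparison tool used for this PR; `is_bfb` and `first_bfb_mismatch` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of HOMMEXX): true only if both fields have
// the same length and identical bit patterns, not merely close values.
bool is_bfb(const std::vector<double>& a, const std::vector<double>& b) {
  if (a.size() != b.size()) return false;
  if (a.empty()) return true;
  return std::memcmp(a.data(), b.data(), a.size() * sizeof(double)) == 0;
}

// Hypothetical helper: index of the first element whose bits differ,
// or a.size() if the two fields are bit-for-bit identical.
std::size_t first_bfb_mismatch(const std::vector<double>& a,
                               const std::vector<double>& b) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::memcmp(&a[i], &b[i], sizeof(double)) != 0) return i;
  }
  return a.size();
}
```

Comparing bytes rather than using `==` means that `0.0` and `-0.0` count as different, which is the stricter reading of "bit-for-bit".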

I'm still investigating why team sizes changed on GPU machines (they're too large now), and will update when I've fixed that issue.

mfdeakin-sandia commented 6 years ago

@ambrad suggested changing max_num_warps in ExecSpaceDefs.cpp to 8. Doing so resolved the GPU issue.
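
As a rough illustration of why lowering a warp cap shrinks team sizes, here is a sketch that assumes max_num_warps acts as an upper bound on the number of warps per team. That assumption, the function `capped_team_size`, and the constant `kWarpSize` are introduced here for illustration only; the actual logic in ExecSpaceDefs.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>

// Threads per warp on NVIDIA GPUs.
constexpr int kWarpSize = 32;

// Hypothetical helper (not the HOMMEXX implementation): cap the requested
// number of warps at max_num_warps and return the resulting threads per team.
int capped_team_size(int requested_warps, int max_num_warps) {
  assert(requested_warps > 0 && max_num_warps > 0);
  return std::min(requested_warps, max_num_warps) * kWarpSize;
}
```

Under this assumption, a request for 16 warps with a cap of 8 yields a team of 256 threads instead of 512.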

mfdeakin-sandia commented 6 years ago

Performance on KNL is unchanged, as expected.

master:

prim_main_loop                            64       64 6.400000e+01   1.727577e+03    26.994 (     0      0)    26.993 (     7      0)
tl-ae U3-5stage_timestep                  64       64 1.920000e+04   1.683358e+02     3.318 (    22      0)     2.048 (    61      0)
tl-ae advance_hypervis_dp                 64       64 1.920000e+04   1.278141e+02     2.012 (    20      0)     1.982 (    37      0)
tl-at prim_advec_tracers_remap_RK2        64       64 1.920000e+04   1.071100e+03    17.330 (    61      0)    16.009 (    22      0)
tl-sc vertical_remap                      64       64 6.400000e+03   2.202355e+02     3.485 (    31      0)     3.420 (    35      0)

cam_fixes:

prim_main_loop                            64       64 6.400000e+01   1.727733e+03    26.997 (    38      0)    26.995 (    62      0)
tl-ae U3-5stage_timestep                  64       64 1.920000e+04   1.679026e+02     3.330 (    22      0)     2.068 (    61      0)
tl-ae advance_hypervis_dp                 64       64 1.920000e+04   1.282612e+02     2.026 (     8      0)     1.991 (    37      0)
tl-at prim_advec_tracers_remap_RK2        64       64 1.920000e+04   1.070237e+03    17.305 (    61      0)    15.987 (    22      0)
tl-sc vertical_remap                      64       64 6.400000e+03   2.211031e+02     3.491 (    43      0)     3.429 (    58      0)
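
One way to make "unchanged" concrete is to compute the relative change of each timer between the two runs. The sketch below is illustrative only: `relative_change` and `within_noise` are hypothetical helpers, and the 1% tolerance is an assumption rather than a project threshold.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative change of a timer from baseline to candidate.
double relative_change(double baseline, double candidate) {
  assert(baseline > 0.0);
  return (candidate - baseline) / baseline;
}

// Hypothetical helper: true if the candidate timing is within a fractional
// tolerance (e.g. 0.01 for 1%) of the baseline timing.
bool within_noise(double baseline, double candidate, double tol) {
  return std::fabs(relative_change(baseline, candidate)) <= tol;
}
```

Applied to the third-from-last numeric column of the two tables (for example 26.994 vs 26.997 for prim_main_loop, or 17.330 vs 17.305 for prim_advec_tracers_remap_RK2), every one of the five timers changes by less than 1%.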

ambrad commented 6 years ago

Would you comment on the reason that the time level can be removed from derived%F*()?

mfdeakin-sandia commented 6 years ago

The time level for the forcing variables was removed in HOMME master, and the time level used always seems to be the same. EDIT: To be clear, CAM fails to compile if the forcing variables have a time level index.