E3SM-Project / HOMMEXX

Clone of ACME for CMDV-SE project to convert HOMME to C++

Remap update #294

Closed mfdeakin-sandia closed 6 years ago

mfdeakin-sandia commented 6 years ago

This implements the new vertical remap boundary conditions from E3SM's version of Homme. It goes a bit ahead of Homme by completely removing the old compute_ppm_grids subroutine from the Fortran, which is expected to be a non-BFB (not bit-for-bit) change. No performance comparisons yet, but I'm not expecting any change. Fixes #279.

mfdeakin-sandia commented 6 years ago

Skylake performance, with 1 node, 48 processes, 96 elements, 40 tracers

master.1:prim_main_loop                                1.979
master.1:tl-ae U3-5stage_timestep                      0.151
master.1:tl-ae advance_hypervis_dp                     0.092
master.1:tl-at prim_advec_tracers_remap_RK2            1.683
master.1:tl-sc vertical_remap                          0.091

master.2:prim_main_loop                                1.996
master.2:tl-ae U3-5stage_timestep                      0.164
master.2:tl-ae advance_hypervis_dp                     0.091
master.2:tl-at prim_advec_tracers_remap_RK2            1.701
master.2:tl-sc vertical_remap                          0.093

master.3:prim_main_loop                                2.002
master.3:tl-ae U3-5stage_timestep                      0.157
master.3:tl-ae advance_hypervis_dp                     0.093
master.3:tl-at prim_advec_tracers_remap_RK2            1.708
master.3:tl-sc vertical_remap                          0.091

master.4:prim_main_loop                                1.992
master.4:tl-ae U3-5stage_timestep                      0.140
master.4:tl-ae advance_hypervis_dp                     0.096
master.4:tl-at prim_advec_tracers_remap_RK2            1.696
master.4:tl-sc vertical_remap                          0.092

master.5:prim_main_loop                                1.998
master.5:tl-ae U3-5stage_timestep                      0.152
master.5:tl-ae advance_hypervis_dp                     0.093
master.5:tl-at prim_advec_tracers_remap_RK2            1.702
master.5:tl-sc vertical_remap                          0.092

remap_update.1:prim_main_loop                          2.001
remap_update.1:tl-ae U3-5stage_timestep                0.168
remap_update.1:tl-ae advance_hypervis_dp               0.093
remap_update.1:tl-at prim_advec_tracers_remap_RK2      1.706
remap_update.1:tl-sc vertical_remap                    0.092

remap_update.2:prim_main_loop                          1.991
remap_update.2:tl-ae U3-5stage_timestep                0.146
remap_update.2:tl-ae advance_hypervis_dp               0.093
remap_update.2:tl-at prim_advec_tracers_remap_RK2      1.693
remap_update.2:tl-sc vertical_remap                    0.092

remap_update.3:prim_main_loop                          2.001
remap_update.3:tl-ae U3-5stage_timestep                0.158
remap_update.3:tl-ae advance_hypervis_dp               0.092
remap_update.3:tl-at prim_advec_tracers_remap_RK2      1.705
remap_update.3:tl-sc vertical_remap                    0.093

remap_update.4:prim_main_loop                          1.998
remap_update.4:tl-ae U3-5stage_timestep                0.150
remap_update.4:tl-ae advance_hypervis_dp               0.091
remap_update.4:tl-at prim_advec_tracers_remap_RK2      1.705
remap_update.4:tl-sc vertical_remap                    0.092

remap_update.5:prim_main_loop                          1.994
remap_update.5:tl-ae U3-5stage_timestep                0.146
remap_update.5:tl-ae advance_hypervis_dp               0.092
remap_update.5:tl-at prim_advec_tracers_remap_RK2      1.698
remap_update.5:tl-sc vertical_remap                    0.093

mfdeakin-sandia commented 6 years ago

P100 Performance, 1 GPU, 1 process, 96 elements, 40 tracers:

master.1:prim_main_loop                            2.466
master.1:tl-ae U3-5stage_timestep                  0.347
master.1:tl-ae advance_hypervis_dp                 0.325
master.1:tl-at prim_advec_tracers_remap_RK2        1.598
master.1:tl-sc vertical_remap                      0.141

master.2:prim_main_loop                            2.474
master.2:tl-ae U3-5stage_timestep                  0.345
master.2:tl-ae advance_hypervis_dp                 0.324
master.2:tl-at prim_advec_tracers_remap_RK2        1.595
master.2:tl-sc vertical_remap                      0.141

master.3:prim_main_loop                            2.458
master.3:tl-ae U3-5stage_timestep                  0.345
master.3:tl-ae advance_hypervis_dp                 0.323
master.3:tl-at prim_advec_tracers_remap_RK2        1.594
master.3:tl-sc vertical_remap                      0.141

master.4:prim_main_loop                            2.460
master.4:tl-ae U3-5stage_timestep                  0.345
master.4:tl-ae advance_hypervis_dp                 0.324
master.4:tl-at prim_advec_tracers_remap_RK2        1.595
master.4:tl-sc vertical_remap                      0.141

master.5:prim_main_loop                            2.478
master.5:tl-ae U3-5stage_timestep                  0.346
master.5:tl-ae advance_hypervis_dp                 0.326
master.5:tl-at prim_advec_tracers_remap_RK2        1.597
master.5:tl-sc vertical_remap                      0.141

remap_update.1:prim_main_loop                      2.478
remap_update.1:tl-ae U3-5stage_timestep            0.345
remap_update.1:tl-ae advance_hypervis_dp           0.324
remap_update.1:tl-at prim_advec_tracers_remap_RK2  1.596
remap_update.1:tl-sc vertical_remap                0.141

remap_update.2:prim_main_loop                      2.460
remap_update.2:tl-ae U3-5stage_timestep            0.344
remap_update.2:tl-ae advance_hypervis_dp           0.324
remap_update.2:tl-at prim_advec_tracers_remap_RK2  1.594
remap_update.2:tl-sc vertical_remap                0.141

remap_update.3:prim_main_loop                      2.481
remap_update.3:tl-ae U3-5stage_timestep            0.346
remap_update.3:tl-ae advance_hypervis_dp           0.326
remap_update.3:tl-at prim_advec_tracers_remap_RK2  1.597
remap_update.3:tl-sc vertical_remap                0.141

remap_update.4:prim_main_loop                      2.463
remap_update.4:tl-ae U3-5stage_timestep            0.345
remap_update.4:tl-ae advance_hypervis_dp           0.325
remap_update.4:tl-at prim_advec_tracers_remap_RK2  1.596
remap_update.4:tl-sc vertical_remap                0.141

remap_update.5:prim_main_loop                      2.478
remap_update.5:tl-ae U3-5stage_timestep            0.345
remap_update.5:tl-ae advance_hypervis_dp           0.325
remap_update.5:tl-at prim_advec_tracers_remap_RK2  1.596
remap_update.5:tl-sc vertical_remap                0.141

So performance looks unchanged.
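The comparison behind that conclusion can be made explicit by averaging the five runs per branch. The following is a minimal C++ sketch, not part of the PR: the helper names `mean` and `relative_change` are made up for illustration, and the numbers are the `prim_main_loop` values copied from the Skylake and P100 listings above.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of one timer's values across repeated runs.
double mean(const std::vector<double>& runs) {
  assert(!runs.empty());
  return std::accumulate(runs.begin(), runs.end(), 0.0) /
         static_cast<double>(runs.size());
}

// Relative change of the branch mean with respect to the master mean.
// A positive value means the branch is slower on average.
double relative_change(const std::vector<double>& master,
                       const std::vector<double>& branch) {
  const double m = mean(master);
  return (mean(branch) - m) / m;
}

// prim_main_loop, runs 1 to 5, copied from the listings in this thread.
const std::vector<double> skylake_master = {1.979, 1.996, 2.002, 1.992, 1.998};
const std::vector<double> skylake_branch = {2.001, 1.991, 2.001, 1.998, 1.994};
const std::vector<double> p100_master = {2.466, 2.474, 2.458, 2.460, 2.478};
const std::vector<double> p100_branch = {2.478, 2.460, 2.481, 2.463, 2.478};
```

Calling `relative_change(skylake_master, skylake_branch)` and `relative_change(p100_master, p100_branch)` gives roughly 0.2% in both cases, which is smaller than the run-to-run spread of master itself (1.979 to 2.002 on Skylake, 2.458 to 2.478 on the P100).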

mfdeakin-sandia commented 6 years ago

This passes all tests on Skylake, so I think this is ready to merge

worleyph commented 6 years ago

Please excuse another possibly irrelevant, out-of-context comment, but in E3SM/CESM proper, explicit typing of constants was at one time a requirement, and switching precision was implemented by redefining model-specific types. An example from shr_reprosum_mod.F90 follows. (I can't find an example switching from single to double precision floating point at the moment.)

#if ( defined noI8 )
   ! Workaround for when shr_kind_i8 is not supported.
   use shr_kind_mod,  only: r8 => shr_kind_r8, i8 => shr_kind_i4
#else
   use shr_kind_mod,  only: r8 => shr_kind_r8, i8 => shr_kind_i8
#endif

   ...
   use_ddpdd_sum = use_ddpdd_sum .or. (radix(0._r8) /= radix(0_i8))
   ...

Pat

On 4/25/18 2:56 PM, onguba wrote:

@oksanaguba commented on this pull request.


In components/homme/src/share/vertremap_mod_base.F90 https://github.com/E3SM-Project/HOMMEXX/pull/294#discussion_r184171344:

   rslt( 4,j) = dx(j) / ( dx(j) + dx(j+1) )
-  rslt( 5,j) = 1. / sum( dx(j-1:j+2) )
-  rslt( 6,j) = ( 2. * dx(j+1) * dx(j) ) / ( dx(j) + dx(j+1 ) )
-  rslt( 7,j) = ( dx(j-1) + dx(j ) ) / ( 2. * dx(j ) + dx(j+1) )
-  rslt( 8,j) = ( dx(j+2) + dx(j+1) ) / ( 2. * dx(j+1) + dx(j ) )
-  rslt( 9,j) = dx(j ) * ( dx(j-1) + dx(j ) ) / ( 2.*dx(j ) + dx(j+1) )
-  rslt(10,j) = dx(j+1) * ( dx(j+1) + dx(j+2) ) / ( dx(j ) + 2.*dx(j+1) )
+  rslt( 5,j) = 1.D0 / sum( dx(j-1:j+2) )

I was told not to use the .D format. Models are sometimes run in single precision, and 2. will be converted to double or single as needed.
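Since HOMMEXX is the C++ port, the same concern can be illustrated with a C++ analog. This is only a sketch under stated assumptions: the macro `SKETCH_SINGLE_PRECISION`, the alias `Real`, and the function `inverse_window_sum` are hypothetical names, not taken from HOMME or HOMMEXX. The literal rules also differ between the languages: an unsuffixed `1.` in Fortran is default real and is converted to the kind of the other operand, whereas an unsuffixed `1.0` in C++ is always `double`, much like a hard-coded `1.D0`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical build switch and alias; neither name comes from HOMME or HOMMEXX.
#ifdef SKETCH_SINGLE_PRECISION
using Real = float;
#else
using Real = double;
#endif

// Analog of rslt(5,j) = 1 / sum(dx(j-1:j+2)), using 0-based indexing.
// The constant is written as Real(1), so it follows the alias instead of
// pinning the expression to one precision.
Real inverse_window_sum(const std::vector<Real>& dx, std::size_t j) {
  assert(j >= 1 && j + 2 < dx.size());
  Real sum = Real(0);
  for (std::size_t k = j - 1; k <= j + 2; ++k) {
    sum += dx[k];
  }
  return Real(1) / sum;
}

// A hard-coded 1.0 forces double arithmetic regardless of what Real is.
static_assert(std::is_same<decltype(1.0 / Real(2)), double>::value,
              "an unsuffixed literal promotes the expression to double");
// Written through the alias, the expression keeps the configured type.
static_assert(std::is_same<decltype(Real(1) / Real(2)), Real>::value,
              "Real(1) keeps the configured precision");
```

Building with `-DSKETCH_SINGLE_PRECISION` switches every `Real` constant and result to `float` without touching the arithmetic, which is the property the reviewer wants the Fortran constants to keep.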
