E3SM-Project / scream

Fork of E3SM used to develop exascale global atmosphere model written in C++
https://e3sm-project.github.io/scream/

Mismatch between dynamics library and the HOMMEXX setup for SCREAM stand-alone runs #1108

Closed AaronDonahue closed 3 years ago

AaronDonahue commented 3 years ago

The CMake macro CreateDynamicsLib creates a unique homme library. It is defined so that if a library matching the given "HOMME_TARGET", "NP", "PLEV" and "QSIZE_D" has already been created, the macro links to that library rather than creating a new one.

There appears to be a problem, though, when setting up a new scream stand-alone test that uses the same homme dynamics setup.

For example, the homme_stand_alone test has the following in CMakeLists.txt:

# Get or create the dynamics lib
#                 HOMME_TARGET   NP PLEV QSIZE_D
CreateDynamicsLib("theta-l_kokkos"  4   72   35)

If I create a separate test, call it homme_physics_stand_alone, and add the same line to its CMakeLists.txt file, the macro detects that the scream_theta-l_kokkos_4_72_35 library already exists and reuses the previously created library, as in:

  if ("${hommeLibName}" IN_LIST DynamicsLibsCreated)
    # This dynamics lib was built already somewhere in the project. Nothing to do
    set (dynLibName scream_${hommeLibName})
  else ()

Unfortunately, during the build of the new test we encounter the following compile-time error:

In file included from components/scream/../homme/src/share/cxx/ExecSpaceDefs.hpp(15),
                 from components/scream/../homme/src/share/cxx/Types.hpp(14),
                 from components/scream/tests/coupled/homme_physics/homme_physics_stand_alone.cpp(14):
components/scream/../homme/src/share/cxx/Dimensions.hpp(46): error: identifier "PLEV" is undefined
    static constexpr const int NUM_PHYSICAL_LEV = PLEV;

AaronDonahue commented 3 years ago

Following up on this issue: I compared the build directories for the homme_stand_alone and homme_physics_stand_alone tests and noticed that a few files, such as config.h, are missing from the latter. My guess is that Dimensions.hpp gets its data (including PLEV) from files it expects to find in the test directory, and the build fails because it cannot find them.

[donahue5@quartz380:homme_physics]$ ls
CMakeFiles  CTestTestfile.cmake  Makefile  cmake_install.cmake  homme_physics_stand_alone_modules  homme_shoc_cld_p3_rad_init_ne4np4.nc  input.yaml  namelist.nl  vcoord

vs.

[donahue5@quartz380:homme]$ ls
CMakeFiles       cmake_install.cmake  homme_stand_alone       input.yaml              namelist.nl
CTestTestfile.cmake  config.h         homme_stand_alone_modules   libscream_theta-l_kokkos_4_72_35.a  theta-l_kokkos_4_72_35_modules
Makefile         config.h.c       homme_standalone_ne4np4.nc  libtheta-l_kokkos_4_72_35.a     vcoord
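
A quick way to spot such differences is to diff the two listings. The following is a minimal sketch, not part of the scream tooling: missing_in is a hypothetical helper name, and the two arguments are whatever paths your build tree uses for the two test directories.

```shell
# Hypothetical helper: print entries found in the reference build directory
# but absent from the candidate one.
missing_in() {
  reference_dir="$1"
  candidate_dir="$2"
  ref_list=$(mktemp)
  cand_list=$(mktemp)
  ls -1 "$reference_dir" | sort > "$ref_list"
  ls -1 "$candidate_dir" | sort > "$cand_list"
  # comm -23 keeps only the lines unique to the first (sorted) input.
  comm -23 "$ref_list" "$cand_list"
  rm -f "$ref_list" "$cand_list"
}
```

Called with the homme directory as the reference and the homme_physics directory as the candidate, the output would include config.h and the two library archives, along with entries that differ only because the test has a different name.
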

AaronDonahue commented 3 years ago

For easy lookup, here is a link to the macro which constructs the library: https://github.com/E3SM-Project/scream/blob/master/components/scream/src/dynamics/homme/CMakeLists.txt#L96

AaronDonahue commented 3 years ago

@ambrad, @mt5555 or @jgfouca: normally I would bother Luca about this, but he is on vacation. I thought I should reach out to you in case you also have some experience with the CMake options used for homme and could assist with this issue.

ambrad commented 3 years ago

Aaron, would you post a reproducing script here?

AaronDonahue commented 3 years ago

@ambrad I created this branch which should make it possible to reproduce the issue: https://github.com/E3SM-Project/scream/tree/aarondonahue/issue_1108_reproducer

My build script is pretty basic: it just builds scream, making sure to turn homme on. I'll copy/paste it here, but I suspect you have your own script that accomplishes much the same thing. Note that I have hard-coded the location of the scream repo in my script, so that will need to be changed. The script also accepts the machine name as an argument, hence the $1.cmake part.

cmake \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_CXX_COMPILER=$(which mpicxx) \
  -DCMAKE_C_COMPILER=$(which mpicc) \
  -DCMAKE_Fortran_COMPILER=$(which mpifort) \
  -D EKAT_DEFAULT_BFB=ON \
  -DSCREAM_DYNAMICS_DYCORE=HOMME \
  -C ../scream/components/scream/cmake/machine-files/$1.cmake \
  ../scream/components/scream

AaronDonahue commented 3 years ago

If you navigate to tests/uncoupled in your build directory and run ctest -R homme, you should see the failing test.

homme_stand_alone_again, the test that fails, is a literal copy of the homme_stand_alone test under a different name. Presumably it would pass just like homme_stand_alone does, were it not for the issue I describe here.
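
The reproduction step above could be wrapped roughly as follows. This is only a sketch: run_homme_tests is a hypothetical name, the build directory is supplied by the caller, and --output-on-failure is added so that ctest prints the log of any failing test.

```shell
# Hypothetical wrapper: run only the tests whose names match "homme"
# from the tests/uncoupled subdirectory of the given build directory.
run_homme_tests() {
  build_dir="$1"
  # Use a subshell so the caller's working directory is left unchanged.
  (
    cd "$build_dir/tests/uncoupled" || exit 1
    ctest -R homme --output-on-failure
  )
}
```

For example, run_homme_tests "$PWD" from the top of the build tree. The function returns non-zero if the directory is missing or if any matching test fails.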