HPSCTerrSys / TSMP2

CMake-based TerrSysMP
https://github.com/HPSCTerrSys/TSMP

Could NOT find OASIS3MCT #4

Closed mvhulten closed 1 year ago

mvhulten commented 1 year ago

I tried to build eCLM-ParFlow on juwels, following the steps as described in the README.

CMake failed:

[vanhulten1@jwlogin07 eTSMP]$ cmake -S . -B ${BUILD_DIR}                  \
>       -DCMAKE_INSTALL_PREFIX="$INSTALL_DIR" \
>       -DeCLM_SRC=${eCLM_SRC}                \
>       -DPARFLOW_SRC=${PARFLOW_SRC}

-- The C compiler identification is Intel 2021.6.0.20220226
-- The CXX compiler identification is Intel 2021.6.0.20220226
-- The Fortran compiler identification is Intel 2021.6.0.20220226
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /p/software/juwels/stages/2022/software/psmpi/5.5.0-1-intel-compilers-2021.4.0/bin/mpicc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /p/software/juwels/stages/2022/software/psmpi/5.5.0-1-intel-compilers-2021.4.0/bin/mpicxx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /p/software/juwels/stages/2022/software/psmpi/5.5.0-1-intel-compilers-2021.4.0/bin/mpif90 - skipped
CMake Error at /p/software/juwels/stages/2023/software/CMake/3.23.1-GCCcore-11.3.0/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find OASIS3MCT (missing: PSMILE_LIB MCT_LIB MPEU_LIB SCRIP_LIB)
Call Stack (most recent call first):
  /p/software/juwels/stages/2023/software/CMake/3.23.1-GCCcore-11.3.0/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  cmake/FindOASIS3MCT.cmake:8 (find_package_handle_standard_args)
  CMakeLists.txt:13 (find_package)

-- Configuring incomplete, errors occurred!
See also "/p/home/jusers/vanhulten1/juwels/software/repos/eTSMP/bld/JUWELS_eCLM-ParFlow/CMakeFiles/CMakeOutput.log".

mvhulten commented 1 year ago

It seems that OASIS is missing. In step 4, should there not also be an OASIS "component"? Or at least a mention that OASIS is required (trivial as that may be for some)?

I pulled in the eCLM and ParFlow components. Only below ./eCLM/src/clm5/oasis3/ I found some OASIS-related source code files.

mvhulten commented 1 year ago

I got access to OASIS.

For now, this is only a documentation issue.

kvrigor commented 1 year ago

Hi! CMake should automatically find OASIS after running source env/jsc.2022_Intel.sh. This script automatically sets the path to the shared OASIS library:

https://github.com/HPSCTerrSys/eTSMP/blob/5a8c08c69d735f54ff831a8ebb60c3dc14edcafd/env/jsc.2022_Intel.sh#L30-L34

One problem here is you may need access to the cslts compute project. Do you have access to cslts or will you be using another compute project for your runs?

mvhulten commented 1 year ago

Thank you for your help!

I have now asked for access to cslts.

I have compiled https://gitlab.com/cerfacs/oasis3-mct, but it is unclear to me how to use it or what to set OASIS_ROOT to. (I tried a path containing a lib directory but no include directory, but this is all a bit off-topic.)
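A minimal sketch for checking whether a candidate directory contains the four libraries named in the CMake error (PSMILE_LIB, MCT_LIB, MPEU_LIB, SCRIP_LIB). The helper name `check_oasis_libs` is hypothetical, and both the file names and the `lib` subdirectory are assumptions based on a default OASIS3-MCT build; cmake/FindOASIS3MCT.cmake may search other names or locations.

```shell
# Hypothetical helper: report which OASIS3-MCT static libraries exist under
# <dir>/lib. The file names below are assumptions (only libpsmile.MPI1.a
# appears in this thread); adjust them to what your OASIS build produced.
check_oasis_libs() {
    dir="$1"
    missing=0
    for lib in libpsmile.MPI1.a libmct.a libmpeu.a libscrip.a; do
        if [ -f "${dir}/lib/${lib}" ]; then
            echo "found:   ${dir}/lib/${lib}"
        else
            echo "missing: ${dir}/lib/${lib}"
            missing=1
        fi
    done
    # Exit status 0 only if all four libraries were found.
    return "${missing}"
}
```

Calling `check_oasis_libs /path/to/oasis/install` before running cmake shows at a glance which of the four files is absent from that directory.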

DCaviedesV commented 1 year ago

@kvrigor is that a path to a built shared library, or to code? If it is code, can we not simply host it in some repo to allow for easier access?

kvrigor commented 1 year ago

After discussing with @s-poll and @chartick a few days ago, I realized that letting eTSMP reuse a prebuilt OASIS3-MCT library causes more headaches than I initially thought, since not all users have access to the cslts shared folder. Even if the libraries were hosted in a "more public" folder (such as what @DCaviedesV suggests), the overhead of maintaining the libraries becomes another issue. To simplify things for both users and maintainers, I am changing eTSMP's default behavior to build OASIS3-MCT itself, which is the same as what TSMP does.

@mvhulten it would be nice if you could test this new feature, which will hopefully solve your issue. You will have to switch to the dev-build-oasis branch to try it. Here is a recipe that you can follow:

# 1. Download OASIS3-MCT
git clone https://icg4geo.icg.kfa-juelich.de/ExternalReposPublic/oasis3-mct
OASIS_SRC=`realpath oasis3-mct`

# 2. Use eTSMP `dev-build-oasis` branch
cd /path/to/your/eTSMP
git pull && git checkout dev-build-oasis

# 3. Load environment variables
source env/jsc.2023_Intel.sh

# 4. Download component models that you may need. (See step 4 from 
#    the README: https://github.com/HPSCTerrSys/eTSMP#quickstart )

# 5. Sample CMake incantation for eCLM-ParFlow. Note that CMake now needs the 
#    source directory for OASIS3-MCT so that it can build it.
cmake -S . -B ${BUILD_DIR}                  \
      -DCMAKE_INSTALL_PREFIX="$INSTALL_DIR" \
      -DOASIS_SRC=${OASIS_SRC}              \
      -DeCLM_SRC=${eCLM_SRC}                \
      -DPARFLOW_SRC=${PARFLOW_SRC}

cmake --build ${BUILD_DIR}
cmake --install ${BUILD_DIR}

mvhulten commented 1 year ago

Thanks, I've tried this (including step 3 of quick start as well, of course). This happens:

CMake Error at /p/software/juwels/stages/2023/software/CMake/3.23.1-GCCcore-11.3.0/share/cmake-3.23/Modules/ExternalProject.cmake:2776 (message):
  No download info given for 'eCLM' and its source directory:

   /p/home/jusers/vanhulten1/juwels/software/repos/eTSMP/bld/JUWELS_eCLM-ParFlow/eCLM/src/eCLM

  is not an existing non-empty directory.  Please specify one of:

CMakeOutput.log

I have not yet looked closer at it; will do later today.

mvhulten commented 1 year ago

bld/JUWELS_eCLM-ParFlow/eCLM/src/eCLM/ is indeed empty.

Should this be populated in a previous step?

Apropos, eCLM/src/ is not empty (and contains a subdirectory eclm—note the case).

Above paths are w.r.t. root of eTSMP repo.

mvhulten commented 1 year ago

I did something wrong. Building eCLM-ParFlow works now, but then it fails in the build phase of step 6:

[vanhulten1@jwlogin07 eTSMP]$ cmake --build ${BUILD_DIR}
...
[ 12%] Linking C executable test1
ld: /p/home/jusers/vanhulten1/juwels/software/repos/eTSMP/bin/JUWELS_eCLM-ParFlow/lib/libpsmile.MPI1.a(mod_oasis_auxiliary_routines.o): in function `mod_oasis_auxiliary_routines_mp_oasis_put_inquire_':
mod_oasis_auxiliary_routines.F90:(.text+0x4c4f): undefined reference to `oas_m_attrvect_mp_exportrlisttochar__'
ld: /p/home/jusers/vanhulten1/juwels/software/repos/eTSMP/bin/JUWELS_eCLM-ParFlow/lib/libpsmile.MPI1.a(mod_oasis_coupler.o): in function `mod_oasis_coupler_mp_oasis_coupler_setup_':
mod_oasis_coupler.F90:(.text+0xa8ca): undefined reference to `oas_m_globalsegmap_mp_gsize__'
ld: mod_oasis_coupler.F90:(.text+0xa8ec): undefined reference to `oas_m_globalsegmap_mp_lsize__'
ld: mod_oasis_coupler.F90:(.text+0xad17): undefined reference to `oas_m_attrvect_mp_init__'
...
ld: mod_oasis_io.F90:(.text+0x10830): undefined reference to `oas_m_attrvectcomms_mp_gsm_scatter__'
ld: mod_oasis_io.F90:(.text+0x1084c): undefined reference to `oas_m_attrvect_mp_clean__'
gmake[5]: *** [pfsimulator/amps/test/src/CMakeFiles/test1.dir/build.make:103: pfsimulator/amps/test/src/test1] Error 1
gmake[4]: *** [CMakeFiles/Makefile2:1151: pfsimulator/amps/test/src/CMakeFiles/test1.dir/all] Error 2
gmake[3]: *** [Makefile:146: all] Error 2
gmake[2]: *** [CMakeFiles/ParFlow.dir/build.make:86: ParFlow/src/ParFlow-stamp/ParFlow-build] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:139: CMakeFiles/ParFlow.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
...
[vanhulten1@jwlogin07 eTSMP]$ echo $BUILD_DIR
./bld/JUWELS_eCLM-ParFlow

$OASIS_SRC is pointing to the repo with last commit f11cab7fd8a5342547d8fe252ed22d90460cd58d from yesterday by @kvrigor.

kvrigor commented 1 year ago

> bld/JUWELS_eCLM-ParFlow/eCLM/src/eCLM/ is indeed empty.

Your eCLM_SRC should be set to /p/home/jusers/vanhulten1/juwels/software/repos/eTSMP/bld/JUWELS_eCLM-ParFlow/eCLM, which is the top-level folder created when you cloned eCLM. The model source code doesn't have to be under eTSMP; you can save it anywhere as long as you set <MODEL>_SRC to the root folder of your model code. I would suggest using your $PROJECT folder instead of $HOME for your setup, since $HOME space is quite limited. For more info, see: File systems for compute projects
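As a rough pre-flight check before configuring, something like the following could catch a `<MODEL>_SRC` variable that points to a missing or empty folder, which is the condition behind the "not an existing non-empty directory" error above. The function names are hypothetical and not part of eTSMP.

```shell
# Hypothetical helper: succeed only if the path is an existing, non-empty
# directory.
is_nonempty_dir() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Hypothetical wrapper: check every source directory given as an argument
# and return non-zero if any of them is missing or empty.
check_model_sources() {
    status=0
    for src in "$@"; do
        if is_nonempty_dir "${src}"; then
            echo "ok:      ${src}"
        else
            echo "problem: '${src}' is missing or empty"
            status=1
        fi
    done
    return "${status}"
}

# Example call (variable names as used in this thread):
#   check_model_sources "${eCLM_SRC}" "${PARFLOW_SRC}" "${OASIS_SRC}"
```

This only checks that each folder exists and has content; it does not verify that the folder is really the root of the model repository.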

> I did something wrong. Building eCLM-ParFlow works now, but then it fails in the build phase of step 6:

I'm getting this error too—I'm looking into it now and hopefully a fix will be available soon.

mvhulten commented 1 year ago

@kvrigor, could I get privileges to push a branch?

kvrigor commented 1 year ago

@mvhulten Kindly git pull the latest changes and try to build again.

> @kvrigor, could I get privileges to push a branch?

I believe only @HPSC-TerrSys members are allowed to do so (perhaps @DCaviedesV could also comment on this). But feel free to fork and open a PR for any bugfixes and/or improvements 😃

mvhulten commented 1 year ago

I was able to build the eCLM-ParFlow combination!

This morning, I accidentally started a build of eCLM-ICON (which failed, but that is not important now). I then went on with eCLM-ParFlow, but it still failed because it still tried to include ICON. Is that normal behaviour? Is there a way to run a make clean (through CMake)?

In the end, I built eCLM-ParFlow simply by starting with a clean checkout of eTSMP.

kvrigor commented 1 year ago

> This morning, I tried by accident a build of eCLM-ICON (which failed, but that is not important now). Then I just went on with eCLM-ParFlow, but it still failed because it still tried to include ICON. Is that normal behaviour?

cmake only knows how to build the models that were specified to it during the configure step (Step 5). To build another model combination, first clean the previous build files (rm -rf ${BUILD_DIR}) and then configure cmake again with the correct model combination.

The key idea is that ${BUILD_DIR} and ${INSTALL_DIR} are associated with a particular model combination. You can build different model combinations side by side, as long as each model combination has its own ${BUILD_DIR} and ${INSTALL_DIR}. Perhaps this is beyond what a normal user of eTSMP would do, but FYI this is possible.
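One way to keep the directories apart is to derive both from the model combination name. This is only a sketch: the function name `set_build_dirs` is hypothetical, and the `bld/<SYSTEM>_<COMBINATION>` and `bin/<SYSTEM>_<COMBINATION>` layout is an assumption taken from the paths that appear in the logs earlier in this thread.

```shell
# Hypothetical helper: set BUILD_DIR and INSTALL_DIR for one model
# combination so that two combinations never share a directory.
set_build_dirs() {
    system="$1"        # e.g. JUWELS
    combination="$2"   # e.g. eCLM-ParFlow
    BUILD_DIR="./bld/${system}_${combination}"
    INSTALL_DIR="./bin/${system}_${combination}"
}

# Example:
#   set_build_dirs JUWELS eCLM-ParFlow
#   cmake -S . -B "${BUILD_DIR}" -DCMAKE_INSTALL_PREFIX="${INSTALL_DIR}" ...
```

Calling the helper again with a different combination name yields a fresh pair of directories, so an earlier configuration is left untouched.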

> Is there a way to run a make clean (through CMake)?

Yes: cmake --build ${BUILD_DIR} --clean-first. This only works if you are rebuilding the same model combination that ${BUILD_DIR} was configured for; otherwise you have to delete ${BUILD_DIR} and reconfigure cmake with the right model combination (as explained above).
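When deleting the build directory, an unset or mistyped ${BUILD_DIR} makes a bare rm -rf risky. A small guard such as the one below could help; the function name `wipe_build_dir` is hypothetical and not part of eTSMP.

```shell
# Hypothetical helper: remove a build directory so that the next configure
# step starts from scratch. Refuses an empty argument or "/".
wipe_build_dir() {
    dir="$1"
    if [ -z "${dir}" ] || [ "${dir}" = "/" ]; then
        echo "refusing to remove '${dir}'" >&2
        return 1
    fi
    rm -rf -- "${dir}"
}

# Example: switch to another model combination
#   wipe_build_dir "${BUILD_DIR}"
#   cmake -S . -B "${BUILD_DIR}" ...   # reconfigure with the new combination
```

For a rebuild of the same combination, cmake --build ${BUILD_DIR} --clean-first remains the lighter option because it keeps the existing configuration.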