JCSDA / spack-stack

[INSTALL]: install ESMF 8.6.1 and MAPL 2.46.2 in spack-stack 1.6.0 #1168

Open junwang-noaa opened 3 months ago

junwang-noaa commented 3 months ago

Package name

ESMF and MAPL

Package version/tag

ESMF/8.6.1 and MAPL/2.46.2

Build options

Current

Installation timeframe

The two libraries will be installed under current spack-stack 1.6.0.

Other information

No response

WCOSS2

WCOSS2: General questions

No response

WCOSS2: Installation and testing

No response

WCOSS2: Technical & security review list

WCOSS2: Additional comments

No response

climbfuji commented 3 months ago

We need to do https://github.com/JCSDA/spack-stack/issues/1157 first, then this.

climbfuji commented 3 months ago

1157 was merged, so we can go ahead with this. Unlikely it will happen before the 4th of July weekend. Many people will be on leave.

climbfuji commented 2 months ago

Below are the instructions and a list of platforms / assigned spack-stack installers:

Instructions

  1. Go to spack-stack-1.6.0 installation and run the basic steps for building spack-stack environments on this system (see https://spack-stack.readthedocs.io/en/1.6.0/PreConfiguredSites.html)

  2. Make sure the git remotes for both spack-stack and spack point to JCSDA, then run `git remote update` and `git checkout jcsda/release/1.6.0` (replace jcsda with origin or whatever the remote is named); a subsequent `git status` should show

    
    -bash-4.2$ git status
    On branch release/1.6.0
    Changes not staged for commit:
      (use "git add ..." to update what will be committed)
      (use "git checkout -- ..." to discard changes in working directory)
      (commit or discard the untracked or modified content in submodules)
    #
        modified:   spack (new commits, modified content)

3. `git submodule update` should check out the correct hash for the `spack` submodule; if not, go to `spack`, do a `git remote update && git checkout jcsda/release/1.6.0`.

4. Back to the spack-stack top-level directory: `source setup.sh`

5. For each unified environment in `envs`, do (please use a name that works for your setup; it may include a compiler suffix etc.):

    spack stack create env --name=ue-esmf-8.6.1-mapl-2.46.2 --site=s4 --template=unified-dev \
      --upstream=/data/prod/jedi/spack-stack/spack-stack-1.6.0/envs/unified-env/install \
      2>&1 | tee log.create.ue-esmf-8.6.1-mapl-2.46.2.001

6. Update `envs/ue-esmf-8.6.1-mapl-2.46.2/spack.yaml` and set the correct compiler in the compiler matrix line (match upstream!) and set correct esmf/mapl versions:

    sed -i "s/'%aocc', '%apple-clang', '%gcc', '%intel'/'%intel'/g" envs/ue-esmf-8.6.1-mapl-2.46.2/spack.yaml
    sed -i "s/mapl@2.40.3 ^esmf@8.5.0/mapl@=2.46.2 ^esmf@=8.6.1/g" envs/ue-esmf-8.6.1-mapl-2.46.2/spack.yaml
    sed -i "s/- mapl@2.40.3 ^esmf@8.6.0//g" envs/ue-esmf-8.6.1-mapl-2.46.2/spack.yaml
    spack env activate -p envs/ue-esmf-8.6.1-mapl-2.46.2

7. Concretize: `spack concretize 2>&1 | tee log.concretize.ue-esmf-8.6.1-mapl-2.46.2.001`, check output:

    $ cat log.concretize.ue-esmf-8.6.1-mapl-2.46.2.001 | grep -vE '\[\+\]|\[e\]|\[\^\]'
    ==> Concretized crtm@v2.4.1-jedi%intel
    ==> Concretized crtm@2.4.0.1%intel
    ==> Concretized ewok-env%intel+cylc+ecflow
    ==> Concretized fms@release-jcsda%intel
    ==> Concretized fms@2023.04%intel
    ==> Concretized global-workflow-env%intel
    ==> Concretized gmao-swell-env%intel
    ==> Concretized gsi-env%intel
    ==> Concretized jedi-fv3-env%intel
    ==> Concretized jedi-mpas-env%intel
    ==> Concretized jedi-neptune-env%intel
    ==> Concretized jedi-ufs-env%intel ^esmf@=8.6.1 ^mapl@=2.46.2
    ==> Concretized jedi-um-env%intel
    ==> Concretized madis@4.5%intel
    ==> Concretized mapl@=2.46.2%intel ^esmf@=8.6.1
    ==> Concretized soca-env%intel
    ==> Concretized ufs-srw-app-env%intel ^esmf@=8.6.1 ^mapl@=2.46.2
    ==> Concretized ufs-weather-model-env%intel ^esmf@=8.6.1 ^mapl@=2.46.2
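
For anyone who wants to automate the check in step 7, here is a rough sketch, not part of the official instructions. It reads the concretize log and flags any `==> Concretized` line that mentions esmf or mapl without the expected `@=` pin. The function name `check_concretized_versions` and its argument order are made up for illustration, and the match is a plain substring test, so it is only a sanity check.

```shell
# Sketch only: check_concretized_versions is a hypothetical helper name.
# Usage: check_concretized_versions <log> <esmf version> <mapl version>
check_concretized_versions() {
  awk -v esmf="esmf@=$2" -v mapl="mapl@=$3" '
    /^==> Concretized/ {
      # A line that names esmf (or mapl) must carry the exact expected pin.
      if (index($0, "esmf@") && !index($0, esmf)) { print "unexpected esmf: " $0; bad = 1 }
      if (index($0, "mapl@") && !index($0, mapl)) { print "unexpected mapl: " $0; bad = 1 }
    }
    # Exit status 1 if any offending line was seen, 0 otherwise.
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

With the log name used above, this would be called as `check_concretized_versions log.concretize.ue-esmf-8.6.1-mapl-2.46.2.001 8.6.1 2.46.2`; a non-zero exit status means at least one line still references a different esmf or mapl spec. Note that it returns success when the log has no matching lines at all, so it does not replace reading the output.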

List of platforms / installers

ONLY TICK IF YOU'VE ALSO FIXED THE GRIB-UTILS MODULE FOR WGRIB

ONLY DO THIS FOR THE BASE UNIFIED-ENV - IGNORE THE ADDON ENVS

ulmononian commented 2 months ago

@climbfuji @AlexanderRichert-NOAA @jkbk2004 @junwang-noaa i installed a chained env based on 1.6.0 but with esmf/8.6.1 and mapl/2.46.2 here /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install/modulefiles/Core. it is intel only for now. please give a try and let us know how it works with the ufs-wm.

climbfuji commented 2 months ago

I am doing the gcc part now. I had to comment out jedi-tools-env in the chained environment, but that doesn't matter. Fortunately, 1.6.0 was the last release that had more than one compiler in one environment - this just causes trouble. Unfortunately, though, we always need to go back and make updates to 1.6.0!

climbfuji commented 2 months ago

@mathomp4 mapl 2.46.2 refuses to build on Hercules with gcc, because the 1.6.0 stack uses mvapich2:

==> Ran patch() for mapl
==> mapl: Executing phase: 'cmake'
==> Error: InstallError: Unsupported MPI stack

/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/spack/var/spack/repos/builtin/packages/mapl/package.py:363, in cmake_args:
        360        elif self.spec.satisfies("^cray-mpich"):
        361            args.append(self.define("MPI_STACK", "mpich"))
        362        else:
  >>    363            raise InstallError("Unsupported MPI stack")
        364
        365        return args

See build log for details:

Any quick fix for this (locally if needed - we've moved away from mvapich2 since spack-stack-1.7.0)?

climbfuji commented 2 months ago

https://github.com/JCSDA/spack/pull/449 and https://github.com/JCSDA/spack-stack/pull/1189 fix this for release/1.6.0, https://github.com/spack/spack/pull/45164 for spack develop (it will come back to spack-stack-dev with the next pull).

https://github.com/JCSDA/spack-stack/pull/1189 also fixes the missing grib-utils module file change for wgrib.

climbfuji commented 2 months ago

@ulmononian Hercules is done for gcc, and I also fixed the grib-utils module and regenerated all module files.

ulmononian commented 2 months ago

@climbfuji thanks for taking on the hercules gcc issue. it looks to me like you did the gcc install in /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install; am i correct? if so, i can let the ufs-wm devs know.

climbfuji commented 2 months ago

Correct.

climbfuji commented 2 months ago

It looks like the missing platforms are all EMC and EPIC systems - everything else is either done or not needed.

zach1221 commented 2 months ago

In case it has not been reported here yet, I wanted to flag this issue seen on Hercules when testing with the esmf/8.6.1 spack-stack 1.6.0 installation. @jkbk2004 @BrianCurtis-NOAA @FernandoAndrade-NOAA

    CMake Error at /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install/intel/2021.9.0/mapl-2.46.2-uiwt3at/lib64/cmake/MAPL/MAPL-targets.cmake:73 (set_target_properties):
      The link interface of target "MAPL_cfio_r4" contains:

        ESMF::ESMF

      but the target was not found. Possible reasons include:

        * There is a typo in the target name.
        * A find_package call is missing for an IMPORTED target.
        * An ALIAS target is missing.

mathomp4 commented 2 months ago

That is so odd. I mean, ESMF 8.6.1 and MAPL 2.46 were essentially created to allow for the ESMF::ESMF target.

Hmm. My only next thought is that a FindESMF.cmake file is out of date? The one we have in MAPL and the one we have in ESMA_cmake are identical to the one in ESMF.

Could you somehow be picking up another one? It was noted by @danrosen25 in the ESMF PR that, at the time, these CMake files were out-of-date:

And a look at them shows them still quite old.

Perhaps some package is still referring to an old FindESMF.cmake in some way?
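
One way to see whether more than one version is in play is to list every copy in the checkout with a checksum. This is only a sketch: the function name `list_findesmf_copies` is made up, and it assumes the copies live under a single source tree that you pass as the argument.

```shell
# Sketch only: list_findesmf_copies is a hypothetical helper name.
# Prints "<checksum> <size> <path>" for each FindESMF.cmake under the given
# root (default: current directory), sorted so identical copies are adjacent.
list_findesmf_copies() {
  find "${1:-.}" -type f -name 'FindESMF.cmake' -exec cksum {} + | sort
}
```

Running `list_findesmf_copies . | awk '{print $1, $2}' | sort -u | wc -l` then gives the number of distinct versions; anything above 1 means the build could be picking up differing modules depending on the order of the module path.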

BrianCurtis-NOAA commented 2 months ago

I'm running with the MAPL FindESMF.cmake on WCOSS2 right now, but if this is the case, we should look into agreeing on a single home for FindESMF.cmake, with other groups pulling from that location.

climbfuji commented 2 months ago

There's already an issue in the cmakemodules repo that talks about using ESMF's own findESMF.cmake: https://github.com/NOAA-EMC/CMakeModules/issues/70 - there are also issues in fv3-jedi and spack for this if I remember correctly.

junwang-noaa commented 2 months ago

@climbfuji is the findESMF.cmake issue caused by the new ESMF 8.6.1? I am curious why it was not an issue with the previous ESMF 8.6.0.

danrosen25 commented 2 months ago

@climbfuji @junwang-noaa See this pull request, which was merged into 8.6.1 https://github.com/esmf-org/esmf/pull/226

When I tested this change in UFS, I had trouble if I updated the UFS FindESMF.cmake files. If I left them alone, the UFS system built.

junwang-noaa commented 2 months ago

@danrosen25 Thanks for looking into this issue. My question is how to resolve the build issue when updating to ESMF 8.6.1 and MAPL 2.46.2 in the UFS weather model. I don't remember the ufs weather model having these three submodules.

@BrianCurtis-NOAA @bbakernoaa @GeorgeGayno-NOAA FYI. The FindESMF.cmake in the repository Dan listed may need updates for ESMF 8.6.1.

BrianCurtis-NOAA commented 2 months ago

I'll try to find the log, but I updated every FindESMF.cmake in UFSWM and its sub-components to the one in the ESMF repo, and CDEPS then fails on the `use ESMF` statement at line 11 of share/shr_abort_mod.F90 with: error #7002: Error in opening the compiled module file. Check INCLUDE paths. [ESMF]
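
Error #7002 generally means the compiler did not find `esmf.mod` in any directory on its include path. As a quick sanity check (a sketch only, with a made-up function name and a prefix argument you would need to fill in with the real ESMF install location), the directories that actually contain the module file can be listed like this and compared against the `-I` flags in the failing compile line:

```shell
# Sketch only: find_esmf_mod_dirs is a hypothetical helper name.
# $1 is the ESMF install prefix (an assumption; substitute the real path).
# Prints each directory under the prefix that holds an esmf.mod file.
# Note: -iname is supported by GNU and BSD find but is not strictly POSIX.
find_esmf_mod_dirs() {
  find "$1" -type f -iname 'esmf.mod' -exec dirname {} \; | sort -u
}
```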

climbfuji commented 2 months ago

Note, I am on PTO until the beginning of August - not sure how much I can help with this issue. @AlexanderRichert-NOAA also has experience with these xxxxx ESMF vs esmf issues.

danrosen25 commented 2 months ago

@BrianCurtis-NOAA I looked through the code: CDEPS overwrites ESMF_F90COMPILEPATHS then uses the variable directly.

The FindESMF.cmake file included with ESMF creates a new variable ESMF_INCLUDE_DIRECTORIES. Or one can use cmake targets ESMF or ESMF::ESMF. This was true before the 8.6.1 release too.

So basically, CDEPS needs to run the current version of FindESMF.cmake included with CDEPS in order for the current code to work, OR share/CMakeLists.txt needs to be modified. And CDEPS will not run its own FindESMF.cmake if a target 'esmf' (case sensitive) already exists.

I've been making the case for independent component build steps, as is done with the build infrastructure in ESMX. Setting all CMake variables in one place and letting global CMake variables affect builds for all components is hard to manage.

junwang-noaa commented 2 months ago

@danrosen25 may I ask if you have the code updates for CDEPS so that we can move to ESMF 8.6.1? Thanks for looking into this issue.

danrosen25 commented 2 months ago

Does it not work with the existing version of FindESMF.cmake in CDEPS? https://github.com/NOAA-EMC/CDEPS/blob/develop/cmake/FindESMF.cmake

But I think you need to change this line from APPEND to PREPEND to use the correct module. https://github.com/NOAA-EMC/CDEPS/blob/develop/CMakeLists.txt#L31

Either that, or change the CMakeLists.txt at https://github.com/NOAA-EMC/CDEPS/blob/develop/share/CMakeLists.txt#L23 to remove ESMF_F90COMPILEPATHS and add target_link_libraries(cdeps_share ESMF::ESMF), along with changing the rest of the CMakeLists.txt files in CDEPS that use ESMF_F90COMPILEPATHS and updating the FindESMF.cmake file in CDEPS to match the version provided by ESMF.

NickSzapiro-NOAA commented 2 months ago

It may be confusing that some cmake in CDEPS is for a "standalone" setup. I believe these are the relevant ones for UFS: https://github.com/ufs-community/ufs-weather-model/blob/develop/CDEPS-interface/CMakeLists.txt and https://github.com/ufs-community/ufs-weather-model/blob/develop/CDEPS-interface/cdeps_files.cmake, and then within each data component CDEPS/dxxx/CMakeLists.txt

danrosen25 commented 2 months ago

Since that's the CMakeLists.txt file being used, it's probably a case sensitivity issue. The target for ESMF is ESMF::ESMF and the alias is ESMF. What I'm seeing here is esmf. I'm looking at the current FindESMF file in CMakeModules (note that this is not the one distributed by ESMF), and it uses lowercase esmf (we've never provided a FindESMF module with lowercase esmf). https://github.com/NOAA-EMC/CMakeModules/blob/cabd7753ae17f7bfcc6dad56daf10868aa51c3f4/Modules/FindESMF.cmake
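
To find where the lowercase name is still used, something like the following could help. It is a heuristic sketch, not a definitive audit: the function name `find_lowercase_esmf_targets` is made up, and the whole-word match will also hit comments and variable text, so the results need a manual look.

```shell
# Sketch only: find_lowercase_esmf_targets is a hypothetical helper name.
# Scans CMakeLists.txt and *.cmake files under the given root (default:
# current directory) for the case-sensitive whole word "esmf" and prints
# "<file>:<line>:<text>" for each hit. ESMF and ESMF::ESMF are not matched.
# Note: grep -w and -H are GNU/BSD extensions rather than strict POSIX.
find_lowercase_esmf_targets() {
  find "${1:-.}" -type f \( -name 'CMakeLists.txt' -o -name '*.cmake' \) \
    -exec grep -nHw 'esmf' {} +
}
```

Each hit is a place where either the reference would need to become ESMF::ESMF or a compatibility alias would have to exist for the build to resolve the target.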

mathomp4 commented 2 months ago

Note: GEOS still has a few esmf target refs from the olden days when we were linking to libesmf.a and, well, in CMake-land that is esmf. But of course now we have a real FindESMF.cmake and we should follow that.

But until I can fix up all of GEOS, we have:

    if (NOT TARGET esmf)
      add_library(esmf ALIAS ESMF::ESMF)
    endif ()

in our code to still support the old style. I hope to remove it soon.

danrosen25 commented 2 months ago

Similar code can be added to UFS after this line: https://github.com/ufs-community/ufs-weather-model/blob/develop/CMakeLists.txt#L150

climbfuji commented 1 week ago

Where are we with this issue? Have esmf@8.6.1 and mapl@2.46.2 been installed on all NOAA RDHPCS systems in spack-stack-1.6.0? Or is this moot given that spack-stack-1.8.0 has esmf@8.6.1 with mapl@2.46.3?

junwang-noaa commented 1 week ago

MAPL 2.46.2 has a bug, so we have to move to esmf 8.6.1 and mapl 2.46.3 to debug the issue. We suggest having a test version of spack-stack 1.6.0 with esmf 8.6.1 and mapl 2.46.3 to continue the debugging work, while you move forward with the spack-stack 1.8.0 release with esmf 8.6.1 and mapl 2.46.3.

climbfuji commented 1 week ago

@RatkoVasic-NOAA FYI

jkbk2004 commented 1 week ago

@mathomp4 we can continue to test on orion and hercules for the new versions of mapl and esmf. we can follow up at https://github.com/ufs-community/ufs-weather-model/issues/2346.

ulmononian commented 1 week ago

@climbfuji @junwang-noaa:

@RatkoVasic-NOAA installed a test env on orion/hercules w/ mapl@2.46.3 and esmf@8.6.1 in the following locations:

Hercules: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core
Orion: /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core

thank you @RatkoVasic-NOAA!!!