Open junwang-noaa opened 4 months ago
We need to do https://github.com/JCSDA/spack-stack/issues/1157 first, then this.
Below are the instructions and a list of platforms / assigned spack-stack installers:
1. Go to the spack-stack-1.6.0 installation and run the basic steps for building spack-stack environments on this system (see https://spack-stack.readthedocs.io/en/1.6.0/PreConfiguredSites.html).
2. Make sure the git remotes for both spack-stack and spack are configured to point to JCSDA, then run `git remote update` and `git checkout jcsda/release/1.6.0` (replace `jcsda` with `origin` or however the remote is named); a subsequent `git status` should show:
```
-bash-4.2$ git status
#
```
3. `git submodule update` should check out the correct hash for the `spack` submodule; if not, go to `spack` and run `git remote update && git checkout jcsda/release/1.6.0`.
4. Back in the spack-stack top-level directory, run `source setup.sh`.
5. For each unified environment in `envs`, run the following (use a name that works for your setup; it may include a compiler suffix, etc.):
```
spack stack create env --name=ue-esmf-8.6.1-mapl-2.46.2 --site=s4 --template=unified-dev \
  --upstream=/data/prod/jedi/spack-stack/spack-stack-1.6.0/envs/unified-env/install \
  2>&1 | tee log.create.ue-esmf-8.6.1-mapl-2.46.2.001
```
6. Update `envs/ue-esmf-8.6.1-mapl-2.46.2/spack.yaml`: set the correct compiler in the compiler matrix line (match upstream!) and set the correct esmf/mapl versions, then activate the environment:
```
sed -i "s/'%aocc', '%apple-clang', '%gcc', '%intel'/'%intel'/g" envs/ue-esmf-8.6.1-mapl-2.46.2/spack.yaml
sed -i "s/mapl@2.40.3 ^esmf@8.5.0/mapl@=2.46.2 ^esmf@=8.6.1/g" envs/ue-esmf-8.6.1-mapl-2.46.2/spack.yaml
sed -i "s/- mapl@2.40.3 ^esmf@8.6.0//g" envs/ue-esmf-8.6.1-mapl-2.46.2/spack.yaml
spack env activate -p envs/ue-esmf-8.6.1-mapl-2.46.2
```
7. Concretize with `spack concretize 2>&1 | tee log.concretize.ue-esmf-8.6.1-mapl-2.46.2.001` and check the output:
```
$ grep '^==> Concretized' log.concretize.ue-esmf-8.6.1-mapl-2.46.2.001
==> Concretized crtm@v2.4.1-jedi%intel
==> Concretized crtm@2.4.0.1%intel
==> Concretized ewok-env%intel+cylc+ecflow
==> Concretized fms@release-jcsda%intel
==> Concretized fms@2023.04%intel
==> Concretized global-workflow-env%intel
==> Concretized gmao-swell-env%intel
==> Concretized gsi-env%intel
==> Concretized jedi-fv3-env%intel
==> Concretized jedi-mpas-env%intel
==> Concretized jedi-neptune-env%intel
==> Concretized jedi-ufs-env%intel ^esmf@=8.6.1 ^mapl@=2.46.2
==> Concretized jedi-um-env%intel
==> Concretized madis@4.5%intel
==> Concretized mapl@=2.46.2%intel ^esmf@=8.6.1
==> Concretized soca-env%intel
==> Concretized ufs-srw-app-env%intel ^esmf@=8.6.1 ^mapl@=2.46.2
==> Concretized ufs-weather-model-env%intel ^esmf@=8.6.1 ^mapl@=2.46.2
```
8. Fix the `grib-utils` modulefile for `wgrib` (compare against the upstream environment if you are not sure what to do): replace
```
'WGRIB2': '{prefix}/bin/wgrib2'
```
with
```
'WGRIB': '{prefix}/bin/wgrib'
```
in the `grib-utils` section. I don't know why this was never committed to the release/1.6.0 branch.
9. Install via `spack install --verbose 2>&1 | tee log.install.ue-esmf-8.6.1-mapl-2.46.2.001`, then run `spack module lmod refresh --upstream-modules` (use `tcl` instead of `lmod` where necessary), then `spack stack setup-meta-modules`.
ONLY TICK IF YOU'VE ALSO FIXED THE GRIB-UTILS MODULE FOR WGRIB
ONLY DO THIS FOR THE BASE UNIFIED-ENV - IGNORE THE ADDON ENVS
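To double-check step 7 without eyeballing the whole log, something like the following Python sketch could help. It is a hypothetical helper, not part of spack or spack-stack: it pulls the `==> Concretized ...` root lines out of a saved concretize log and lists any roots whose explicit `esmf@`/`mapl@` constraints differ from the target versions (8.6.1 and 2.46.2 here, as assumed defaults).

```python
import re

# Hypothetical helper (not part of spack or spack-stack) for inspecting a saved
# `spack concretize` log such as log.concretize.ue-esmf-8.6.1-mapl-2.46.2.001.
ROOT_LINE = re.compile(r"^==> Concretized (.+)$")


def root_specs(log_text: str) -> list[str]:
    """Return the root specs; dependency-tree lines ([+], [e], [^], -) are skipped."""
    roots = []
    for line in log_text.splitlines():
        match = ROOT_LINE.match(line)
        if match:
            roots.append(match.group(1).strip())
    return roots


def mismatched(roots: list[str], esmf: str = "8.6.1", mapl: str = "2.46.2") -> list[str]:
    """Return roots that pin esmf or mapl to a version other than the expected one."""
    expected = {"esmf": esmf, "mapl": mapl}
    bad = []
    for spec in roots:
        for name, want in expected.items():
            # Matches both `name@X` and `name@=X`; the version stops at '%', ' ', '+', etc.
            found = re.findall(rf"\b{name}@=?([\w.]+)", spec)
            if any(version != want for version in found):
                bad.append(spec)
                break
    return bad
```

Usage, assuming the log file from step 7 is in the current directory: `roots = root_specs(open("log.concretize.ue-esmf-8.6.1-mapl-2.46.2.001").read())`, then `mismatched(roots)` should come back empty if every root picked up the new esmf/mapl pair.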
@climbfuji @AlexanderRichert-NOAA @jkbk2004 @junwang-noaa I installed a chained environment based on 1.6.0 but with esmf/8.6.1 and mapl/2.46.2 here: `/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install/modulefiles/Core`. It is Intel-only for now. Please give it a try and let us know how it works with the ufs-wm.
I am doing the gcc part now. I had to comment out `jedi-tools-env` in the chained environment, but that doesn't matter. Fortunately, 1.6.0 was the last release that had more than one compiler in one environment; this just causes trouble. Unfortunately, though, we always need to go back and make updates to 1.6.0!
@mathomp4 mapl 2.46.2 refuses to build on Hercules with gcc, because the 1.6.0 stack uses mvapich2:
```
==> Ran patch() for mapl
==> mapl: Executing phase: 'cmake'
==> Error: InstallError: Unsupported MPI stack

/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/spack/var/spack/repos/builtin/packages/mapl/package.py:363, in cmake_args:
        360        elif self.spec.satisfies("^cray-mpich"):
        361            args.append(self.define("MPI_STACK", "mpich"))
        362        else:
  >>    363            raise InstallError("Unsupported MPI stack")
        364
        365        return args

See build log for details:
```
Any quick fix for this (locally if needed - we've moved away from mvapich2 since spack-stack-1.7.0)?
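For context on the failure above: MAPL's Spack recipe picks an `MPI_STACK` value for MAPL's CMake based on which MPI provider is in the spec, and anything it does not recognize ends in `InstallError("Unsupported MPI stack")`. Below is a standalone Python model of that dispatch, not Spack code. The provider names and `MPI_STACK` values in the table, including the `mvapich2` entry, are assumptions for illustration and may not match what the actual fix in JCSDA/spack#449 does.

```python
# Standalone model of the MPI dispatch in MAPL's Spack recipe (cmake_args).
# Not Spack code: InstallError is a local stand-in, and the table below is an
# assumed mapping, not copied from package.py.


class InstallError(Exception):
    """Local stand-in for spack's InstallError."""


# Assumed provider -> MPI_STACK mapping. In the 1.6.0 recipe, an mvapich2-based
# spec matched none of the branches and fell through to the error.
MPI_STACK_BY_PROVIDER = {
    "mpich": "mpich",
    "cray-mpich": "mpich",
    "openmpi": "openmpi",
    "intel-oneapi-mpi": "intelmpi",
    "mpt": "mpt",
    "mvapich2": "mvapich",  # hypothetical entry representing the kind of fix needed
}


def mpi_stack_for(provider: str) -> str:
    """Return the MPI_STACK value for `provider`, or raise like the recipe does."""
    try:
        return MPI_STACK_BY_PROVIDER[provider]
    except KeyError:
        raise InstallError("Unsupported MPI stack") from None
```

With this model, `mpi_stack_for("cray-mpich")` returns `"mpich"`, while any provider missing from the table raises the same error seen in the log.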
https://github.com/JCSDA/spack/pull/449 and https://github.com/JCSDA/spack-stack/pull/1189 fix this for release/1.6.0, and https://github.com/spack/spack/pull/45164 fixes it for spack develop (it will come back to spack-stack-dev with the next pull).
https://github.com/JCSDA/spack-stack/pull/1189 also fixes the missing `grib-utils` module file change for `wgrib`.
@ulmononian Hercules is done for gcc, and I also fixed the `grib-utils` module and regenerated all module files.
@climbfuji thanks for taking on the Hercules gcc issue. It looks to me like you did the gcc install in `/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install`; am I correct? If so, I can let the ufs-wm devs know.
Correct.
It looks like the platforms still missing are all EMC and EPIC systems; everything else is either done or not needed.
In case it has not been reported here yet, I wanted to make you aware of this issue seen on Hercules when testing with the esmf/8.6.1 spack-stack 1.6.0 installation. @jkbk2004 @BrianCurtis-NOAA @FernandoAndrade-NOAA
```
CMake Error at /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.2/install/intel/2021.9.0/mapl-2.46.2-uiwt3at/lib64/cmake/MAPL/MAPL-targets.cmake:73 (set_target_properties):
  The link interface of target "MAPL_cfio_r4" contains:

    ESMF::ESMF

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.
```
That is so odd. I mean, ESMF 8.6.1 and MAPL 2.46 were essentially created to allow for the `ESMF::ESMF` target.
Hmm. My only other thought is that a `FindESMF.cmake` file is out of date? The one we have in MAPL and the one we have in ESMA_cmake are identical to the one in ESMF.
Could you somehow be picking up another one? @danrosen25 noted in the ESMF PR that, at the time, these CMake files were out of date:
- NOAA-EMC/AQM-utils cmake/FindESMF.cmake
- ufs-community/UFS_UTILS cmake/FindESMF.cmake
- noaa-oar-arl/NEXUS cmake/FindESMF.cmake
And a look at them shows them still quite old.
Perhaps some package is still referring to an old `FindESMF.cmake` in some way?
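One way to test the "some package is still using an old FindESMF.cmake" theory is to hash every copy in a checkout and compare it with the copy shipped by ESMF. A minimal Python sketch, assuming a local ufs-weather-model checkout with submodules and a reference `FindESMF.cmake` taken from the ESMF 8.6.1 source; both paths below are placeholders. It matches the file name exactly, so a differently cased `findESMF.cmake` would not be picked up.

```python
import hashlib
from pathlib import Path

# Sketch: find FindESMF.cmake copies in a source tree that differ from a
# reference copy (e.g. the one shipped with ESMF 8.6.1). Paths are placeholders.


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_find_esmf(tree: Path, reference: Path) -> list[Path]:
    """Return FindESMF.cmake files under `tree` whose contents differ from `reference`."""
    wanted = file_digest(reference)
    stale = []
    for candidate in tree.rglob("FindESMF.cmake"):
        if candidate.resolve() == reference.resolve():
            continue  # skip the reference itself if it lives inside `tree`
        if file_digest(candidate) != wanted:
            stale.append(candidate)
    return sorted(stale)
```

For example, `stale_find_esmf(Path("ufs-weather-model"), Path("esmf/cmake/FindESMF.cmake"))`. Because the comparison is byte-for-byte, even whitespace-only differences are reported, so a hit means "worth diffing", not necessarily "broken".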
I'm running with the MAPL FindESMF.cmake on WCOSS2 right now, but if this is the case, we should look into coordinating a single place for FindESMF.cmake to live, with other groups pulling from that location.
There's already an issue in the CMakeModules repo that talks about using ESMF's own FindESMF.cmake: https://github.com/NOAA-EMC/CMakeModules/issues/70. There are also issues in fv3-jedi and spack for this, if I remember correctly.
@climbfuji is the FindESMF.cmake issue caused by the new ESMF 8.6.1? I am curious why it was not an issue with the previous ESMF 8.6.0.
@climbfuji @junwang-noaa
See this pull request, which was merged into 8.6.1
https://github.com/esmf-org/esmf/pull/226
When I tested this change in UFS, I had trouble IF I updated the UFS FindESMF.cmake files. If I left them alone, the UFS system built.
@danrosen25 Thanks for looking into this issue. My question is how to resolve the build issue when updating to ESMF 8.6.1 and MAPL 2.46.2 in the UFS weather model. I don't remember the UFS weather model having these three submodules.
@BrianCurtis-NOAA @bbakernoaa @GeorgeGayno-NOAA FYI: the FindESMF.cmake files in the repositories Dan listed may need updates for ESMF 8.6.1.
I'll try to find the log, but I updated all FindESMF.cmake files in UFSWM and its subcomponents to the one in the ESMF repo, and CDEPS doesn't like the `use ESMF` in `share/shr_abort_mod.F90` on line 11, failing with `error #7002: Error in opening the compiled module file. Check INCLUDE paths. [ESMF]`.
Note: I am on PTO until the beginning of August, so I'm not sure how much I can help with this issue. @AlexanderRichert-NOAA has also had experience with these xxxxx ESMF vs. esmf issues.
@BrianCurtis-NOAA
I looked through the code: CDEPS overwrites `ESMF_F90COMPILEPATHS` and then uses the variable directly.
The FindESMF.cmake file included with ESMF creates a new variable, `ESMF_INCLUDE_DIRECTORIES`. Alternatively, one can use the CMake targets `ESMF` or `ESMF::ESMF`. This was true before the 8.6.1 release too.
So basically, either CDEPS needs to run the current version of FindESMF.cmake included with CDEPS for the current code to work, OR `share/CMakeLists.txt` needs to be modified. And CDEPS will not run its own FindESMF.cmake if a target `esmf` (case-sensitive) already exists.
I've been making the case for independent component build steps, such as is done when using the build infrastructure in ESMX. Setting all CMake variables in one place and letting global CMake variables affect the builds of all components is hard to manage.
@danrosen25 may I ask if you have the code updates for CDEPS so that we can move to ESMF 8.6.1? Thanks for looking into this issue.
Does it not work with the existing version of FindESMF.cmake in CDEPS? https://github.com/NOAA-EMC/CDEPS/blob/develop/cmake/FindESMF.cmake
But I think you need to change this line from `APPEND` to `PREPEND` so that the correct module is used:
https://github.com/NOAA-EMC/CDEPS/blob/develop/CMakeLists.txt#L31
Either that, or change this line in share/CMakeLists.txt
https://github.com/NOAA-EMC/CDEPS/blob/develop/share/CMakeLists.txt#L23
to remove `ESMF_F90COMPILEPATHS` and add `target_link_libraries(cdeps_share ESMF::ESMF)`, along with changing the rest of the CMakeLists.txt files in CDEPS that use `ESMF_F90COMPILEPATHS` and updating the FindESMF.cmake file in CDEPS to match the version provided by ESMF.
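The `APPEND` vs. `PREPEND` suggestion comes down to search order: in module mode, `find_package(ESMF)` loads the first `FindESMF.cmake` it finds while walking `CMAKE_MODULE_PATH`, so whichever directory comes first wins. Here is a toy Python model of that lookup (not CMake itself); the directory layout is hypothetical and assumes the parent project has already put its own `cmake/` directory with an older FindESMF.cmake on the path.

```python
# Toy model (not CMake): module-mode find_package(<Name>) takes the first
# Find<Name>.cmake found while walking CMAKE_MODULE_PATH in order.


def find_module(name: str, module_path: list[str], available: dict[str, set[str]]):
    """Return the Find<name>.cmake that would be loaded, or None.

    `available` maps each directory to the file names it contains.
    """
    filename = f"Find{name}.cmake"
    for directory in module_path:
        if filename in available.get(directory, set()):
            return f"{directory}/{filename}"
    return None


# Hypothetical layout: both directories ship a FindESMF.cmake.
available = {
    "ufs/cmake": {"FindESMF.cmake"},
    "ufs/CDEPS/cmake": {"FindESMF.cmake"},
}
parent = ["ufs/cmake"]
appended = parent + ["ufs/CDEPS/cmake"]   # list(APPEND CMAKE_MODULE_PATH ...)
prepended = ["ufs/CDEPS/cmake"] + parent  # list(PREPEND CMAKE_MODULE_PATH ...)
```

In this model `find_module("ESMF", appended, available)` returns the parent's copy, and `prepended` returns the CDEPS copy. Note that `list(PREPEND ...)` requires CMake 3.15 or newer, and, per the comment above, even the right file is skipped if a target `esmf` already exists.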
It may be confusing that some of the CMake in CDEPS is for a "standalone" setup. I believe these are the relevant files for UFS:
- https://github.com/ufs-community/ufs-weather-model/blob/develop/CDEPS-interface/CMakeLists.txt
- https://github.com/ufs-community/ufs-weather-model/blob/develop/CDEPS-interface/cdeps_files.cmake
- and, within each data component, CDEPS/dxxx/CMakeLists.txt
Since that's the CMakeLists.txt file being used, it's probably a case-sensitivity issue. The target for ESMF is `ESMF::ESMF` and the alias is `ESMF`. What I'm seeing here is `esmf`. Looking at the current FindESMF file in CMakeModules (note that this is not the one distributed by ESMF), it uses lowercase `esmf` (we've never provided a FindESMF module with lowercase `esmf`).
https://github.com/NOAA-EMC/CMakeModules/blob/cabd7753ae17f7bfcc6dad56daf10868aa51c3f4/Modules/FindESMF.cmake
Note: GEOS still has a few `esmf` target references from the olden days when we were linking to `libesmf.a`, and, well, in CMake-land that is `esmf`. But of course now we have a real `FindESMF.cmake` and we should follow that.
But until I can fix up all of GEOS, we have:
```cmake
if (NOT TARGET esmf)
  add_library(esmf ALIAS ESMF::ESMF)
endif ()
```
in our code to still support the old style. I hope to remove it soon.
Similar code can be added to UFS after this line: https://github.com/ufs-community/ufs-weather-model/blob/develop/CMakeLists.txt#L150
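CMake target names are case-sensitive, which is why code that still links against `esmf` fails when only `ESMF::ESMF` and `ESMF` exist, and why the alias above is wrapped in `if (NOT TARGET esmf)`: defining the same alias twice would be an error. Here is a toy Python model of that behavior (not CMake; `TargetRegistry` and `ensure_legacy_esmf` are made-up names for illustration).

```python
# Toy model (not CMake) of case-sensitive target names and the GEOS-style
# compatibility alias guarded by `if (NOT TARGET esmf)`.


class TargetRegistry:
    def __init__(self) -> None:
        self._targets: dict[str, str] = {}  # name -> underlying target

    def add_imported(self, name: str) -> None:
        self._targets[name] = name

    def add_alias(self, alias: str, target: str) -> None:
        # Like add_library(<alias> ALIAS <target>): the name must be new and
        # the aliased target must exist.
        if alias in self._targets:
            raise ValueError(f"target {alias!r} already exists")
        if target not in self._targets:
            raise ValueError(f"target {target!r} was not found")
        self._targets[alias] = self._targets[target]

    def has(self, name: str) -> bool:
        return name in self._targets  # exact, case-sensitive match

    def resolve(self, name: str) -> str:
        return self._targets[name]


def ensure_legacy_esmf(targets: TargetRegistry) -> None:
    """Mirror of: if (NOT TARGET esmf) add_library(esmf ALIAS ESMF::ESMF) endif()."""
    if not targets.has("esmf"):
        targets.add_alias("esmf", "ESMF::ESMF")
```

After registering `ESMF::ESMF` and its `ESMF` alias, `has("esmf")` is still false; one call to `ensure_legacy_esmf` makes `esmf` resolve to `ESMF::ESMF`, and repeated calls are harmless because of the guard.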
Where are we with this issue? Have esmf@8.6.1 and mapl@2.46.2 been installed on all NOAA RDHPCS systems in spack-stack-1.6.0? Or is this moot given that spack-stack-1.8.0 has esmf@8.6.1 with mapl@2.46.3?
MAPL 2.46.2 has a bug; we have to move to esmf 8.6.1 and mapl 2.46.3 to debug the issue. We suggest having a test version of spack-stack 1.6.0 with esmf 8.6.1 and mapl 2.46.3 to continue the debugging work, while you move forward with the spack-stack 1.8.0 release with esmf 8.6.1 and mapl 2.46.3.
@RatkoVasic-NOAA FYI
@mathomp4 we can continue to test on Orion and Hercules with the new versions of MAPL and ESMF. We can follow up at https://github.com/ufs-community/ufs-weather-model/issues/2346.
@climbfuji @junwang-noaa:
@RatkoVasic-NOAA installed a test env on Orion/Hercules with mapl@2.46.3 and esmf@8.6.1 in the following locations:
Hercules: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core
Orion: /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/ue-esmf-8.6.1-mapl-2.46.3/install/modulefiles/Core
thank you @RatkoVasic-NOAA!!!
Per a telecon today between the MAPL team (cc @tclune) and UFS team (e.g., @junwang-noaa and others), there was a request to create a MAPL tag that was based on MAPL v2.40.3 (which was the last "working" version) but with support for ESMF 8.6.1 as that version was needed. This could then be installed on Hercules for testing.
I've created a "preliminary" tag, v2.40.3.1, where the changes compared to v2.40.3 are:
- `esmf` target references in CMake are now `ESMF::ESMF`
- `FindESMF.cmake` updated to the ESMF 8.6.1 version
- The `ESMF_ConfigNextLine` call in `get_vec_from_config` changed to use `tableEnd`, per @danrosen25

The tag is on MAPL now, though there is no release yet, as I'm not sure whether all the needed CMake, etc. changes have been brought over (MAPL 2.40 was a while ago).
Now, MAPL 2.40.3.1 is not in the spack `package.py` for MAPL, so I'm guessing it'll need to be installed as:
```
spack install mapl@git.v2.40.3.1
```
My laptop seems to be able to resolve that. That said, we might need to iterate on this a few times if I missed something. There have been further "fixes for Spack/UFS" on later tags and perhaps those might need backporting. If so, I can update and push the tag.
Hi @mathomp4
The 8.6.1-compliant version should fix the method for looping over ESMF_Config tables. It's erroneous to call `ESMF_ConfigNextLine` when you're at the end of the table, because there are no more items. Previously, you could call `ESMF_ConfigNextLine` at the end of the table and it would return the end-of-table marker `::`. Here's my exchange with Ben.
reproducer_490.tgz
I removed a statically sized string buffer that held data for the "current line" of a configuration file. This work was done to eliminate the fixed maximum line length of 1024. Paired with that work, and out of necessity, I also cleaned up all the calls that returned the next line. I'm not sure why the code above doesn't use `tableEnd` in the call to `ESMF_ConfigNextLine`? https://earthsystemmodeling.org/docs/release/latest/ESMF_refdoc/node6.html#SECTION060931800000000000000
Alternatively the code could use ESMF_ConfigGetDim to get the line count of a table and use this in a do loop. https://earthsystemmodeling.org/docs/release/latest/ESMF_refdoc/node6.html#SECTION060931300000000000000
Technically "::" isn't the next line, similar to how the label itself is not the next line. I can discuss this further with the ESMF Core team if this needs to be added back but my recommendation is to use the tableEnd argument in ESMF_ConfigNextLine.
I've attached a reproducer of this issue that includes the two working examples mentioned above.
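The two loop styles recommended above can be illustrated with a toy Python model of a `label:: ... ::` table. This is not the ESMF API: `ConfigTable`, `next_line()` and `dim()` are hypothetical stand-ins loosely modeled on `ESMF_ConfigNextLine` (with `tableEnd`) and `ESMF_ConfigGetDim`. The model raises if a caller keeps reading after the end marker, which is the kind of loop the stricter 8.6.1 behavior breaks.

```python
from typing import Optional

# Toy model (not ESMF) of reading a table from an ESMF_Config-style resource:
#   label::
#     row 1
#     row 2
#   ::


class ConfigTable:
    def __init__(self, text: str, label: str) -> None:
        lines = [line.strip() for line in text.splitlines()]
        start = lines.index(f"{label}::") + 1  # row after the label
        end = lines.index("::", start)         # end-of-table marker
        self._rows = lines[start:end]
        self._pos = -1                         # positioned on the label line

    def next_line(self) -> tuple[Optional[str], bool]:
        """Advance one row and return (row, table_end).

        Reaching the '::' marker returns (None, True); calling again after
        that is treated as an error in this model.
        """
        if self._pos >= len(self._rows):
            raise RuntimeError("next_line() called past the end of the table")
        self._pos += 1
        if self._pos == len(self._rows):
            return None, True
        return self._rows[self._pos], False

    def dim(self) -> int:
        """Number of rows, analogous to asking ESMF_ConfigGetDim for the line count."""
        return len(self._rows)


def rows_via_table_end(table: ConfigTable) -> list[str]:
    """Loop until table_end is reported, never reading past the marker."""
    rows = []
    while True:
        row, table_end = table.next_line()
        if table_end:
            return rows
        rows.append(row)


def rows_via_dim(table: ConfigTable) -> list[str]:
    """Loop a known number of times, so the end marker is never read."""
    return [table.next_line()[0] for _ in range(table.dim())]
```

Both helpers return the same rows; the difference is only whether the loop stops on the end flag or on a precomputed count.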
@danrosen25 Ohhhh. Okay. Let me consult with @bena-nasa on that. I might need to push the tag...
ETA: I talked with @bena-nasa about this and we found one table call that needed updating. I've pushed v2.40.3.1 with the update.
@mathomp4 We will test ESMF 8.6.1 with this MAPL 2.40.3.1 in ufs-weather model then.
@AlexanderRichert-NOAA would you please install spack-stack with these new libraries on Hercules. The current Hercules module file is at: https://github.com/ufs-community/ufs-weather-model/blob/develop/modulefiles/ufs_hercules.intel.lua. Thanks
@mathomp4 the 2.40.3.1 build on Hercules is failing because `c_ptr` and `c_loc` are undefined in `geom/FieldPointerUtilities.F90`. Adding `use iso_c_binding` at the top of the file fixes it, and then it builds okay.
@junwang-noaa if/when @mathomp4 updates the tag to fix the issue in my previous comment, I'll reinstall, but if you want to go ahead and test: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/mapl-2.40.3.1-intel-2021.9.0/install/modulefiles/Core
Ahhh. Yeah. That was a file where we could compile it because `iso_c_binding` was bleeding in via ESMF, but they fixed that on their end. It was for 8.7, but I guess it got into 8.6.1. Good find @AlexanderRichert-NOAA
I've pushed the v2.40.3.1 tag.
Thanks @mathomp4. I just reinstalled using the updated tag.
@AlexanderRichert-NOAA may I ask if you can install the ESMF beta snapshot 8.8.0b04 (https://github.com/esmf-org/esmf/releases/tag/v8.8.0b04) with MAPL 2.40.3.1 on Hercules for people to test the grid imprint issue in the UFS coupled test? Thanks
Yes, will do
/work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/esmf-8.8.0b04-intel-2021.9.0/install/modulefiles/Core
@AlexanderRichert-NOAA et al, I've released a formal MAPL v2.40.3.1 release[^1]:
https://github.com/GEOS-ESM/MAPL/releases/tag/v2.40.3.1
and I've made a PR to spack mainline for it:
https://github.com/spack/spack/pull/47627
If any changes are needed now, we'll up the tweak number to 2.
[^1]: ~The release doesn't have a Zenodo badge yet because, well, it doesn't seem to be appearing on Zenodo. Not sure why 🤷🏼. I'll keep monitoring.~ Never mind. It appeared!
Package name
ESMF and MAPL
Package version/tag
ESMF/8.6.1 and MAPL/2.46.2
Build options
Current
Installation timeframe
The two libraries will be installed under current spack-stack 1.6.0.
Other information
No response
WCOSS2
WCOSS2: General questions
No response
WCOSS2: Installation and testing
No response
WCOSS2: Technical & security review list
WCOSS2: Additional comments
No response