NOAA-EMC / GSI

Gridpoint Statistical Interpolation
GNU Lesser General Public License v3.0

Port RadMon to wcoss2 #226

Closed EdwardSafford-NOAA closed 2 years ago

EdwardSafford-NOAA commented 2 years ago

Port RadMon DA package to wcoss2.

EdwardSafford-NOAA commented 2 years ago

Having build issues, which isn't unexpected on a new platform, but is still plenty frustrating.

Issues with RadMon build, using build_RadMon_cmake.sh:

  CMake Error at /lfs/h2/emc/da/noscrub/Edward.Safford/GSI/cmake/Modules/FindNetCDF.cmake:210 (message):
    Unable to properly find NetCDF.  Found static libraries at:
    /lfs/h2/emc/da/noscrub/Edward.Safford/GSI/util/Radiance_Monitor/NetCDF_Fortran_LIBRARY-NOTFOUND
    but could not run nc-config:
  Call Stack (most recent call first):
    CMakeLists.txt:83 (find_package)

  CMake Warning (dev) at CMakeLists.txt:83 (find_package):
    Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
    Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
    command to set the policy and suppress this warning.
    Environment variable NetCDF_ROOT is set to:
      /apps/prod/hpc-stack/intel-19.1.3.304/cray-mpich-8.1.4/netcdf/4.7.4
    For compatibility, CMake is ignoring the variable.
  This warning is for project developers.  Use -Wno-dev to suppress it.

So then I tried to figure out how the top level build script works and what I might be doing wrong. There I ran into these issues with GSI/ush/build_all_cmake.sh:

  find: ‘/nemsio/v/intel’: No such file or directory
  find: ‘/sorc’: No such file or directory
  CMake Error at cmake/Modules/findHelpers.cmake:137 (string):
    string sub-command REGEX, mode MATCH needs at least 5 arguments total to command.
  Call Stack (most recent call first):
    cmake/Modules/FindNEMSIO.cmake:19 (findInc)
    CMakeLists.txt:225 (find_package)

  source /lfs/h2/emc/da/noscrub/Edward.Safford/GSI/modulefiles/modulefile.ProdGSI.wcoss2
  ++ proc ModulesHelp '{' '}' '{'
  /lfs/h2/emc/da/noscrub/Edward.Safford/GSI/modulefiles/modulefile.ProdGSI.wcoss2: line 4: proc: command not found

That suggests an error in the modulefile.ProdGSI.wcoss2 file, but that idea is contradicted by RadMon's build_all_cmake.sh script, which doesn't choke on it. So lots of mysteries.
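
One plausible reading of the `string(REGEX MATCH ...)` failure, offered as an assumption rather than a diagnosis: the `find` calls matched nothing (the paths `/nemsio/v/intel` and `/sorc` look as if they were built from empty variables), and an unquoted empty variable was then passed to `string(REGEX MATCH ...)`. CMake drops an unquoted empty variable from the argument list, which leaves the command one argument short. The sketch below reproduces that pattern; `search_result` and `lib_version` are hypothetical names, not taken from findHelpers.cmake.

```cmake
# Sketch only; run with: cmake -P sketch.cmake
# Hypothetical stand-in for the output of a find command that matched nothing.
set(search_result "")

# Failing pattern: ${search_result} expands to zero arguments when unquoted,
# so string() receives only REGEX, MATCH, the regex, and the output variable:
#   string(REGEX MATCH "[0-9]+\\.[0-9]+\\.[0-9]+" lib_version ${search_result})

# Quoting keeps the (possibly empty) input as a real argument, and the guard
# reports the underlying problem instead of an argument-count error.
if(search_result)
  string(REGEX MATCH "[0-9]+\\.[0-9]+\\.[0-9]+" lib_version "${search_result}")
  message(STATUS "Found version: ${lib_version}")
else()
  message(WARNING "Library search returned nothing; check the module environment")
endif()
```

If this reading is right, the CMake error is a symptom of the library environment variables not being set, which would point back at how the modulefile was loaded.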

RussTreadon-NOAA commented 2 years ago

@EdwardSafford-NOAA, I don't see any wcoss2 references in /lfs/h2/emc/da/noscrub/Edward.Safford/GSI/cmake. I only see references to acorn.

I cloned Mike's feature/acorn_debug_cleanup and made the following local modifications

While it is good to get the master to build on wcoss2, NCO wants us to port what is currently running in operations.

EdwardSafford-NOAA commented 2 years ago

@RussTreadon-NOAA Just curious -- when you built the RadMon did you get a warning message about NetCDF_ROOT (as I did above)?

I think we probably will need a wcoss2 section in RadMon_config but not for building the executables. I'll add to it on an as-needed basis.

EdwardSafford-NOAA commented 2 years ago

@RussTreadon-NOAA Ignore my previous question -- you wouldn't have seen that because you loaded the wcoss2 module.

RussTreadon-NOAA commented 2 years ago

Good question. I reran ./build_RadMon_cmake.sh and checked. Yes, I see the same NetCDF_ROOT error as you. I don't see this message when I execute ush/build_all_cmake.sh. The top level CMakeLists.txt has

  cmake_policy(SET CMP0009 NEW)
  cmake_policy(SET CMP0054 NEW)
  cmake_policy(SET CMP0074 NEW)
  find_package(OpenMP)

The Radiance_Monitor CMakeLists.txt has

  cmake_policy(SET CMP0009 NEW)
  find_package(OpenMP)

I added

  cmake_policy(SET CMP0074 NEW)

to the Radiance_Monitor CMakeLists.txt and the cmake warning went away.
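
For reference, a minimal sketch of how the policy block in the Radiance_Monitor CMakeLists.txt might look after this change. The `if(POLICY ...)` guard is an assumption added here and is not in the original file; it is meant to keep the file usable with CMake versions older than 3.12, which do not know CMP0074.

```cmake
# Sketch only: the if(POLICY ...) guard is an assumption, not part of the
# original Radiance_Monitor CMakeLists.txt.
cmake_policy(SET CMP0009 NEW)

# CMP0074 (CMake 3.12+): with NEW, find_package() honors <PackageName>_ROOT
# variables, such as the NetCDF_ROOT environment variable named in the warning.
if(POLICY CMP0074)
  cmake_policy(SET CMP0074 NEW)
endif()

find_package(OpenMP)
```

With the policy set to NEW, CMake stops ignoring NetCDF_ROOT, which is why the developer warning no longer appears.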

EdwardSafford-NOAA commented 2 years ago

I have the RadMon build working now. Somehow my local copy of modulefiles/modulefile.ProdGSI.wcoss2 contained these two lines:

  module load hdf5-hdf5_parallel/1.10.6
  module load netcdf-hdf5_parallel/4.7.4

I can't remember now where/how I got that module file. But if I change those two lines to:

  module load hdf5/1.10.6
  module load netcdf/4.7.4

then the build works.

EdwardSafford-NOAA commented 2 years ago

@MichaelLueken-NOAA I have this issue ready for a PR now but have a question. In order to build the RadMon executables I grabbed GSI/modulefiles/modulefile.ProdGSI.wcoss2 from your feature/acorn_debug_cleanup branch. I modified it slightly to eliminate an error message, but it did work out of the box. The question is: should I include that modulefile in this PR (it isn't in GSI/master), or do you intend to merge acorn_debug_cleanup ahead of this PR? Thanks.

MichaelLueken commented 2 years ago

@EdwardSafford-NOAA None of the WCOSS2 port work should be merged to the authoritative master at this time. I will create a new wcoss2 release branch off of release/gfsda.v16.1.5 and we will apply all changes to this release branch, tag it once everything is worked out, then merge the release branch to the master.

EdwardSafford-NOAA commented 2 years ago

@MichaelLueken-NOAA Ok, that makes sense. I'll hold on a PR for this issue until you create that release branch.

MichaelLueken commented 2 years ago

@EdwardSafford-NOAA I have created the WCOSS2 port work branch in the authoritative repo:

https://github.com/NOAA-EMC/GSI/tree/feature/gfsda.v16.1.5_wcoss2_port

This branch is what Russ and I have been working on to port the release/gfsda.v16.1.5 to WCOSS2 (updated to include Haixia's latest update from September 28).

Please make sure that your changes work within this branch, then create a PR to add your WCOSS2 port to this branch.

If you have any questions or comments, please let me know.

EdwardSafford-NOAA commented 2 years ago

@MichaelLueken-NOAA My RadMon port started from the current GSI/master which includes recent changes to add S4 and Jet ports. The feature/gfsda.v16.1.5_wcoss2_port branch does not have these changes. Should I strip those changes out of my port?

MichaelLueken commented 2 years ago

@EdwardSafford-NOAA The WCOSS2 operational port needs to be identical to what is currently in operations on WCOSS, so yes, you will need to strip out all changes that aren't required for WCOSS2 porting. Alternatively, you can keep the work you have already done, create new branches off of feature/gfsda.v16.1.5_wcoss2_port, and apply your non-source code changes to these branches. Whichever method is easier for you.

MichaelLueken commented 2 years ago

Merged this work to feature/gfsda.v16.1.5_wcoss2_port at f5f78c8. Closing issue.