NOAA-EMC / hpc-stack

Create a software stack for HPCs
GNU Lesser General Public License v2.1

[INSTALL] Update CRTM-2.4.0 on all HPC machines #519

Open emilyhcliu opened 1 year ago

emilyhcliu commented 1 year ago

The crtm version 2.4.0 installed under hpc-stack: /apps/contrib/NCEP/libs/hpc-stack/modulefiles/stack is outdated and needs an update.

Issue #517 is related to this issue.

Which software in the stack would you like installed? crtm version 2.4.0 and related coefficient files

What is the version/tag of the software? release/REL-2.4.0_emc

What compilation options would you like set? intel-2018.4

Which machines would you like to have the software installed? All HPC machines other than Hera; Hera has already been updated.

Any other relevant information that we should know to correctly install the software?

Additional context (question): For Orion, the hpc-stack is the one under active maintenance and the hpc-stack-gfsv16 is not, correct?

jkbk2004 commented 1 year ago

@emilyhcliu EPIC has been supporting the hpc-stack on Orion at /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2022.1.2. Regarding hpc-stack-gfsv16, there may be a permission issue (on the EPIC side) accessing /apps/contrib/NCEP/libs/hpc-stack/modulefiles/stack. By any chance, would it be possible to migrate hpc-stack-gfsv16 to an EPIC location? @natalie-perlin FYI

emilyhcliu commented 1 year ago

@jkbk2004 GSI is having issues at run time when it is compiled with intel-2022. The issue is tracked in https://github.com/NOAA-EMC/GSI/issues/447. So we cannot use the libraries under intel-2022.

For GSI develop, we would like to move to the EPIC HPC stacks (hopefully we can resolve the issue with intel-2022). For GSI release/gfsda.v16 (currently used by operational systems), we would like to stay with the hpc-stacks.

So, I think there are two options: (1) make intel-2018 builds also available under the EPIC HPC stacks, or (2) update the hpc-stack.

Any thoughts?

jkbk2004 commented 1 year ago

@natalie-perlin can we add an hpc-stack option in the Orion EPIC space for this GSI requirement?

natalie-perlin commented 1 year ago

@jkbk2004 @emilyhcliu - The intel-18 modules for the GSI may only be helpful as a debugging step until the issue with the higher-version intel compilers is solved. Using different compilers to build different parts of the UFS Apps, however, may not be a community-recommended approach...

DavidHuber-NOAA commented 1 year ago

@natalie-perlin The GSI cannot run with intel 2021+ on any system until the above-mentioned issue is resolved. I think everyone agrees it would be ideal for everything to move to Intel 2022, but unfortunately this is not yet possible for the GSI. So all GSI dependencies, including CRTM, need to be compiled with Intel 18 for the time being on all systems.

natalie-perlin commented 1 year ago

@DavidHuber-NOAA - what about the cases with GNU compilers? EPIC supports software stacks with GNU compilers on Hera and Cheyenne that are built to support the UFS-WM and UFS-SRW.

DavidHuber-NOAA commented 1 year ago

@natalie-perlin I believe these would be required as well, though I can't say with certainty. I've only been helping with the Intel 2022 issue and am not an authority on the GSI otherwise.

natalie-perlin commented 1 year ago

The stack of GSI modules built with the intel-2018.4 + impi/2018.4 compilers is ready on Orion. All modules listed in gsi_common.lua and gsi_orion.lua have been built.

The way to load:

module use /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2018.4/modulefiles/stack
module load hpc/1.2.0

Lines 4-6 of gsi_orion.lua would then become:

prepend_path("MODULEPATH", "/work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2018.4/modulefiles/stack")

local hpc_ver=os.getenv("hpc_ver") or "1.2.0"
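Putting the steps above together, a minimal shell sketch for activating this stack interactively or in a build script might look like the following. The path and the hpc/1.2.0 module come from this thread; the hpc-intel and hpc-impi meta-module names are assumptions based on typical hpc-stack layouts, not confirmed here:

```shell
# Point Lmod at the EPIC-built intel-2018.4 hpc-stack on Orion (path from this thread)
module use /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2018.4/modulefiles/stack
module load hpc/1.2.0

# Compiler/MPI meta-modules — assumed names, following the usual hpc-stack convention
module load hpc-intel/2018.4 hpc-impi/2018.4

# Verify what ended up loaded before building GSI against it
module list
```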

Update: alternative w3emc versions built: w3emc/2.9.1 and w3emc/2.9.2.

An identical stack is being built with the intel-2022.1.2 compiler, which hopefully can be used for debugging the issues seen with compilers newer than intel/2020. (Fingers crossed.)

Please let us know if you have any comments on the modules built or needed to be built.

natalie-perlin commented 1 year ago

HPC-stack with intel/2022.1.2 compiler on Orion:

module use /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2022.1.2_gsi/modulefiles/stack
module load hpc/1.2.0

natalie-perlin commented 1 year ago

@jkbk2004 - The crtm/2.4.0 fix files have been updated on all the NOAA RDHPC systems. The updated CRTM-2.4.0 code, which removes the excessive printout statements mentioned in GSI Issue-556, has so far only gone into the newer EPIC-maintained hpc-stacks based on netcdf-4.9.2. EPIC's set of stacks with netcdf-4.7.4 still uses the library version built with the excessive printouts. I'd like to update crtm/2.4.0 in these current stacks, as this was raised as an issue by the GSI team.

When is the best time to do the update to avoid disruption to any RT testing (weekend, early mornings, after PR-1745)? The WM may move to the updated netcdf-4.9.2-based stacks, as in https://github.com/ufs-community/ufs-weather-model/pull/1745, which are free of the excessive printout. But other repositories, such as GSI, global_workflow, UFS_UTILS, SRW, etc., may still be using the older, netcdf-4.7.4-based stack builds for some time.

natalie-perlin commented 1 year ago

@emilyhcliu @jkbk2004 - I plan to fully update the CRTM-2.4.0 code with the new code containing the bug fix in all EPIC-maintained hpc-stacks built with netcdf/4.7.4 over the weekend, when it is unlikely to interfere with the WM and SRW tests. So far, the update has been done in the newer stacks built with netcdf/4.9.2.

A stack with intel/2018.4 on Orion was built recently (May 2, 2023) and uses the recent CRTM-2.4.0 code: /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2018.4. The same is true for the limited-library stack built for the GSI team on Orion with the intel/2022.1.2 compiler, in /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2022.1.2_gsi

The crtm/2.4.0 stack update will require rebuilding the upp library as well, since upp depends on crtm. I will post a note here when it is done.

natalie-perlin commented 1 year ago

All the active and current EPIC stacks have been updated with the latest CRTM/2.4.0 and corresponding CRTM fix files. Please see the stack locations below:

Hera intel/2022.1.2: /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2, /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2_ncdf492

Hera gnu/9.2.0: /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2, /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/gnu-9.2_ncdf492

Orion intel/2022.1.2: /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2022.1.2, /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2022.1.2_ncdf492/

Orion intel/2018.4: /work/noaa/epic-ps/role-epic-ps/hpc-stack/libs/intel-2018.4

Jet intel/2022.1.2: /mnt/lfs4/HFIP/hfv3gfs/role.epic/hpc-stack/libs/intel-2022.1.2, /mnt/lfs4/HFIP/hfv3gfs/role.epic/hpc-stack/libs/intel-2022.1.2_ncdf492

Jet intel/2018: /mnt/lfs4/HFIP/hfv3gfs/role.epic/hpc-stack/libs/intel-18.0.5.274

Cheyenne intel/2022.1: /glade/work/epicufsrt/contrib/hpc-stack/intel2022.1, /glade/work/epicufsrt/contrib/hpc-stack/src-intel2022.1_ncdf492

Cheyenne gnu/10.1.0: /glade/work/epicufsrt/contrib/hpc-stack/gnu10.1.0, /glade/work/epicufsrt/contrib/hpc-stack/gnu10.1.0_ncdf49

Gaea intel-classic/2022.2.1: /lustre/f2/dev/role.epic/contrib/hpc-stack/intel-classic-2022.2.1
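Each of the installations listed above can be activated with the same two-step pattern shown earlier in this thread, by appending modulefiles/stack to the installation prefix. For example, on Hera (the hpc/1.2.0 version is an assumption carried over from the Orion instructions above and may differ per stack):

```shell
# Generic pattern: point Lmod at a stack's modulefiles tree, then load the hpc meta-module.
# Prefix taken from the Hera intel/2022.1.2 entry in the list above;
# hpc/1.2.0 is assumed from the earlier Orion instructions.
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
module load hpc/1.2.0

# See which library modules this stack provides
module avail
```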

BijuThomas-NOAA commented 1 year ago

@DavidHuber-NOAA and @emilyhcliu Wondering if anybody has tried to run the GSI with spack-stack (stack-intel/2021.7.1) on Hercules. It compiles successfully on Hercules, but I run into issues at run time.

DavidHuber-NOAA commented 1 year ago

@BijuThomas-NOAA No, I have not tried yet. The GSI does not yet run with Intel 2021+ (NOAA-EMC/GSI#447, NOAA-EMC/GSI#571), but I have it working on Orion and am actively working through an apparent communication problem on Hera. @natalie-perlin has gotten it to work on Gaea and is actively working on Cheyenne. After that, we could perhaps try out spack-stack and then Hercules.