NOAA-EMC / hpc-stack

Create a software stack for HPC's
GNU Lesser General Public License v2.1

[INSTALL] Library modules to support GFSv16.2.0 on Hera/Orion #379

Open KateFriedman-NOAA opened 2 years ago

KateFriedman-NOAA commented 2 years ago

In order to support the new GFSv16.2.0 (WCOSS2 port version) on Hera and Orion we need the same library module versions available. Below I list the versions that are currently being used in the new operational GFSv16.2.0 and which ones are missing on Hera/Orion.

Which software (and version) in the stack would you like installed?

Hera & Orion:

Which machines would you like to have the software installed?

Hera, Orion

Additional context

Here are the build.ver module versions for GFSv16.2.0: https://github.com/NOAA-EMC/global-workflow/blob/feature/ops-wcoss2/versions/build.ver

Refs: https://github.com/NOAA-EMC/global-workflow/issues/639

arunchawla-NOAA commented 2 years ago

No WAFS in developer runs on non-WCOSS2 platforms, especially since these have operations-based product requirements.

kgerheiser commented 2 years ago

Thanks everyone for their input. I will finish up #392 and related PRs by using a different version of NCO and adding the module variables. Then, we can get it rolled out.

YaliMao-NOAA commented 2 years ago

@arunchawla-NOAA @KateFriedman-NOAA Thank you for the confirmation. In case I need to run WAFS on a canned data set on Hera or Orion, can I load non-hpc-stack modules, such as bufr_dump and util_shared, along with hpc-stack modules to make it work?

kgerheiser commented 2 years ago

@YaliMao-NOAA yes, you can do that.

YaliMao-NOAA commented 2 years ago

@kgerheiser Thank you for the confirmation!

arunchawla-NOAA commented 2 years ago

@kgerheiser and @Hang-Lei-NOAA we use esmf 8.1.0 and esmf 8.0.1 for operations. In hpc-stack we removed these libraries in favour of the newer installations. However, we need these to support operations. @junwang-noaa and @DusanJovic-NOAA are we missing any other modules needed for the model?

kgerheiser commented 2 years ago

@KateFriedman-NOAA @WenMeng-NOAA I have a test install on Orion with all the requested changes. Could you take a look at it before I install it in its permanent home?

It includes the _LIB module variables, NCO 5.0.6, and the versions you asked for.

module use /work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/modulefiles/stack
module load hpc-intel
module load hpc-impi
  atlas/ecmwf-0.24.1    esmf/8_0_1        (D)    fms/2021.03        madis/4.3                ncio/1.0.0      nemsiogfs/2.5.3        pio/2.5.3     wrf_io/1.1.1
   eckit/ecmwf-1.16.0    fckit/ecmwf-0.9.2        hdf5/1.10.6 (D)    mapl/2.7.3-esmf-8_0_1    nemsio/2.5.2    netcdf/4.7.4    (D)    upp/10.0.8

------------------------------------------------------------- /work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/modulefiles/compiler/intel/2018.4 --------------------------------------------------------------
   bacio/2.4.1        g2/3.4.5        gftl-shared/v1.3.0        ip/3.3.3             landsfcutil/2.4.1        netcdf/4.7.4       sp/2.3.3              w3nco/2.4.1
   bufr/11.4.0        g2c/1.6.4       grib_util/1.2.3           ip2/1.1.2            libpng/1.6.37     (L)    prod_util/1.2.2    szip/2.1.1     (D)    wgrib2/2.0.7   (D)
   cdo/1.9.8   (D)    g2tmpl/1.9.1    hdf5/1.10.6               jasper/2.0.25 (D)    nccmp/1.8.9.0     (D)    sfcio/1.4.1        udunits/2.2.28 (D)    yafyaml/v0.5.1
   crtm/2.3.0         gfsio/1.4.1     hpc-impi/2018.4    (L)    jpeg/9.1.0           nco/5.0.6         (D)    sigio/2.3.2        w3emc/2.9.2           zlib/1.2.11    (D)

KateFriedman-NOAA commented 2 years ago

@kgerheiser Thanks for this test install! @WenMeng-NOAA @MichaelLueken-NOAA @junwang-noaa @DusanJovic-NOAA @YaliMao-NOAA @GeorgeGayno-NOAA @HelinWei-NOAA @malloryprow Please look at Kyle's test install on Orion and let him know if anything is missing/incorrect for your respective component codes. Thanks!

@kgerheiser I loaded the following from your test location with the indicated module use path:

module use /work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2018.4
module load hpc-impi/2018.4

...and then tried building the various codes that global-workflow owns. I only ran into one issue:

The lib64 folder for the bufr/11.4.0 module is missing and thus my build script for gfs_bufr exec can't find $BUFR_LIB4 (../bufr/11.4.0/lib64/libbufr_4.a).

These paths from the module:

setenv("BUFR_LIB4","/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/bufr/11.4.0/lib64/libbufr_4.a")
setenv("BUFR_LIB8","/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/bufr/11.4.0/lib64/libbufr_8.a")
setenv("BUFR_LIBd","/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/bufr/11.4.0/lib64/libbufr_d.a")

...aren't there:

$ ll /work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/bufr/11.4.0/lib64/
ls: cannot access /work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/bufr/11.4.0/lib64/: No such file or directory

kgerheiser commented 2 years ago

That's a good catch. We don't normally test with Bufr 11.4.0 and it uses lib instead of lib64.
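
A mismatch like this (a modulefile pointing at lib64 while the package installed into lib) can be checked before the path is written into a modulefile. The following is only a minimal sketch in POSIX shell; `find_static_lib` is a hypothetical helper, not part of hpc-stack, and the paths in the comments are taken from the thread above.

```shell
# Print the path of a static library under an install prefix, checking
# lib64 first and then lib. Returns non-zero if neither location has it.
# (find_static_lib is a hypothetical helper, not an hpc-stack command.)
find_static_lib() {
    prefix=$1    # e.g. .../install/intel-2018.4/bufr/11.4.0
    libfile=$2   # e.g. libbufr_4.a
    for dir in lib64 lib; do
        if [ -f "$prefix/$dir/$libfile" ]; then
            printf '%s\n' "$prefix/$dir/$libfile"
            return 0
        fi
    done
    echo "not found: $libfile under $prefix/lib64 or $prefix/lib" >&2
    return 1
}
```

Called with the bufr/11.4.0 install prefix and `libbufr_4.a`, this would print the lib path rather than the missing lib64 one, or fail loudly if the library is in neither place.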

junwang-noaa commented 2 years ago

@kgerheiser We are requesting upp/8.1.0, but I saw upp/10.0.8 in the list. Can you install upp/8.1.0?

kgerheiser commented 2 years ago

Are there build instructions for how to build UPP with make? The hpc-stack build is based on using CMake, so it will take some re-configuring to support the Makefile build.

WenMeng-NOAA commented 2 years ago

I was able to build UPP with Kyle's test version of hpc-stack on Orion. @kgerheiser @junwang-noaa I have been waiting for the official hpc-stack 1.2.0 to be installed on Hera and Orion; I will then provide a new UPP tag for installing upp/8.1.1 on Hera and Orion. The upp/8.1.* versions use a GNU makefile.

WenMeng-NOAA commented 2 years ago

The instructions for building upp/8.1.0 on WCOSS2 can be found at https://docs.google.com/document/d/19_ymg7UOfr0MNQPAPwP-JBeSnESOTGzSnkjCnvPJ9bg/edit (the procedure should be similar for Hera and Orion).

GeorgeGayno-NOAA commented 2 years ago

@kgerheiser installation on Orion works with UFS_UTILS.

kgerheiser commented 2 years ago

The bufr issue has been fixed

kgerheiser commented 2 years ago

@WenMeng-NOAA I don't have access to that doc.

WenMeng-NOAA commented 2 years ago

@kgerheiser I have added your access.

KateFriedman-NOAA commented 2 years ago

The bufr issue has been fixed

Thanks @kgerheiser! I see this now in that module:

setenv("BUFR_LIB4","/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/bufr/11.4.0/lib/libbufr_4.a")

...and my associated gfs_bufr/tocsbufr codes now build.

MichaelLueken commented 2 years ago

@KateFriedman-NOAA @kgerheiser

@RussTreadon-NOAA was able to test the hpc/1.2.0 test build on Orion using gfsda.v16.2.0.1 and all 28 DA components successfully built without any problems.

HelinWei-NOAA commented 2 years ago

GLDAS was built successfully with the hpc/1.2.0 test build on Orion.

kgerheiser commented 2 years ago

I have been able to successfully build UPP 8.1.0.

kgerheiser commented 2 years ago

I'm ready to install this.

Should this stack exist in its own space? Like a module use .../hpc-stack/gfsv16/modulefiles/stack, or should it be added to the existing library locations?

WenMeng-NOAA commented 2 years ago

A separate location (module use .../hpc-stack/gfsv16/modulefiles/stack) would be better for UPP, since it keeps packages that are needed for GFS V16 only, e.g. wgrib2/2.0.7, in one place.

KateFriedman-NOAA commented 2 years ago

I'm ready to install this.

Awesome, thanks @kgerheiser !

Should this stack exist in its own space? Like a module use .../hpc-stack/gfsv16/modulefiles/stack, or should it be added to the existing library locations?

My vote would be for the latter (existing library locations). While this was set up from a need to support GFSv16 it doesn't have to be limited to that model. Will let others weigh in, I'm ok with the consensus decision.

DavidHuber-NOAA commented 2 years ago

If this is installed in an existing location, will there be an issue with selecting the prerequisite libraries for each install? For example, the UPP install loads the g2tmpl module. This install is asking for g2tmpl/1.9.1 but both the 1.9.1 and 1.10.0 g2tmpl modules are available, so 1.10.0 would be loaded for the install, correct?

kgerheiser commented 2 years ago

Good point @DavidHuber-NOAA. Yes, that would be an issue if you don't carefully order your module loads.

WenMeng-NOAA commented 2 years ago

If this is installed in an existing location, will there be an issue with selecting the prerequisite libraries for each install? For example, the UPP install loads the g2tmpl module. This install is asking for g2tmpl/1.9.1 but both the 1.9.1 and 1.10.0 g2tmpl modules are available, so 1.10.0 would be loaded for the install, correct?

That's my concern. g2tmpl/1.10.0 is intended for the GFSv17 implementation, not GFSv16.

DavidHuber-NOAA commented 2 years ago

@kgerheiser While it is true that you need to be careful about load order when using the hpc-stack modules, what I meant was that the hpc-stack install scripts also load the default modules.

Taking the upp/g2tmpl example, see the dependency loads for the UPP in build_nceplibs.sh. Since g2tmpl is loaded without a version number specified, the default (in this case, 1.10.0) would be loaded, so the UPP would be built with g2tmpl/1.10.0 instead of 1.9.1 and not on the expected libraries. There are probably other examples; this is just the first I found.

kgerheiser commented 2 years ago

Yes, it would build using the default modules. However, the default is updated whenever a new package is installed, so a newly installed g2tmpl (1.9.1, for example) would become the default and UPP would then build with it.
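
One way to avoid depending on whichever version happens to be the default is to always load a fully versioned module name. The sketch below assumes a POSIX shell and version variables named `<name>_ver` (for example `g2tmpl_ver`, an assumed name here); `module_spec` is a hypothetical helper, not something hpc-stack or Lmod provides.

```shell
# Build a fully versioned module spec such as "g2tmpl/1.9.1".
# The version comes from the environment variable <name>_ver when it is
# set and non-empty, otherwise from the supplied fallback.
# (module_spec is a hypothetical helper for illustration only.)
module_spec() {
    name=$1
    fallback=$2
    # Hyphens are not valid in shell variable names, so hpc-intel
    # is looked up as hpc_intel_ver.
    var=$(printf '%s' "$name" | tr '-' '_')_ver
    eval "ver=\${$var:-}"
    printf '%s/%s\n' "$name" "${ver:-$fallback}"
}
```

A build script could then run `module load "$(module_spec g2tmpl 1.9.1)"`, so the version used no longer changes when a newer g2tmpl is installed alongside it.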

DavidHuber-NOAA commented 2 years ago

Ah, that makes sense. Thanks for clarifying.

junwang-noaa commented 2 years ago

@kgerheiser I am building GFS.v16 using the module files you listed on Orion, and I got this error message:

ifort: error #10236: File not found: '/opt/modules/intel-2018.4/crtm/2.3.0/lib/libcrtm.a'

The module file is: /work/noaa/nems/junwang/gfs_v16/20220223/ufs-weather-model/modulefiles/orion.intel/fv3.lua with crtm loaded as:

crtm_ver=os.getenv("crtm_ver") or "2.3.0"
load(pathJoin("crtm", crtm_ver))

kgerheiser commented 2 years ago

@junwang-noaa looks like you're picking up the wrong crtm module at /opt/modules.

Manually running

module use /work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/modulefiles/stack
module load hpc
module load hpc-intel
module show crtm/2.3.0
----------------------------------------------------------------------------------------------------------------------------------------------------
   /work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/modulefiles/compiler/intel/2018.4/crtm/2.3.0.lua:
----------------------------------------------------------------------------------------------------------------------------------------------------
help([[]])
conflict("crtm")
setenv("crtm_ROOT","/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/crtm/2.3.0")
setenv("crtm_VERSION","2.3.0")
setenv("CRTM_INC","/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/crtm/2.3.0/include")
setenv("CRTM_LIB","/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/crtm/2.3.0/lib/libcrtm.a")
setenv("CRTM_FIX","/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/crtm/2.3.0/fix")
whatis("Name: crtm")
whatis("Version: 2.3.0")
whatis("Category: library")
whatis("Description: crtm library")

kgerheiser commented 2 years ago

I think installing it in its own location might be the move, just to avoid mixing up libraries and versions, and installing otherwise outdated packages (i.e. wgrib2 2.0.7, g2tmpl, etc). Will do that on Orion today now that the related PR has been merged.
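
To guard against a library being picked up from a different installation (as with the /opt/modules crtm earlier in this thread), a build script could check that a module variable points under the expected stack root. This is a minimal sketch in POSIX shell; `path_under_root` is a hypothetical helper, not an Lmod or hpc-stack command.

```shell
# Succeed only if the given path lies under the expected stack root.
# (path_under_root is a hypothetical helper for illustration only.)
path_under_root() {
    path=$1
    root=${2%/}   # drop one trailing slash, if any
    case "$path" in
        "$root"/*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

For example, `path_under_root "$CRTM_LIB" /work/noaa/stmp/gkyle/hpc-stack-gfsv16/install || echo "unexpected CRTM_LIB: $CRTM_LIB" >&2` would flag the /opt/modules path reported above.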

KateFriedman-NOAA commented 2 years ago

I think installing it in its own location might be the move, just to avoid mixing up libraries and versions, and installing otherwise outdated packages (i.e. wgrib2 2.0.7, g2tmpl, etc).

@kgerheiser Based on others replies/concerns I agree. Will you be doing the same separate install on Hera? Thanks!

kgerheiser commented 2 years ago

Yes, but I believe it's down for maintenance today.

junwang-noaa commented 2 years ago

@kgerheiser I still got some error on orion with the new libpng/1.6.37 lib:

/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/libpng/1.6.37/lib64/libpng.a(pngrutil.c.o): In function `png_handle_iCCP':
pngrutil.c:(.text+0xf46): undefined reference to `inflateValidate'
/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/libpng/1.6.37/lib64/libpng.a(pngrutil.c.o): In function `png_decompress_chunk':
pngrutil.c:(.text+0x36b9): undefined reference to `inflateValidate'
/work/noaa/stmp/gkyle/hpc-stack-gfsv16/install/intel-2018.4/libpng/1.6.37/lib64/libpng.a(pngrutil.c.o): In function `png_read_start_row':
pngrutil.c:(.text+0x6a05): undefined reference to `inflateValidate'

kgerheiser commented 2 years ago

So, that's an error on linking to zlib. Can you point me towards what you're building and share the full error message, including the link line (presumably a couple of lines above your error)?

junwang-noaa commented 2 years ago

Thanks for the suggestion. After setting ZLIB_LIB, instead of Z_LIB, the issue is resolved. Thanks
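
For build scripts that have to work with module sets exporting either variable name, a fallback can be sketched as below. This assumes, as the comment above suggests, that the zlib path arrives in either ZLIB_LIB or Z_LIB; `first_nonempty` is a hypothetical helper written in POSIX shell.

```shell
# Print the first non-empty argument; return non-zero if all are empty.
# (first_nonempty is a hypothetical helper for illustration only.)
first_nonempty() {
    for value in "$@"; do
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}
```

A script could then set `zlib_path=$(first_nonempty "${ZLIB_LIB:-}" "${Z_LIB:-}")` and stop with a clear message if neither variable is defined, instead of failing later at link time.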

Hang-Lei-NOAA commented 2 years ago

The installation based on the new compiler on Hera has been added at

/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/stack

Please give it a try.

kgerheiser commented 2 years ago

The build is ongoing on Orion. Will be done shortly.

junwang-noaa commented 2 years ago

What is the intel compiler version for the build on hera and orion?

kgerheiser commented 2 years ago

It's 2018.4 on Orion and 18.0.5.274 on Hera

KateFriedman-NOAA commented 2 years ago

The installation based on hera new compiler has been added on /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/stack Please give it a try.

Thanks @Hang-Lei-NOAA ! I'll try it with the global-workflow dev_v16 branch and report back.

WenMeng-NOAA commented 2 years ago

I can't find intel 2018 on Hera.

module use /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/stack
module load hpc/1.2.0
module av
------------ /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/core ------------
   cmakemodules/v1.2.0    esma_cmake/v3.4.3         hpc-miniconda3/4.6.14
   ecbuild/ecmwf-3.6.1    hpc-intel/2022.1.2 (L)

kgerheiser commented 2 years ago

@WenMeng-NOAA try module --ignore_cache avail; it can take time for the modules to appear in Lmod.

KateFriedman-NOAA commented 2 years ago

try module --ignore_cache avail it can take time for the modules to appear in LMod.

I tried that and still don't see the 2018 copies:

-bash-4.2$ module list

Currently Loaded Modules:
  1) hpc/1.2.0 

-bash-4.2$ module --ignore_cache avail

-------------- /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/core -----------
   cmakemodules/v1.2.0    ecbuild/ecmwf-3.6.1    esma_cmake/v3.4.3    hpc-intel/2022.1.2    hpc-miniconda3/4.6.14

-------------- /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/stack ----------
   hpc/1.2.0 (L)

If I do a listing under there I only see the 2022.1.2 file:

-bash-4.2$ ll /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/modulefiles/core/hpc-intel/
total 4
-rw-r--r-- 1 Hang.Lei nwprod 760 Mar  2 16:10 2022.1.2.lua

@Hang-Lei-NOAA is the 2018 version still being installed? Thanks!
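
The same check can be scripted by listing the Lua modulefiles in a module's directory directly, which sidesteps the Lmod cache entirely. A minimal sketch in POSIX shell; `list_module_versions` is a hypothetical helper and assumes one `<version>.lua` file per installed version, as in the listing above.

```shell
# Print the versions available for one module by listing the
# <version>.lua files in its modulefile directory.
# (list_module_versions is a hypothetical helper for illustration only.)
list_module_versions() {
    dir=$1
    for f in "$dir"/*.lua; do
        [ -e "$f" ] || continue   # glob matched nothing
        basename "$f" .lua
    done
}
```

Run against the hpc-intel directory shown above, this would print only 2022.1.2, confirming that no 2018 modulefile exists there yet.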

junwang-noaa commented 2 years ago

@kgerheiser @Hang-Lei-NOAA @KateFriedman-NOAA Just to let you know, the Hera admin suggested that we move to the newer Intel compiler, as Intel has stopped supporting Intel 2019. Here is the message:

"According to this document from Intel, even the Intel 2019 has been off support as of end of 2021:

https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-parallel-studio-xe-supported-and-unsupported-product-versions.html"

kgerheiser commented 2 years ago

Sorry, a bit of a miscommunication there. I installed with 2018.4 on Orion and Hang did 2022.1.2. If everyone is ok with 2022.1 I will re-build.

KateFriedman-NOAA commented 2 years ago

If everyone is ok with 2022.1 I will re-build.

Keep the 2018 for now @kgerheiser ...need to make sure 2022 will work for the v16 components...

Just let you know that hera admin was suggesting us moving to the newer Intel compiler as Intel stopped supporting Intel 2019.

@junwang-noaa Ok...since I'm not a code developer my next question is...will our current GFSv16.2.0 components work with the newer Intel version? Are any code changes needed to move to this newer version? This is a question for everyone @MichaelLueken-NOAA @WenMeng-NOAA @HelinWei-NOAA @GeorgeGayno-NOAA @YaliMao-NOAA . Thanks!

GeorgeGayno-NOAA commented 2 years ago

@KateFriedman-NOAA What version of netcdf do you want me to use? Currently, (on Hera) I am pointing to a parallel version: https://github.com/ufs-community/UFS_UTILS/blob/3c5a3508cb3d7cda6f728bb20c0bf2d533268381/modulefiles/fv3gfs/global_cycle.hera.lua#L27

Should I use the hpc-stack version?