NOAA-EMC / GLDAS

Build fails on Hera (and possibly other machines) #36

Open WalterKolczynski-NOAA opened 2 years ago

WalterKolczynski-NOAA commented 2 years ago

Building on Hera (and possibly other machines) is now failing due to a couple of issues. global-workflow has been using v1.15.0, but that version has ceased to work because the ESMF module used in that version was removed (esmf/8_1_0_beta_snapshot_27). See https://github.com/NOAA-EMC/global-workflow/issues/561

I tried to update to the most recent release (also the tip of develop), but that also failed for two reasons:

WalterKolczynski-NOAA commented 2 years ago

Been a week and no acknowledgement. I know things are busy with the WCOSS2 hand-off, but I was hoping this could get fixed quickly. While there is a simple work-around (well, simple if you use an older version), it is also a major blocker to just running out-of-the-box on a major HPC resource.

arunchawla-NOAA commented 2 years ago

@HelinWei-NOAA and @barlage can someone be assigned to this task? This is now blocking development for global-workflow.

HelinWei-NOAA commented 2 years ago

Building gldas on Hera failed at this command:

    source ./machine-setup.sh

with these errors:

    __ms_function_name=setuptest_function83289: Command not found.
    __ms_function_name: Undefined variable.

The failing lines in machine-setup.sh are used to detect sh vs. bash:

    # Create a test function for sh vs. bash detection. The name is
    # randomly generated to reduce the chances of name collision.
    __ms_function_name="setuptest_function$$"
    eval "$__ms_function_name() { /bin/true ; }"

    # Determine which shell we are using
    ms_ksh_test=$( eval 'text="text" ; if [[ $text =~ ^(t).* ]] ; then printf "%s" ${.sh.match[1]} ; fi' 2> /dev/null | cat )
    ms_bash_test=$( eval 'if ( set | grep '$__ms_function_name' | grep -v name > /dev/null 2>&1 ) ; then echo t ; fi ' 2> /dev/null | cat )

Any idea how to fix it?
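
The two messages above have the form that csh/tcsh prints when it is handed a Bourne-style `name=value` assignment, so one possible explanation (not confirmed anywhere in this thread) is that the script was sourced from a csh-family login shell. A minimal sketch of a check for that case follows; the helper name `is_csh_family` is hypothetical and is not part of machine-setup.sh.

```shell
#!/bin/sh
# Hypothetical helper (not from machine-setup.sh): succeed if the given
# shell path names a csh-family shell, which cannot parse sh-style
# assignments such as name="value".
is_csh_family() {
    case "${1##*/}" in
        csh|tcsh) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumption: $SHELL holds the login shell path; default to /bin/sh if unset.
login_shell="${SHELL:-/bin/sh}"
if is_csh_family "$login_shell"; then
    echo "Login shell is $login_shell; try sourcing the script from bash:"
    echo "  bash -c 'source ./machine-setup.sh'"
fi
```

If the check fires, starting a bash session first and sourcing the script there would be the thing to try before changing the script itself.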

HelinWei-NOAA commented 2 years ago

@kgerheiser @Hang-Lei-NOAA Do you know what changed to make this command stop working on Hera?

WalterKolczynski-NOAA commented 2 years ago

I'm able to get everything to build on Hera by doing the following:

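In module files

  • Changing the esmf module loads to the proper version number (esmf/8_1_1)
  • Correcting the environment variable command syntax (export is a bash command, use setenv; ex: setenv FCOMP mpiifort)
  • Removing the FOPTS that include NETCDF_INC (these are already set in the build script)

In build scripts/machine-setup

  • Reverting all module reset back to module purge
  • Removing the hardcoded FCOMP/FC in the build scripts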

HelinWei-NOAA commented 2 years ago

> I'm able to get everything to build on Hera by doing the following:
>
> In module files
>
>   • Changing the esmf module loads to the proper version number (esmf/8_1_1)
>   • Correcting the environment variable command syntax (export is a bash command, use setenv; ex: setenv FCOMP mpiifort)
>   • Removing the FOPTS that include NETCDF_INC (these are already set in the build script)
>
> In build scripts/machine-setup
>
>   • Reverting all module reset back to module purge
>   • Removing the hardcoded FCOMP/FC in the build scripts

We made those changes for the WCOSS2 transition, for example changing module purge to module reset, so we have some conflicts here. Or should we just use "module reset" for hera only? Can you please point me to the version of GLDAS with your modifications? I would like to test whether it can be built on wcoss and wcoss2. Thanks.
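
One way to locate the `export` lines mentioned in the module-file fixes is a recursive grep. This is only a sketch: the helper name `find_export_lines` and the `./modulefiles` path are hypothetical, and it assumes Tcl-style module files, as the `setenv FCOMP mpiifort` example suggests.

```shell
#!/bin/sh
# Hypothetical helper: list lines in module files that use the shell
# "export" command rather than the modulefile "setenv" command.
# The directory argument is an assumption; pass wherever the module
# files actually live in the checkout.
find_export_lines() {
    grep -rnE '^[[:space:]]*export[[:space:]]+[A-Za-z_]' "$1"
}

# Example (hypothetical path):
#   find_export_lines ./modulefiles
# A hit such as
#   export FCOMP=mpiifort
# would be rewritten in the module file as
#   setenv FCOMP mpiifort
```

The function returns a non-zero status when nothing matches, so it can also serve as a quick check in a build script.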

HelinWei-NOAA commented 2 years ago

This is what Wei Wei from NCO told us to do:

  1. Changed "module purge" to "module reset", and removed "module load envvar/1.0" in:
       sorc/gfs_wafs.fd/sorc/build_wafs.sh
       sorc/gsi.fd/modulefiles/modulefile.ProdGSI.wcoss2.lua
       sorc/gsi.fd/ush/build_all_cmake.sh

WalterKolczynski-NOAA commented 2 years ago

I knew about the conflict with the module reset change made for WCOSS2, which is why I didn't just put it together in a PR. But it might be best if I just do that anyway and then you make additional changes needed to support all machines. I would suggest testing module reset on all machines and seeing which ones support it.

If you want to see my directory, it is in /scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/build_fix/sorc/gldas.fd. No changes have been committed yet, so just do a git diff.

Let me know if you want me to open that PR.

HelinWei-NOAA commented 2 years ago

> I knew about the conflict with the module reset change made for WCOSS2, which is why I didn't just put it together in a PR. But it might be best if I just do that anyway and then you make additional changes needed to support all machines. I would suggest testing module reset on all machines and seeing which ones support it.
>
> If you want to see my directory, it is in /scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/build_fix/sorc/gldas.fd. No changes have been committed yet, so just do a git diff.
>
> Let me know if you want me to open that PR.

Thank you for finding the problem. Module reset is only okay for wcoss2. I have made the changes and tested them on wcoss, wcoss2, and hera. The new tag is here
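
A per-machine choice between the two subcommands could look roughly like the sketch below. It is not the code in the tag: the helper name `cleanup_subcommand` is hypothetical, and the variable `target` in the usage comment is assumed to hold the machine name.

```shell
#!/bin/sh
# Hypothetical helper: print the module cleanup subcommand for a machine.
# Per the comment above, "module reset" is only okay on wcoss2; every
# other machine name falls back to "module purge".
cleanup_subcommand() {
    case "$1" in
        wcoss2) echo "reset" ;;
        *)      echo "purge" ;;
    esac
}

# Usage inside a setup script, assuming "target" holds the machine name:
#   module "$(cleanup_subcommand "$target")"
```

Keeping the decision in one function means a machine that later gains support for reset needs only one new case label.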

HelinWei-NOAA commented 2 years ago

With the help from Walter, I have fixed the issue. The new tag https://github.com/NOAA-EMC/GLDAS/releases/tag/gldas_gfsv16_release.v1.26.0 was created.

On Wed, Jan 26, 2022 at 4:10 PM arun chawla wrote:

> @HelinWei-NOAA and @barlage can someone be assigned for this task ? This is now blocking development for global workflow

WalterKolczynski-NOAA commented 2 years ago

The module reset in machine-setup.sh wasn't updated, so the build is still failing.

HelinWei-NOAA commented 2 years ago

That's weird. The modification is in my fork, but machine-setup.sh wasn't updated when I merged the changes into the develop branch. The problem is now fixed and a new tag was created.

Hang-Lei-NOAA commented 2 years ago

@HelinWei-NOAA On the nceplibs side, we did not change anything except removing some snapshot installations of ESMF because of disk space limitations. Switching to the release version of esmf should solve it.

On Thu, Jan 27, 2022 at 6:37 AM HelinWei-NOAA wrote:

> That's weird. The modification is in my fork. But machine-setup.sh hasn't been updated when I merged them to the develop branch. The problem was fixed and the new tag https://github.com/NOAA-EMC/GLDAS/releases/tag/gldas_gfsv16_release.v.1.27.0 was created.

WalterKolczynski-NOAA commented 2 years ago

Seems to work now.

WalterKolczynski-NOAA commented 2 years ago

Nope, still broken on Orion. Looks like the esmf version wasn't updated to the correct format there.

Hang-Lei-NOAA commented 2 years ago

We did not change the library installations on Orion.

On Thu, Jan 27, 2022 at 3:06 PM Walter Kolczynski - NOAA wrote:

> Nope, still broken on Orion. Looks like the esmf version wasn't updated to the correct format there.

HelinWei-NOAA commented 2 years ago

> Nope, still broken on Orion. Looks like the esmf version wasn't updated to the correct format there.

Fixed, and created a new tag.

WalterKolczynski-NOAA commented 2 years ago

Build now confirmed on Orion, Hera, and WCOSS-Dell

DavidHuber-NOAA commented 2 years ago

Builds were also successful on S4 and Jet.