Open WalterKolczynski-NOAA opened 2 years ago
Been a week and no acknowledgement. I know things are busy with the WCOSS2 hand-off, but I was hoping this could get fixed quickly. While there is a simple work-around (well, simple if you use an older version), it is also a major blocker to just running out-of-the-box on a major HPC resource.
@HelinWei-NOAA and @barlage, can someone be assigned to this task? This is now blocking development for global-workflow.
Building gldas on hera failed at this command
source ./machine-setup.sh
ms_function_name=setuptest_function83289: Command not found.
ms_function_name: Undefined variable.
This is used to detect sh vs. bash
ms_function_name="setuptest_function$$"
eval "$ms_function_name() { /bin/true ; }"
ms_ksh_test=$( eval 'text="text" ; if [[ $text =~ ^(t).* ]] ; then printf "%s" ${.sh.match[1]} ; fi' 2> /dev/null | cat )
ms_bash_test=$( eval 'if ( set | grep '$__ms_function_name' | grep -v name > /dev/null 2>&1 ) ; then echo t ; fi ' 2> /dev/null | cat )
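For comparison, here is a much simpler way to tell Bourne-family shells apart. It is only a sketch and not the logic used in machine-setup.sh: it assumes the code is sourced from a Bourne-family shell and relies on the BASH_VERSION and KSH_VERSION variables that bash and ksh set themselves. The function name detect_shell is hypothetical. Sourcing a script like this from csh or tcsh would fail before any detection runs, and messages of the form "Command not found." and "Undefined variable." are typical of those shells.

```shell
# Hypothetical helper: report which Bourne-family shell is running this code.
# Assumption: bash sets BASH_VERSION and ksh variants set KSH_VERSION;
# anything else is reported as plain "sh".
detect_shell() {
    if [ -n "${BASH_VERSION:-}" ]; then
        echo bash
    elif [ -n "${KSH_VERSION:-}" ]; then
        echo ksh
    else
        echo sh
    fi
}
```

A caller could branch on the result, for example `case "$(detect_shell)" in bash) ... ;; esac`.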
Any idea how to fix it?
@kgerheiser @Hang-Lei-NOAA Do you know what change made this command stop working on Hera?
I'm able to get everything to build on Hera by doing the following:
In module files
- Changing the esmf module loads to the proper version number (`esmf/8_1_1`)
- Correcting the environment variable command syntax (`export` is a bash command, use `setenv`; ex: `setenv FCOMP mpiifort`)
- Removing the FOPTS that include NETCDF_INC (these are already set in the build script)
In build scripts/machine-setup
- Reverting all `module reset` back to `module purge`
- Removing the hardcoded FCOMP/FC in the build scripts
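The `export` versus `setenv` point in the list above can be checked mechanically. The sketch below is one possible way to do that, not something taken from the GLDAS repository: the function name suggest_setenv is hypothetical, and it assumes the offending lines have the form `export NAME=VALUE`, one per line.

```shell
# Hypothetical helper: find bash-style "export NAME=VALUE" lines in a
# modulefile and print the equivalent "setenv NAME VALUE" form.
# Assumption: one command per line; other lines are ignored.
suggest_setenv() {
    # $1: path to a modulefile
    sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\)=\(.*\)$/setenv \1 \2/p' "$1"
}
```

Running it on a file that contains `export FCOMP=mpiifort` prints `setenv FCOMP mpiifort`; a file with no `export` lines produces no output.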
We made those changes, such as replacing module purge with module reset, because they were needed for the WCOSS2 transition, so we have a conflict here. Or should we use "module reset" for hera only? Can you please point me to the version of GLDAS with your modifications? I would like to test whether it can be built on wcoss and wcoss2. Thanks.
This is what Wei Wei from NCO told us to do:
I knew about the conflict with the module reset change made for WCOSS2, which is why I didn't just put it together in a PR. But it might be best if I just do that anyway and then you make additional changes needed to support all machines. I would suggest testing module reset on all machines and seeing which ones support it.
If you want to see my directory, it is in /scratch2/NCEPDEV/ensemble/save/Walter.Kolczynski/global-workflow/build_fix/sorc/gldas.fd. No changes have been committed yet, so just do a git diff.
Let me know if you want me to open that PR.
Thank you for finding the problem. Module reset is only okay on wcoss2. I have made the changes and tested them on wcoss, wcoss2, and hera. The new tag is here
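The rule stated in this comment (module reset on WCOSS2 only, module purge elsewhere) could be expressed as a small helper. This is a sketch under assumptions, not the change that went into the tag: the function name module_cleanup_cmd and the lowercase machine names are hypothetical.

```shell
# Hypothetical helper: choose the module cleanup command for a machine.
# Assumption from the thread: "module reset" is only acceptable on WCOSS2;
# every other machine falls back to "module purge".
module_cleanup_cmd() {
    # $1: machine name, e.g. wcoss2, hera, orion (lowercase is assumed)
    case "$1" in
        wcoss2) echo "module reset" ;;
        *)      echo "module purge" ;;
    esac
}
```

A setup script could then run `eval "$(module_cleanup_cmd hera)"` after it has worked out which machine it is on.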
With the help from Walter, I have fixed the issue. The new tag https://github.com/NOAA-EMC/GLDAS/releases/tag/gldas_gfsv16_release.v1.26.0 was created.
On Wed, Jan 26, 2022 at 4:10 PM arun chawla @.***> wrote:
@HelinWei-NOAA and @barlage, can someone be assigned to this task? This is now blocking development for global-workflow.
The module reset in machine-setup.sh wasn't updated, so build is still failing.
That's weird. The modification is in my fork, but machine-setup.sh wasn't updated when I merged the changes into the develop branch. The problem has been fixed and a new tag was created.
@HelinWei-NOAA On the NCEPLIBS side, we did not change anything other than removing some snapshot installations of ESMF because of disk space limitations. Switching to the release version of ESMF should solve it.
On Thu, Jan 27, 2022 at 6:37 AM HelinWei-NOAA @.***> wrote:
That's weird. The modification is in my fork, but machine-setup.sh wasn't updated when I merged the changes into the develop branch. The problem has been fixed and the new tag https://github.com/NOAA-EMC/GLDAS/releases/tag/gldas_gfsv16_release.v.1.27.0 was created.
Seems to work now.
Nope, still broken on Orion. Looks like the esmf version wasn't updated to the correct format there.
We did not change the library installations on orion
On Thu, Jan 27, 2022 at 3:06 PM Walter Kolczynski - NOAA < @.***> wrote:
Nope, still broken on Orion. Looks like the esmf version wasn't updated to the correct format there.
Fixed, and created a new tag.
Build now confirmed on Orion, Hera, and WCOSS-Dell
Builds were also successful on S4 and Jet.
Building on Hera (and possibly other machines) is now failing due to a couple of issues. global-workflow has been using v1.15.0, but that version has ceased to work because the ESMF module used in that version was removed (esmf/8_1_0_beta_snapshot_27). See https://github.com/NOAA-EMC/global-workflow/issues/561
I tried to update to the most recent release (also the tip of develop), but that also failed for two reasons: