NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Stop using stand-alone UPP #2437

Closed WalterKolczynski-NOAA closed 2 months ago

WalterKolczynski-NOAA commented 5 months ago

What new functionality do you need?

As part of the Rocky 8 upgrade for Hera (PR #2421), we had to move to a stand-alone UPP version because the one in UFS has not yet been updated. Once the UPP version in UFS is updated to include the Rocky 8 updates, we should move back to using that version instead of checking out a separate version.

What are the requirements for the new functionality?

No separate UPP submodule

Acceptance Criteria

Dependency: ufs-community/ufs-weather-model#2213

Suggest a solution (optional)

No response

JessicaMeixner-NOAA commented 5 months ago

FYI @WenMeng-NOAA

WenMeng-NOAA commented 5 months ago

@JessicaMeixner-NOAA @WalterKolczynski-NOAA I am preparing a UFS PR to update the UPP submodule.

RussTreadon-NOAA commented 5 months ago

@WalterKolczynski-NOAA and @WenMeng-NOAA: I assume from this issue that $HOMEgfs/sorc/upp.fd is the stand-alone UPP. Executing sorc/build_all.sh in a working copy of develop at d6be3b5c on Hera reports a UPP build failure:

Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None
Building gsi_enkf, ufs, gfs_utils, gdas, ww3prepost, ufs_utils, gsi_utils, gsi_monitor, upp
Starting build_gsi_enkf.sh
Starting build_ufs.sh
Starting build_gfs_utils.sh
Starting build_gdas.sh
Starting build_ww3prepost.sh
Starting build_ufs_utils.sh
Starting build_gsi_utils.sh
Starting build_gsi_monitor.sh
Starting build_upp.sh
build_gsi_enkf.sh completed successfully!
build_gfs_utils.sh completed successfully!
build_ufs_utils.sh completed successfully!
build_gsi_utils.sh completed successfully!
build_gsi_monitor.sh completed successfully!
build_ww3prepost.sh completed successfully!
build_upp.sh failed with status 2!
build_ufs.sh completed successfully!
build_gdas.sh completed successfully!
BUILD ERROR: One or more components failed to build
  Check the associated build log(s) for details.

A check of sorc/logs/build_upp.log shows

[ 88%] Building Fortran object sorc/ncep_post.fd/CMakeFiles/upp.dir/OTLIFT.f.o
[ 89%] Building Fortran object sorc/ncep_post.fd/CMakeFiles/upp.dir/SURFCE.f.o
[ 90%] Linking Fortran static library libupp.a
/usr/bin/ar: Relink `/apps/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin/libimf.so' with `/lib64/libm.so.6' for IFUNC symbol `sinf'
Error running link command: Segmentation fault
make[2]: *** [sorc/ncep_post.fd/CMakeFiles/upp.dir/build.make:2182: sorc/ncep_post.fd/libupp.a] Error 1
make[2]: *** Deleting file 'sorc/ncep_post.fd/libupp.a'
make[1]: *** [CMakeFiles/Makefile2:133: sorc/ncep_post.fd/CMakeFiles/upp.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
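When a component log is long, it can help to jump straight to the first line that looks like a failure rather than the trailing make errors. The following is a minimal sketch only: the function name `first_build_error` and the search pattern are assumptions for illustration, not part of global-workflow, and the pattern may need adjusting for other failure modes.

```shell
# Hypothetical helper (not part of global-workflow): print the first line of a
# build log that looks like a failure, prefixed with its line number.
# The pattern is an assumption: it matches "Segmentation fault" anywhere, or a
# make-style "Error N" at the end of a line.
first_build_error() {
  logfile="$1"
  grep -n -E 'Segmentation fault|Error [0-9]+$' "$logfile" | head -n 1
}

# Usage, with the log path mentioned in this thread:
#   first_build_error "${HOMEgfs}/sorc/logs/build_upp.log"
```

For the log excerpt above, this would point at the "Error running link command: Segmentation fault" line, which comes before the make errors that follow from it.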

CI testing using C96C48_hybatmDA and C96C48_ufs_hybatmDA encounters failed jobs for gdasatmanlupp and gfsatmanlupp because $HOMEgfs/exec/upp.x does not exist. This is a soft link pointing at $HOMEgfs/sorc/upp.fd/exec/upp.x.

Is this failure expected?
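A soft link whose target was never built still shows up in a directory listing, so it can be worth checking whether the link resolves before submitting the UPP jobs. This is a sketch under assumptions: `check_exec_link` is a hypothetical helper name, and `HOMEgfs` is assumed to be set to the root of the clone.

```shell
# Hypothetical helper (not part of global-workflow): report whether a path is
# usable, a dangling symlink, or absent. Returns 0 only when the path resolves.
check_exec_link() {
  link="$1"
  if [ -e "$link" ]; then
    # -e follows symlinks, so this is true only if the target exists
    echo "ok"
    return 0
  elif [ -L "$link" ]; then
    # the link itself exists, but its target does not
    echo "dangling"
    return 1
  else
    echo "missing"
    return 1
  fi
}

# Usage, with the path named in this thread:
#   check_exec_link "${HOMEgfs}/exec/upp.x"
```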

WalterKolczynski-NOAA commented 5 months ago

The failure is not expected from a fresh clone. If you tried to pull develop into an existing clone, you should have gotten a warning that it couldn't overwrite the upp.fd symlink. If that is the case, delete the symlink and then pull again (recursively, or run submodule update afterwards).
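The recovery described above could be sketched as follows. The helper name `remove_stale_upp_link` is hypothetical, and `HOMEgfs` is assumed to point at the root of the existing clone. The helper only removes sorc/upp.fd when it is a symlink, so an already checked-out submodule directory is left alone.

```shell
# Hypothetical helper (not part of global-workflow): remove sorc/upp.fd only
# if it is a leftover symlink, never if it is a real directory.
remove_stale_upp_link() {
  target="$1/sorc/upp.fd"
  if [ -L "$target" ]; then
    rm -f "$target"
    echo "removed stale symlink: $target"
  else
    echo "no symlink at $target; nothing removed"
  fi
}

# Usage from an existing clone (commented out because it needs network access):
#   remove_stale_upp_link "${HOMEgfs}"
#   git -C "${HOMEgfs}" pull
#   git -C "${HOMEgfs}" submodule sync --recursive
#   git -C "${HOMEgfs}" submodule update --init --recursive
```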

RussTreadon-NOAA commented 5 months ago

Manually remove sorc/upp.fd followed by git submodule sync and git submodule update. Then manually execute ./build_upp.sh in $HOMEgfs/sorc. This worked. upp.x created. Rerun of gdasatmanlupp and gfsatmanlupp was successful.

WenMeng-NOAA commented 5 months ago

@WalterKolczynski-NOAA @aerorahul The ufs-weather-model PR #2213 was submitted to update the UPP submodule.

WalterKolczynski-NOAA commented 5 months ago

@WenMeng-NOAA thanks for keeping us updated. The first time we update UFS after that is merged, we can remove the temporary submodule.

guoqing-noaa commented 5 months ago

> Manually remove sorc/upp.fd followed by git submodule sync and git submodule update. Then manually execute ./build_upp.sh in $HOMEgfs/sorc. This worked. upp.x created. Rerun of gdasatmanlupp and gfsatmanlupp was successful.

I got the same error on Hera (Rocky8) and the manual method did not work for me.

[ 89%] Building Fortran object sorc/ncep_post.fd/CMakeFiles/upp.dir/SURFCE.f.o
[ 90%] Linking Fortran static library libupp.a
/usr/bin/ar: Relink `/apps/oneapi/compiler/2022.0.2/linux/compiler/lib/intel64_lin/libimf.so' with `/lib64/libm.so.6' for IFUNC symbol `sinf'
Error running link command: Segmentation fault
make[2]: *** [sorc/ncep_post.fd/CMakeFiles/upp.dir/build.make:2182: sorc/ncep_post.fd/libupp.a] Error 1
make[2]: *** Deleting file 'sorc/ncep_post.fd/libupp.a'
make[1]: *** [CMakeFiles/Makefile2:133: sorc/ncep_post.fd/CMakeFiles/upp.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

I started from a clean recursive clone. I tried twice but got the same error.

Here are the steps I used to reproduce the error:

git clone --recursive https://github.com/NOAA-EMC/global-workflow
cd global-workflow/sorc
./build_upp.sh

Could this be related to any of my environment settings?

WalterKolczynski-NOAA commented 5 months ago

@guoqing-noaa we found there is actually an issue with the UPP hash. We added the fix into #2442, which should be merged soon.
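To see whether a clone has the UPP commit that the superproject records, `git submodule status` can be inspected. Its output prefixes each line with `-` when the submodule is not initialized, `+` when the checked-out commit differs from the recorded one, and `U` on merge conflicts. The sketch below only interprets that prefix; `describe_submodule_state` is a hypothetical name, and it says nothing about whether the recorded hash itself is the right one.

```shell
# Hypothetical helper (not part of global-workflow): interpret the leading
# character of one line of `git submodule status` output.
describe_submodule_state() {
  case "$1" in
    -*) echo "not initialized" ;;
    +*) echo "checked-out commit differs from recorded commit" ;;
    U*) echo "merge conflict" ;;
    *)  echo "matches recorded commit" ;;
  esac
}

# Usage, run from the clone root (sorc/upp.fd is the path named in this thread):
#   describe_submodule_state "$(git submodule status sorc/upp.fd)"
```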

WenMeng-NOAA commented 4 months ago

@WalterKolczynski-NOAA My UFS PR #2213 was merged today. You may update global-workflow accordingly to resolve this issue.