NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Update ufs-weather-model #2646

Closed JessicaMeixner-NOAA closed 4 weeks ago

JessicaMeixner-NOAA commented 1 month ago

Description

Updates the UFS model to today's commit. A commit is coming soon that will update RDHPCS to 1.6.1, but I thought I'd open this now. This should resolve the issue, allowing C768 runs on Hera and allowing CICE to run on WCOSS2 (thanks to library updates that allow linking).

From what I can tell, all the updates needed were made by @HenryWinterbottom-NOAA, and they were updates for CICE.

Fixes #2490

Type of change

Change characteristics

How has this been tested?

In Progress...

Checklist

emcbot commented 1 month ago

CI Update on Wcoss2 at 05/31/24 09:04:12 PM
============================================
Cloning and Building global-workflow PR: 2646
with PID: 221491 on host: clogin01

emcbot commented 1 month ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Fri May 31 21:12:14 UTC 2024 on clogin01
---------------------------------------------------
Build: Completed at 05/31/24 09:24:39 PM
Case setup: Completed for experiment C48_ATM_1f1e5ab2
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_1f1e5ab2
Case setup: Skipped for experiment C48_S2SWA_gefs_1f1e5ab2
Case setup: Completed for experiment C48_S2SW_1f1e5ab2
Case setup: Completed for experiment C96_atm3DVar_extended_1f1e5ab2
Case setup: Skipped for experiment C96_atm3DVar_1f1e5ab2
Case setup: Skipped for experiment C96_atmaerosnowDA_1f1e5ab2
Case setup: Completed for experiment C96C48_hybatmDA_1f1e5ab2
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_1f1e5ab2

emcbot commented 1 month ago

Experiment C48_ATM_1f1e5ab2 SUCCESS on Wcoss2 at 05/31/24 10:36:37 PM

emcbot commented 1 month ago

Experiment C48_S2SW_1f1e5ab2 SUCCESS on Wcoss2 at 05/31/24 10:57:13 PM

emcbot commented 1 month ago

Experiment C96C48_hybatmDA FAILED on Hercules with error logs:

/work2/noaa/stmp/CI/HERCULES/2646/RUNTESTS/COMROOT/C96C48_hybatmDA_1f1e5ab2/logs/2021122100/gfsatmos_prod_f072-f078.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented 1 month ago

Experiment C96C48_hybatmDA FAILED on Hercules in /work2/noaa/stmp/CI/HERCULES/2646/RUNTESTS/C96C48_hybatmDA_1f1e5ab2

emcbot commented 1 month ago

Experiment C96C48_hybatmDA_1f1e5ab2 SUCCESS on Wcoss2 at 05/31/24 11:51:21 PM

emcbot commented 1 month ago

CI Passed Hera at
Built and ran in directory /scratch1/NCEPDEV/global/CI/2646

emcbot commented 1 month ago

CI Passed Orion at
Built and ran in directory /work2/noaa/stmp/CI/ORION/2646

emcbot commented 1 month ago

Experiment C96_atm3DVar_extended_1f1e5ab2 SUCCESS on Wcoss2 at 06/01/24 06:03:38 AM

emcbot commented 1 month ago

All CI Test Cases Passed on Wcoss2:


Experiment C48_ATM_1f1e5ab2 *** SUCCESS *** at 05/31/24 10:36:37 PM
Experiment C48_S2SW_1f1e5ab2 *** SUCCESS *** at 05/31/24 10:57:13 PM
Experiment C96C48_hybatmDA_1f1e5ab2 *** SUCCESS *** at 05/31/24 11:51:21 PM
Experiment C96_atm3DVar_extended_1f1e5ab2 *** SUCCESS *** at 06/01/24 06:03:38 AM
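
For anyone who wants to tabulate these results, the summary lines above follow a regular enough shape to parse. The following is only a minimal sketch: the regular expression and the `parse_summary` helper are illustrative assumptions based on the four lines in this comment, not part of the CI tooling, and the timestamp parsing assumes an English locale for the AM/PM marker.

```python
import re
from datetime import datetime

# Matches summary lines of the form shown in the comment above, e.g.
# "Experiment C48_ATM_1f1e5ab2 *** SUCCESS *** at 05/31/24 10:36:37 PM"
# (pattern inferred from this thread only; hypothetical, not from the CI scripts)
LINE_RE = re.compile(
    r"^Experiment (?P<name>\S+) \*\*\* (?P<status>\w+) \*\*\* at (?P<when>.+)$"
)


def parse_summary(text):
    """Return a list of (experiment, status, datetime) tuples."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip the header line and blank lines
        when = datetime.strptime(match["when"], "%m/%d/%y %I:%M:%S %p")
        results.append((match["name"], match["status"], when))
    return results


summary = """\
All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_1f1e5ab2 *** SUCCESS *** at 05/31/24 10:36:37 PM
Experiment C48_S2SW_1f1e5ab2 *** SUCCESS *** at 05/31/24 10:57:13 PM
Experiment C96C48_hybatmDA_1f1e5ab2 *** SUCCESS *** at 05/31/24 11:51:21 PM
Experiment C96_atm3DVar_extended_1f1e5ab2 *** SUCCESS *** at 06/01/24 06:03:38 AM
"""

results = parse_summary(summary)
for name, status, when in results:
    print(f"{name:36s} {status:8s} {when.isoformat()}")
```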

emcbot commented 1 month ago

Experiment C48_S2SW FAILED on Hercules with error logs:

/work2/noaa/stmp/CI/HERCULES/2646/RUNTESTS/COMROOT/C48_S2SW_1f1e5ab2/logs/2021032312/gfswavepostpnt.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented 1 month ago

Experiment C48_S2SW FAILED on Hercules in /work2/noaa/stmp/CI/HERCULES/2646/RUNTESTS/C48_S2SW_1f1e5ab2

WalterKolczynski-NOAA commented 1 month ago

Wave post points is timing out. Is there a change in the model that might be making that run more slowly? Otherwise, @DavidHuber-NOAA recently had a PR (#2588) that set a cap on the number of PEs/node at 40 to solve Hercules IO issues with the job; perhaps that should be lower? Or maybe the PEs aren't being spread to use all nodes?
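
To make the PEs/node question concrete, here is a rough sketch of the arithmetic. Only the cap of 40 comes from #2588 as described above; the total PE count of 240 and the helper names are hypothetical, chosen purely for illustration, and do not reflect the job's actual configuration.

```python
import math


def nodes_needed(total_pes, max_pes_per_node):
    """Smallest node count that keeps every node at or under the cap."""
    return math.ceil(total_pes / max_pes_per_node)


def spread_evenly(total_pes, nodes):
    """Distribute PEs across nodes as evenly as possible."""
    base, extra = divmod(total_pes, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]


total_pes = 240  # hypothetical job size, not taken from the workflow config

# Compare the current cap (40, per #2588) with a lower, hypothetical cap of 20.
for cap in (40, 20):
    nodes = nodes_needed(total_pes, cap)
    layout = spread_evenly(total_pes, nodes)
    print(f"cap={cap}: {nodes} nodes, PEs per node = {layout}")
```

Lowering the cap raises the node count for the same job, which reduces the number of PEs sharing each node's IO. If the scheduler instead packs PEs onto fewer nodes than requested, some nodes would exceed the intended cap, which is the second possibility raised above.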

JessicaMeixner-NOAA commented 4 weeks ago

@WalterKolczynski-NOAA the code for the wave point post job did not change, so my guess is that we either got slow nodes (I'm not sure whether this is a Hercules and Orion issue or just an Orion issue, as I haven't used Hercules enough) or hit one of the other issues you describe. I do know there will soon be a module update to move to 1.6 spack on Hercules (and other RDHPCS machines); I'm not sure whether that will help or hurt this issue.

DavidHuber-NOAA commented 4 weeks ago

@WalterKolczynski-NOAA @JessicaMeixner-NOAA I have made a few optimizations to the wavepostpnt jobs this morning and will try them out today. I'll let you know if there is an improvement to speed. You can check them out here.

JessicaMeixner-NOAA commented 4 weeks ago

@DavidHuber-NOAA let me know if there's anything I can do to help debug the point output job issues on Hercules.

aerorahul commented 4 weeks ago

@JessicaMeixner-NOAA If you can move the yamls from ci/cases/weekly/ into ci/cases/hires/, we would like to merge this PR. The Hercules failure will be patched in a separate PR, as it is not related to the changes in this PR. Thanks!

JessicaMeixner-NOAA commented 4 weeks ago

@aerorahul - can you double check what I did? I think I did as asked.

aerorahul commented 4 weeks ago

We would like to keep:

C384_S2SWA.yaml
C384_atm3DVar.yaml
C384C192_hybatmda.yaml

in the ci/cases/weekly/ directory.
Please move them back from ci/cases/hires/ to ci/cases/weekly/.

I was referring to:

C1152_S2SW.yaml
C768_S2SW.yaml

to be moved to ci/cases/hires. They have been removed in the last commit https://github.com/NOAA-EMC/global-workflow/pull/2646/commits/77b4ff03408fd4b3a00c5f9cae14bc397d33888e
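
The requested layout can be checked mechanically. The sketch below is illustrative only: the `EXPECTED` mapping is transcribed from the lists in this comment, the `misplaced_cases` helper is hypothetical, and the demo recreates the state described here (the three C384 cases sitting in hires, the two high-resolution cases removed) in a temporary directory rather than touching a real clone.

```python
import tempfile
from pathlib import Path

# Desired location of each case file, as listed in the comment above.
EXPECTED = {
    "C384_S2SWA.yaml": "weekly",
    "C384_atm3DVar.yaml": "weekly",
    "C384C192_hybatmda.yaml": "weekly",
    "C1152_S2SW.yaml": "hires",
    "C768_S2SW.yaml": "hires",
}


def misplaced_cases(cases_dir):
    """Map each misplaced or missing file to (expected_dir, dirs_where_found)."""
    problems = {}
    subdirs = sorted(d for d in cases_dir.iterdir() if d.is_dir())
    for name, expected in EXPECTED.items():
        found = [d.name for d in subdirs if (d / name).is_file()]
        if found != [expected]:
            problems[name] = (expected, found)
    return problems


# Demo on a throwaway directory tree (an assumption standing in for ci/cases).
with tempfile.TemporaryDirectory() as tmp:
    cases = Path(tmp) / "ci" / "cases"
    (cases / "weekly").mkdir(parents=True)
    (cases / "hires").mkdir()
    for name in ("C384_S2SWA.yaml", "C384_atm3DVar.yaml", "C384C192_hybatmda.yaml"):
        (cases / "hires" / name).touch()
    problems = misplaced_cases(cases)

for name, (expected, found) in problems.items():
    where = ", ".join(found) if found else "nowhere"
    print(f"{name}: expected in {expected}/, found in {where}")
```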

JessicaMeixner-NOAA commented 4 weeks ago

Thanks for clarifying what you wanted. I hadn't realized I had moved any yamls in this PR, so I was a bit confused. I think this is now what you wanted to see; please let me know if I am not correct.