Closed JessicaMeixner-NOAA closed 4 weeks ago
CI Update on Wcoss2 at 05/31/24 09:04:12 PM
============================================
Cloning and Building global-workflow PR: 2646
with PID: 221491 on host: clogin01
Automated global-workflow Testing Results:
Machine: Wcoss2
Start: Fri May 31 21:12:14 UTC 2024 on clogin01
---------------------------------------------------
Build: Completed at 05/31/24 09:24:39 PM
Case setup: Completed for experiment C48_ATM_1f1e5ab2
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_1f1e5ab2
Case setup: Skipped for experiment C48_S2SWA_gefs_1f1e5ab2
Case setup: Completed for experiment C48_S2SW_1f1e5ab2
Case setup: Completed for experiment C96_atm3DVar_extended_1f1e5ab2
Case setup: Skipped for experiment C96_atm3DVar_1f1e5ab2
Case setup: Skipped for experiment C96_atmaerosnowDA_1f1e5ab2
Case setup: Completed for experiment C96C48_hybatmDA_1f1e5ab2
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_1f1e5ab2
Experiment C48_ATM_1f1e5ab2 SUCCESS on Wcoss2 at 05/31/24 10:36:37 PM
Experiment C48_S2SW_1f1e5ab2 SUCCESS on Wcoss2 at 05/31/24 10:57:13 PM
Experiment C96C48_hybatmDA FAILED on Hercules with error logs:
/work2/noaa/stmp/CI/HERCULES/2646/RUNTESTS/COMROOT/C96C48_hybatmDA_1f1e5ab2/logs/2021122100/gfsatmos_prod_f072-f078.log
Follow link here to view the contents of the above file(s): (link)
Experiment C96C48_hybatmDA FAILED on Hercules in
/work2/noaa/stmp/CI/HERCULES/2646/RUNTESTS/C96C48_hybatmDA_1f1e5ab2
Experiment C96C48_hybatmDA_1f1e5ab2 SUCCESS on Wcoss2 at 05/31/24 11:51:21 PM
CI Passed Hera at
Built and ran in directory /scratch1/NCEPDEV/global/CI/2646
CI Passed Orion at
Built and ran in directory /work2/noaa/stmp/CI/ORION/2646
Experiment C96_atm3DVar_extended_1f1e5ab2 SUCCESS on Wcoss2 at 06/01/24 06:03:38 AM
All CI Test Cases Passed on Wcoss2:
Experiment C48_ATM_1f1e5ab2 *** SUCCESS *** at 05/31/24 10:36:37 PM
Experiment C48_S2SW_1f1e5ab2 *** SUCCESS *** at 05/31/24 10:57:13 PM
Experiment C96C48_hybatmDA_1f1e5ab2 *** SUCCESS *** at 05/31/24 11:51:21 PM
Experiment C96_atm3DVar_extended_1f1e5ab2 *** SUCCESS *** at 06/01/24 06:03:38 AM
Experiment C48_S2SW FAILED on Hercules with error logs:
/work2/noaa/stmp/CI/HERCULES/2646/RUNTESTS/COMROOT/C48_S2SW_1f1e5ab2/logs/2021032312/gfswavepostpnt.log
Follow link here to view the contents of the above file(s): (link)
Experiment C48_S2SW FAILED on Hercules in
/work2/noaa/stmp/CI/HERCULES/2646/RUNTESTS/C48_S2SW_1f1e5ab2
Wave post points is timing out. Is there a change in the model that might be making that run more slowly? Otherwise, @DavidHuber-NOAA recently had a PR (#2588) that set a cap on the number of PEs/node at 40 to solve Hercules IO issues with the job; perhaps that should be lower? Or maybe the PEs aren't being spread to use all nodes?
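For illustration, the node arithmetic behind a per-node PE cap can be sketched as below. Only the cap of 40 comes from the comment above; the total of 240 PEs and both function names are hypothetical and are not taken from the workflow configuration.

```python
def nodes_needed(total_pes: int, max_pes_per_node: int) -> int:
    """Smallest number of nodes that holds total_pes under the per-node cap."""
    if total_pes <= 0 or max_pes_per_node <= 0:
        raise ValueError("total_pes and max_pes_per_node must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_pes + max_pes_per_node - 1) // max_pes_per_node


def spread_evenly(total_pes: int, nodes: int) -> list[int]:
    """Distribute PEs so that per-node counts differ by at most one."""
    base, extra = divmod(total_pes, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]


# Hypothetical job size; the real wavepostpnt PE count is not given in this thread.
total = 240
for cap in (40, 20):
    n = nodes_needed(total, cap)
    print(f"cap={cap}: {n} nodes, layout={spread_evenly(total, n)}")
```

Lowering the cap raises the node count for the same number of PEs, which is the trade-off raised in the question: fewer PEs per node may ease IO contention, but only if the scheduler actually spreads the tasks over all allocated nodes.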
@WalterKolczynski-NOAA The code for the wave point post job did not change, so my guess is that we either got slow nodes (I'm not sure whether this is a Hercules and Orion issue or just an Orion issue, as I haven't used Hercules enough) or hit one of the other issues you describe. I do know there will soon be a module update to move to 1.6 spack on Hercules (and other RDHPCS machines); I'm not sure whether that will help or hurt this issue.
@WalterKolczynski-NOAA @JessicaMeixner-NOAA I made a few optimizations to the wavepostpnt jobs this morning and will try them out today. I'll let you know if there is an improvement in speed. You can check them out here.
@DavidHuber-NOAA Let me know if there's anything I can do to help debug the point output job issues on Hercules.
@JessicaMeixner-NOAA
If you can move the yamls from ci/cases/weekly/ into ci/cases/hires/, we would like to merge this PR.
The Hercules failure will be addressed in a separate PR, as it is not related to the changes in this PR.
Thanks!
@aerorahul - can you double check what I did? I think I did as asked.
We would like to keep:
C384_S2SWA.yaml
C384_atm3DVar.yaml
C384C192_hybatmda.yaml
in the ci/cases/weekly/ directory.
Please move them back from ci/cases/hires/ to ci/cases/weekly/.
I was referring to:
C1152_S2SW.yaml
C768_S2SW.yaml
to be moved to ci/cases/hires.
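The requested layout can be summarized as a small lookup. This is only a sketch: the file names and directories are the ones named in the comments above, while `target_path` and the two set names are hypothetical helpers, not part of global-workflow.

```python
from pathlib import PurePosixPath

WEEKLY = PurePosixPath("ci/cases/weekly")
HIRES = PurePosixPath("ci/cases/hires")

# Cases the reviewer asked to keep in the weekly directory.
KEEP_IN_WEEKLY = {
    "C384_S2SWA.yaml",
    "C384_atm3DVar.yaml",
    "C384C192_hybatmda.yaml",
}
# Cases the reviewer asked to have moved to the hires directory.
MOVE_TO_HIRES = {"C1152_S2SW.yaml", "C768_S2SW.yaml"}


def target_path(name: str) -> PurePosixPath:
    """Return where a case yaml should live under the requested layout."""
    if name in MOVE_TO_HIRES:
        return HIRES / name
    if name in KEEP_IN_WEEKLY:
        return WEEKLY / name
    raise KeyError(f"{name} is not mentioned in the review comments")


for case in sorted(KEEP_IN_WEEKLY | MOVE_TO_HIRES):
    print(f"{case} -> {target_path(case)}")
```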
They have been removed in the last commit https://github.com/NOAA-EMC/global-workflow/pull/2646/commits/77b4ff03408fd4b3a00c5f9cae14bc397d33888e
Thanks for clarifying what you wanted. I hadn't realized I had moved any yamls in this PR, so I was a bit confused. I think this is now what you wanted to see; please let me know if it is not.
Description
Updates the UFS model to the commit from today. A commit is coming soon that will update RDHPCS to 1.6.1, but I thought I'd open this now. This should resolve the issue, allow C768 runs on Hera, and allow CICE to run on WCOSS2 (due to library updates that allow linking).
From what I can tell, all the updates needed were made by @HenryWinterbottom-NOAA, and they were updates for CICE.
Fixes #2490
Type of change
Change characteristics
How has this been tested?
In Progress...
Checklist