Closed emilyhcliu closed 1 week ago
Tagging @azadeh-gh for awareness.
Thanks for letting me know about this, @emilyhcliu. I will take a look today and see what's going on.
@DavidHuber-NOAA Do you have a timeline for fixing the online archive issue?
We have three experiments running with the latest global workflow, which includes the archive refactoring work merged on June 1. Knowing the timeline for fixing the issue will help us decide whether to wait for the fix or rebuild the global workflow with an earlier version (before June 1) for the experiments.
Thanks!
@emilyhcliu I expect to have a fix in by mid next week at the latest. I did some exploratory work yesterday and have an idea of the root cause, but there's still some more debugging work to do.
What is wrong?
I am running two experiments using two different global-workflow versions:
Exp1 - uses hash# 59cdc0ee81926ee8dc7b8e544337bfc85130ad18 (last updated on April 5, 2024)
Exp2 - uses hash# acf3aaa2b1d3e3024b0b5d2fe23eee8c317a980b (last updated on June 6, 2024)
For both runs (Exp1 and Exp2), the pgb files created were copied from RUNDIR to ROTDIR under the product directory for GDAS and GFS cycles without any problems. The Exp1 experiment did not have an online archive problem. However, the online archive job for Exp2 has missing files for both GDAS and GFS runs.
The archive job has two parts: one is the HPSS archive, and the other is the online archive. There were no problems with the HPSS archive. However, the online archive job has issues:
The archive job (develop version) is processed using exglobal_archive.py with arcdir.yaml as input.
PR #2621, which is related to the archive job, was merged on June 1.
That PR refactored arcdir.yaml.j2, which may be related to the problem with the online archive job reported in this issue.
What should have happened?
For GDAS and GFS cycles, both analysis and forecast pgb files should be archived on disk (online archive) along with gsistat files.
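A quick way to confirm whether the expected file types reached the online archive is sketched below in Python. It is only a rough check: the glob patterns `*pgb*` and `*gsistat*` are assumptions based on the file types named above, not the actual naming rules used by the workflow, and `missing_categories` is a hypothetical helper, not part of global-workflow.

```python
from pathlib import Path

# Assumed glob patterns: the real file names are defined by the workflow's
# archive templates, so adjust these to match your experiment.
EXPECTED_PATTERNS = {
    "pgb": "*pgb*",
    "gsistat": "*gsistat*",
}


def missing_categories(arcdir, patterns=EXPECTED_PATTERNS):
    """Return the category names that have no matching file under arcdir."""
    root = Path(arcdir)
    missing = []
    for name, pattern in patterns.items():
        # rglob searches arcdir and all of its subdirectories.
        if not any(path.is_file() for path in root.rglob(pattern)):
            missing.append(name)
    return missing
```

Calling `missing_categories(ARCDIR)` with the experiment's archive directory returns an empty list when at least one file of each category is present. It does not verify that every expected cycle or forecast hour was archived.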
What machines are impacted?
All or N/A
Steps to reproduce
Additional information
My Exp2 run:
HOMEgfs: /scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-thompson-enkffix
EXPDIR: /scratch1/NCEPDEV/da/Emily.Liu/para/v17/v17allskyens
ROTDIR: /scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17allskyens
ARCDIR: /scratch1/NCEPDEV/da/Emily.Liu/archive/v17allskyens
Related log files:
/scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17allskyens/logs/2023040300/gdasarch.log
/scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17allskyens/logs/2023040300/gfsarch.log
Do you have a proposed solution?
Debug exglobal_archive.py and its related scripts and yaml files (e.g. arcdir.yaml.j2)
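As a starting point for debugging, one option is to list which files exist in a known-good online archive (for example, the one from Exp1) but are absent from the affected one. The Python sketch below does this by comparing relative paths. It assumes both experiments archived the same cycles and that file names do not embed the experiment name. `relative_files` and `archive_diff` are hypothetical helper names, not part of global-workflow.

```python
import os


def relative_files(root):
    """Collect the paths of all files under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            found.add(os.path.relpath(full_path, root))
    return found


def archive_diff(reference_dir, suspect_dir):
    """Return sorted relative paths present in reference_dir but not in suspect_dir."""
    return sorted(relative_files(reference_dir) - relative_files(suspect_dir))
```

The returned list can then be matched against the entries in arcdir.yaml.j2 to see which file sets are no longer being copied.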