Open guillaumevernieres opened 4 days ago
Two questions: `prepoceanobs`?
@AndrewEichmann-NOAA, `prepoceananal` needs the backgrounds and the cice restart. ~We only need to add a dependency to the cice restart.~ Add a dependency to the forecast.
The reasoning here (worked out in live discussion) is that `prepoceanobs`, the task `ocnanalprep` depends on, itself depends on the f009 forecast output, which is produced towards the end of `fcst`. The concern was that having `ocnanalprep` wait for `fcst` to complete, rather than for the files to be present, would add a few wallclock minutes to the cycle time. But since `fcst` should wrap up in the time it takes `prepoceanobs` to complete, making `fcst` the dependency should be both safe and efficient.
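The timing argument can be sketched as a small calculation. This is only an illustration: the function name `added_wait` is hypothetical, it assumes `prepoceanobs` starts as soon as the f009 file appears, and the minute values in the usage note are made up rather than measured.

```python
def added_wait(f009_written: float, fcst_done: float, prepoceanobs_minutes: float) -> float:
    """Extra minutes ocnanalprep waits if it also depends on fcst completing.

    All times are minutes on a common clock. prepoceanobs is assumed to
    start the moment the f009 file is written (an assumption of this sketch).
    """
    prepoceanobs_done = f009_written + prepoceanobs_minutes
    # ocnanalprep can start only once both prepoceanobs and fcst are done,
    # so the penalty is however long fcst outlasts prepoceanobs (never negative).
    return max(0.0, fcst_done - prepoceanobs_done)
```

With hypothetical values, `added_wait(0.0, 8.0, 10.0)` gives `0.0` (the forecast finishes while `prepoceanobs` is still running, so nothing is lost), while `added_wait(0.0, 8.0, 5.0)` gives `3.0`.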
What is wrong?
Wrong job dependency.
What should have happened?
Wait for all the necessary files to be present on disk before starting, rather than checking for just one.
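A minimal sketch of the "check every file" behavior, assuming the required inputs can be listed up front. The helper name `missing_inputs` is hypothetical and is not part of wxflow or the workflow; in practice the check would more likely be expressed in the workflow's job dependencies than in Python.

```python
from pathlib import Path
from typing import Iterable, List


def missing_inputs(paths: Iterable[str]) -> List[str]:
    """Return the required paths that are absent or still empty."""
    missing = []
    for p in paths:
        path = Path(p)
        # Treat a zero-byte file as missing: it may have been created
        # but not yet written.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(p))
    return missing
```

A job would start only when `missing_inputs(...)` returns an empty list. Note that a non-empty file is not proof that writing has finished, which is one reason waiting for the forecast task itself to complete may be the more robust choice.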
What machines are impacted?
All or N/A
Steps to reproduce
A combination of cycling and bad luck.
Additional information
Here's the message from @JessicaMeixner-NOAA:
We had a failure in Jiande's runs: /scratch1/NCEPDEV/climate/Jiande.Wang/working/g-w-cycle/cycle/C03/COMROOT/C03/logs/2021080218/gdasocnanalprep.log.0
The issue is:
File "/scratch1/NCEPDEV/da/python/gdasapp/wxflow/20240307/src/wxflow/fsutils.py", line 85, in cp
raise OSError(f"unable to copy {source} to {target}")
OSError: unable to copy /scratch1/NCEPDEV/climate/Jiande.Wang/working/g-w-cycle/cycle/C03/COMROOT/C03/gdas.20210802/12//model_data/ice/restart/20210802.150000.cice_model.res.nc to /scratch1/NCEPDEV/climate/Jiande.Wang/working/g-w-cycle/cycle/TMP/RUNDIRS/C03/gdasocnanal_18/Data/20210802.150000.cice_model.res.nc
Cathy's detective work showed that this looks to be a dependency issue (copying Cathy's chat from another thread):
ocnanalprep depends on prepoceanobs. prepoceanobs depends on the existence of the previous cycle's ocean history f009 file. Looking at the ocean history and the ocean restart timestamps, it looks like the ocean history files finish writing faster than the restarts.
[Catherine.Thomas@hfe07 history]$ ls -l ../restart/
total 12357700
-rw-r--r-- 1 Jiande.Wang climate 3855258159 Jun 26 01:57 20210802.150000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang climate 3892610186 Jun 26 01:57 20210802.150000.MOM.res_1.nc
-rw-r--r-- 1 Jiande.Wang climate 3905057565 Jun 26 01:57 20210802.150000.MOM.res_2.nc
-rw-r--r-- 1 Jiande.Wang climate 1001184700 Jun 26 01:47 20210802.150000.MOM.res_3.nc
[Catherine.Thomas@hfe07 history]$ ls -l
total 9210488
-rw-r--r-- 1 Jiande.Wang climate 2357833989 Jun 26 01:46 gdas.ocean.t12z.inst.f000.nc
-rw-r--r-- 1 Jiande.Wang climate 2357833989 Jun 26 01:48 gdas.ocean.t12z.inst.f003.nc
-rw-r--r-- 1 Jiande.Wang climate 2357833989 Jun 26 01:50 gdas.ocean.t12z.inst.f006.nc
-rw-r--r-- 1 Jiande.Wang climate 2357833989 Jun 26 01:51 gdas.ocean.t12z.inst.f009.nc
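To make the gap concrete, the timestamps in the listing can be compared directly. The snippet hard-codes the two times shown there (01:51 for the f009 history file, 01:57 for the latest MOM restart files). The listing does not show a year, so the one used here is an assumption needed only to build the `datetime` objects; `ls -l` also reports minutes only, so the result is approximate.

```python
from datetime import datetime

# Times copied from the `ls -l` output; the year is assumed (not in the listing).
f009_history = datetime(2024, 6, 26, 1, 51)
latest_restart = datetime(2024, 6, 26, 1, 57)

gap_minutes = (latest_restart - f009_history).total_seconds() / 60
print(f"restart files landed {gap_minutes:.0f} minutes after the f009 history file")
```

This prints a gap of 6 minutes, which is the window in which a job triggered by the f009 history file could start before the restarts are complete.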
Do you have a proposed solution?
No response