Closed penguian closed 6 years ago
rb4844 uploaded file am528_running_3.08pm_270717.png
(2331.3 KiB)
mppncombine fail 1
rb4844 uploaded file mppncombine fail 2.png
(2351.8 KiB)
mppncombine fail 2
rb4844 commented
How do I delete tiff attachments now?
@scott.wales@bom.gov.au commented
Might be because I'm an admin, but I see a 'delete attachment' button when I follow the attachment link
@martin.dix@anu.edu.au commented
Arnold reports the same problem in #327.
The dependency graph is
[[[ [RESUB] ]]]
graph = """
filemove[-[RESUB]] => coupled => filemove => mppncombine [ '=> housekeep' if HOUSEKEEP else '' ]
"""
so the model depends only on filemove from the previous run. A new run can therefore start even if mppncombine from the previous run isn't complete.
The problem is that the filemove script doesn't add a date stamp to the filenames, so the new run's filemove can overwrite files from the previous run before mppncombine has processed them.
This could be fixed by changing the dependency to
mppncombine[-[RESUB]] => coupled => filemove => mppncombine
but it is more efficient to add dates to the filenames, so that the model doesn't have to wait unnecessarily.
This is implemented in modified versions of mppcombine.sh and filemove_access.sh in u-ao219. See
https://code.metoffice.gov.uk/trac/roses-u/changeset/48244/a/o/2/1/9
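The date-stamping idea can be sketched roughly as below. This is not the code from u-ao219: the function name stamp_and_move, its arguments, and the NAME-DATE suffix pattern (modelled on the ocean_scalar.nc-00010630 name seen in job.err) are assumptions for illustration only.

```shell
# Hypothetical helper (not the real filemove_access.sh): move each file from
# SRC_DIR into DEST_DIR, appending a cycle date stamp to the filename so that
# a later cycle cannot overwrite files that have not been combined yet.
# Usage: stamp_and_move SRC_DIR DEST_DIR DATESTAMP
stamp_and_move() {
    local src_dir=$1 dest_dir=$2 stamp=$3
    local f base
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue      # skip when the glob matched nothing
        base=$(basename "$f")
        # -n: never overwrite an already existing stamped file
        mv -n "$f" "$dest_dir/$base-$stamp"
    done
}
```

With distinct stamps per cycle, files moved by two successive filemove tasks sit side by side in the destination until mppncombine picks them up.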
@martin.dix@anu.edu.au edited comment 0 (the change was not transferred by tractive)
@martin.dix@anu.edu.au changed status from new to assigned
@martin.dix@anu.edu.au set owner to mrd599
@martin.dix@anu.edu.au commented
This fix can be applied to a running suite by copying the new filemove_access.sh and mppcombine.sh to the cylc-run/SUITE/bin directory on raijin. Note that you can only do this after filemove and mppncombine have both run, because the new scripts change the intermediate filenames.
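A minimal sketch of that copy step, assuming the updated scripts have already been checked out somewhere locally. The function name install_fix, the .orig backup convention and the example source path are hypothetical; only the two script names and the cylc-run/SUITE/bin target come from the comment above.

```shell
# Hypothetical helper: copy the two updated scripts into a suite's bin
# directory, keeping a .orig backup of any script that is replaced.
# Usage: install_fix SRC_DIR SUITE_BIN_DIR
install_fix() {
    local src_dir=$1 bin_dir=$2
    local script
    # Refuse to do a partial install: both new scripts must be present.
    for script in filemove_access.sh mppcombine.sh; do
        if [ ! -f "$src_dir/$script" ]; then
            echo "missing $src_dir/$script" >&2
            return 1
        fi
    done
    for script in filemove_access.sh mppcombine.sh; do
        if [ -f "$bin_dir/$script" ]; then
            cp -p "$bin_dir/$script" "$bin_dir/$script.orig"
        fi
        cp -p "$src_dir/$script" "$bin_dir/$script"
    done
}
```

It would be called as, for example, install_fix ~/u-ao219/bin ~/cylc-run/SUITE/bin (the source path is an assumption), once both tasks have finished for the current cycle.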
@martin.dix@anu.edu.au changed status from assigned to closed
@martin.dix@anu.edu.au set resolution to fixed
@martin.dix@anu.edu.au changed title from mppncombine failed to New filemove can overwrite output before mppncombine runs
keyword_mppncombine
resolution_fixed
by rb4844
I have had a couple of runs where mppncombine has failed. This appears to be a consequence of long queue times: by the time it finally runs, the ocean files have gone from the temporary HistoryData folder and, I assume, been transferred across to short.
As a consequence, housekeep gets stuck waiting. Two more NCI coupled steps in the rose tree are spawned, but none after that unless I manually intervene to delete the failed steps.
See the screen captures.
E.g. job.err:
mv: target `ocean_scalar.nc-00010630' is not a directory
Received signal EXIT
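For context, mv prints "target ... is not a directory" when it is given more than two operands and the last one is not a directory, which can happen when a glob matches more files than expected. Whether that is exactly what happened in this job is not established here. The guard below is a hypothetical sketch (the name move_one is an assumption, not part of the suite) that fails with a clearer message in that situation.

```shell
# Hypothetical guard: rename a file only if exactly one source was supplied,
# so stray files from another cycle give an explicit error instead of mv's
# "target ... is not a directory".
# Usage: move_one TARGET SOURCE...
move_one() {
    local target=$1
    shift
    if [ "$#" -ne 1 ]; then
        echo "expected 1 source for $target, got $#: $*" >&2
        return 1
    fi
    if [ ! -e "$1" ]; then
        echo "source $1 does not exist" >&2
        return 1
    fi
    mv "$1" "$target"
}
```

For example, move_one out.nc b.nc c.nc returns non-zero and leaves both source files untouched, whereas a plain mv b.nc c.nc out.nc would stop with the message shown in job.err.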
Issue migrated from trac:328 at 2024-01-31 18:30:20 +1100