There have been several instances where cylc has reported that the `remap-pp-components` task executed successfully even though it didn't. Two instances where this occurred for me are:

- When the `pp_chunk_a` variable in my pp yaml did not match the `chunksize` that fre inferred from my history files, causing the remap script to simply do nothing at this point in the loop: https://github.com/NOAA-GFDL/fre-workflows/blob/c18dedd918a09fef11a24ffcab692ab8d86a3e7b/app/remap-pp-components/bin/remap-pp-components#L378-L380
- If the link command fails, the remap script also doesn't do anything when it gets a non-zero return value: https://github.com/NOAA-GFDL/fre-workflows/blob/c18dedd918a09fef11a24ffcab692ab8d86a3e7b/app/remap-pp-components/bin/remap-pp-components#L451-L464

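
For the non-zero return value case, something along these lines might be enough to make the task fail loudly. This is only a sketch: `run_checked` is a hypothetical helper, not a function in the current script, and I'm assuming the link step goes through `subprocess`.

```python
import subprocess


def run_checked(cmd):
    """Run cmd (a list of strings) and raise if it exits non-zero.

    Hypothetical helper: an uncaught exception makes the script exit
    non-zero, so cylc would mark the task as failed instead of succeeded.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command {cmd} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result
```

The same idea would apply to the chunk-size mismatch: raising (or at least logging a warning) when no chunk matches would surface the problem in `job.err`.
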
These problems were quite difficult to debug because `cylc`, `job.out`, and `job.err` all seemed to indicate that the remap job had completed successfully when it had actually failed silently. There could also be several other points where this loop could fail or skip an iteration, and the function would still return 0 and print "Component remapping complete" without actually executing the copy command.

I think it would be helpful to at least add a check before printing that the remapping completed successfully, so that if the files aren't copied to output_dir the user knows to check this script for issues instead of having cylc fail at some future workflow step that may not be related.
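
As a rough sketch of the kind of check I mean (the function name `verify_remap_output`, the `*.nc` pattern, and the `expected_min` threshold are all assumptions of mine, not anything in the current script):

```python
from pathlib import Path


def verify_remap_output(output_dir, pattern="*.nc", expected_min=1):
    """Raise if output_dir holds fewer than expected_min matching files.

    Hypothetical check, meant to run just before the success message is
    printed. The "*.nc" pattern assumes the remapped files are NetCDF.
    """
    out = Path(output_dir)
    # A missing output directory counts as "no files copied".
    files = [p for p in out.rglob(pattern) if p.is_file()] if out.is_dir() else []
    if len(files) < expected_min:
        raise RuntimeError(
            f"Found {len(files)} file(s) matching {pattern!r} in {out}, "
            f"expected at least {expected_min}; remapping likely did nothing."
        )
    return files
```

This would not catch every case (for example, an output_dir that already contains files from an earlier run), so counting the files actually copied inside the loop may be a more reliable signal.
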