csiro-coasts / EMS

Environmental Modelling Suite

EMS v1.5.2 SHOC Transport runs fail due to attempting to read a timestamp after the end of the run #31

Closed: sharon-tickell closed 4 months ago

sharon-tickell commented 5 months ago

Runtime environment is:

SHOC transport runs proceed smoothly right up until the final timestamp, then fail with an exit code of 1 and a runlog error message like:

[FATAL ]() hd_ts_multifile_eval_sparse: The dump file 'trans.mnc(u1=u1mean)(u2=u2mean)(w=wmean)(Kz=Kzmean)(u1vm=u1vmean)(u2vm=u2vmean)' does not contain the time 398219400.00.

The trans.mnc file was created by running SHOC hydro, and points to only a single transport forcing file that was also created by the SHOC hydro step. Both files are present and accessible, and the SHOC hydro step succeeded with no issues.

The SHOC diagnostics log shows that the run actually succeeded, and contains content like:

Simulation start = 4600.0000 (days) : 2022-08-06 00:00:00
Simulation stop  = 4609.0000 (days) : 2022-08-15 00:00:00
Simulation time  = 4609.0000 (days) : 2022-08-15 00:00:00

CPU time used this iteration = 3.020 (sec)
Mean CPU time used / iteration = 3.350 (sec)
CPU run time ratio = 537.268883
Elapsed time = 0 day(s) 00:25:46
Total time ratio = 502.98 (463:1)
Time to completion = 0 day(s) 00:00:00
Percent complete = 100.0%
CFL Min: 2D=2111.8 3D=2101.6
Run successful. 

The time referenced in the runlog error message is 398219400.00 s = 4609.02083333 days, which is 0.02083333 days (1800 s, or 0.5 hours) after the configured simulation stop time of 4609.0 days. No data with this timestamp is present in the transport forcing files that were produced by the SHOC hydro stage.
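
For anyone wanting to check that arithmetic, here is a minimal Python sketch. It only uses the numbers quoted in the runlog error and the diagnostics log; the model epoch is not stated anywhere in the logs, so the sketch infers it from the stop date and stop offset, which is an assumption on my part.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86_400

# Values copied from the runlog error and the diagnostics log.
failing_time_s = 398_219_400.00   # time the fatal error says is missing
stop_days = 4609.0                # "Simulation stop" in model days
stop_date = datetime(2022, 8, 15) # calendar date logged for the stop time

# Assumption: the epoch is inferred as (stop date - stop offset),
# it is not printed in the logs themselves.
epoch = stop_date - timedelta(days=stop_days)

# Convert the failing time to model days and to a calendar date.
failing_days = failing_time_s / SECONDS_PER_DAY
failing_date = epoch + timedelta(seconds=failing_time_s)

# How far past the configured stop time the requested timestamp lies.
overshoot_s = failing_time_s - stop_days * SECONDS_PER_DAY

print(f"requested time : {failing_days:.8f} days ({failing_date})")
print(f"overshoot      : {overshoot_s:.0f} s = {overshoot_s / 3600:.1f} h")
```

The overshoot comes out at 1800 s, i.e. the requested timestamp is half an hour past the configured stop time.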

We are currently working around this issue by ignoring the runlog error message and the SHOC exit code in favour of the status from the SHOC diagnostics log, but this is not ideal: SHOC should not log an error or return a failure code when the run has not actually failed.
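
To illustrate the kind of workaround described above, here is a rough sketch in Python. The helper name `shoc_run_succeeded` is hypothetical, and matching on the `Percent complete = 100.0%` and `Run successful.` lines is an assumption based only on the diagnostics excerpt quoted in this issue, not on any documented SHOC log format.

```python
def shoc_run_succeeded(exit_code: int, diag_log_text: str) -> bool:
    """Hypothetical helper: decide run status from the diagnostics log.

    A zero exit code is trusted as-is. For a non-zero exit code, the run
    is still treated as successful if the diagnostics log reports both
    100% completion and "Run successful.".
    """
    if exit_code == 0:
        return True
    # Compare whole lines (whitespace-stripped) rather than substrings.
    lines = [line.strip() for line in diag_log_text.splitlines()]
    complete = "Percent complete = 100.0%" in lines
    successful = "Run successful." in lines
    return complete and successful


# Example using lines copied from the diagnostics log in this issue.
sample_diag = """\
Time to completion = 0 day(s) 00:00:00
Percent complete = 100.0%
CFL Min: 2D=2111.8 3D=2101.6
Run successful. 
"""

print(shoc_run_succeeded(1, sample_diag))  # exit code 1, but log says success
```

A check like this masks every non-zero exit code once the diagnostics log reports success, so it is only reasonable as a stopgap on v1.5.2.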

frizwi commented 4 months ago

Fixed in release 1.5.3