PROBLEM: When restarting from the end of an interrupted run, there would sometimes be negative precipitation between consecutive timesteps. This occurred for two reasons:
Output files: if an existing NetCDF output file came from an interrupted run, the restarted run could produce negative precipitation the next time it opened and modified that file.
Aggregated files: the aggregation script only writes an aggregated file for a date if one does not already exist. An interrupted run could therefore leave partial results, e.g. an aggregated file containing only 18 hours of a day. Testing also showed that negative precipitation occurs if either the partial aggregated file or the previous date's aggregated file exists, so the presence of either of the last two aggregated files causes negative precipitation.
See the NOTES section below for further discussion of these two points.
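To make the symptom concrete: assuming precipitation is stored as a running accumulation (an assumption here, consistent with "negative precipitation between consecutive timesteps"), the per-step amount is the difference between consecutive accumulated values, and a negative difference flags the problem. The sketch below is stdlib-only Python; the helper name `find_negative_precip` and its `tol` parameter are hypothetical and not part of the code base.

```python
from typing import List, Sequence, Tuple


def find_negative_precip(
    times: Sequence[str],
    accumulated: Sequence[float],
    tol: float = 0.0,
) -> List[Tuple[str, str, float]]:
    """Return (previous_time, time, increment) for every step whose
    increment in accumulated precipitation is below -tol.

    Hypothetical helper: assumes `accumulated` holds a running total of
    precipitation at a single point, ordered by `times`.
    """
    if len(times) != len(accumulated):
        raise ValueError("times and accumulated must have the same length")
    bad = []
    for i in range(1, len(accumulated)):
        # Per-timestep precipitation is the change in the running total.
        increment = accumulated[i] - accumulated[i - 1]
        if increment < -tol:
            bad.append((times[i - 1], times[i], increment))
    return bad


if __name__ == "__main__":
    times = ["00Z", "01Z", "02Z", "03Z"]
    accum = [0.0, 1.5, 1.2, 2.0]  # the running total drops between 01Z and 02Z
    print(find_negative_precip(times, accum))
```

A small `tol` can be passed to ignore round-off-level decreases if the accumulations are stored in single precision.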
FIX:
Output files: if a file already exists and it is hour 0 of the day, clobber the file and rewrite it completely rather than opening and modifying it.
Aggregated files: remove the last two aggregated files in an output directory, if they exist, before recreating them along with any new ones.
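The two fixes can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual implementation: `output_file_mode` returns a NetCDF-style open mode ("w" to create or clobber, "a" to append/modify, as accepted by e.g. `netCDF4.Dataset`), and `remove_last_aggregated` assumes aggregated filenames embed a sortable date and match a hypothetical glob pattern `*_agg_*.nc`. Both function names are hypothetical.

```python
import glob
import os
from typing import List


def output_file_mode(path: str, hour: int) -> str:
    """Choose how to open an output file for the time being written.

    - File does not exist: create it ("w").
    - File exists and it is hour 0: clobber it ("w") so any contents left
      by an interrupted run are discarded and the day is rewritten.
    - File exists at any other hour: open and modify it ("a").
    """
    if not os.path.exists(path) or hour == 0:
        return "w"
    return "a"


def remove_last_aggregated(
    outdir: str, pattern: str = "*_agg_*.nc", n: int = 2
) -> List[str]:
    """Delete the last `n` aggregated files in `outdir`, if any exist.

    Assumes (hypothetically) that filenames embed a date such as YYYYMMDD,
    so a lexical sort is chronological. Returns the removed paths; the
    caller then regenerates those dates along with any new ones.
    """
    files = sorted(glob.glob(os.path.join(outdir, pattern)))
    # Guard n <= 0: files[-0:] would select every file.
    to_remove = files[-n:] if n > 0 else []
    for path in to_remove:
        os.remove(path)
    return to_remove
```

Removing two files rather than only the partial one follows from the testing described under PROBLEM: either the partial aggregate or the previous date's aggregate can trigger negative precipitation.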
TESTS CONDUCTED: Lots of tests restarting from interrupted files and/or different start dates and then comparing results.
NOTES: Notable characteristics of the negative precipitation error:
In most cases negative precipitation occurred 24 hours after the restart, though sometimes it occurred sooner.
Shifted restart dates led to differing results, e.g. when comparing output at overlapping timestamps between a day x restart and a day x+1 restart. Importantly (and reassuringly), restarts from the same date produce identical results. The cause of the differences between shifted restarts is not yet understood.
There also seems to be a 24-hour component to this difference. In the current tests, the day x and day x+1 restarts both started from files produced by the same origin run. The day x restart matched the origin run for its first 24 hours, but from day x+1 onward it differed from the day x+1 restart run. A restart from day x-1 will be run to compare all three.
Negative precipitation from interrupted files occurred only once: rerunning the same case fixed the issue. This indicates that the existing NetCDF output files caused the issue, which is why clobbering them provides a fix.
It is possible that all aggregated files from the restart point onward will need to be removed, although some existing tests suggest this is not necessary.
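The restart comparisons in these notes amount to checking two runs for agreement at their overlapping timestamps and locating the first divergence. A minimal sketch, assuming each run's output has already been loaded into a dict mapping sortable timestamp strings (e.g. ISO 8601) to a list of field values; `first_divergence` is a hypothetical helper, not part of the repository.

```python
from typing import Dict, List, Optional


def first_divergence(
    run_a: Dict[str, List[float]],
    run_b: Dict[str, List[float]],
    atol: float = 0.0,
) -> Optional[str]:
    """Return the earliest shared timestamp at which the two runs differ.

    Only timestamps present in both runs are compared. With atol=0.0 the
    check is exact, matching the expectation that restarts from the same
    date give identical results. Returns None if the runs agree at every
    shared timestamp.
    """
    shared = sorted(set(run_a) & set(run_b))
    for t in shared:
        a, b = run_a[t], run_b[t]
        if len(a) != len(b) or any(abs(x - y) > atol for x, y in zip(a, b)):
            return t
    return None


if __name__ == "__main__":
    day_x = {"2020-01-02T00": [1.0, 2.0], "2020-01-03T00": [1.1, 2.0]}
    day_x1 = {"2020-01-02T00": [1.0, 2.0], "2020-01-03T00": [1.0, 2.0]}
    print(first_divergence(day_x, day_x1))  # prints 2020-01-03T00
```

Reporting the first differing timestamp, rather than a single pass/fail, makes a 24-hour offset in the divergence easy to spot.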
Checklist
[ ] Closes issue #xxxx (An issue must exist or be created to be closed. The
issue describes and documents the problem and general solution, the PR
describes the technical details of the solution.)
TYPE: bugfix
KEYWORDS: restart, post-processing, negative precipitation
SOURCE: Soren Rasmussen, NSF NCAR