NOAA-OWP / ngen

Next Generation Water Modeling Engine and Framework Prototype

Relative run time support #661

Open ZacharyWills opened 1 year ago

ZacharyWills commented 1 year ago

Forecast and hindcast strategies are often defined relative to the current time, for example a 12-hour forecast running forward from the current hour.

Current behavior

The time block currently takes a date-time for both the start and the end:

    "time": {
        "start_time": "2015-12-01 00:00:00",
        "end_time": "2015-12-30 23:00:00",
        "output_interval": 3600
    }

The user must then calculate the proper end time based on their strategy, and for long-running systems this requires a separate script to update the "end_time" each cycle.

Expected behavior

Ideally a relative end_time would give the option for something like:

    "time": {
        "start_time": "2015-12-01 00:00:00",
        "end_time": "+12hrs",
        "output_interval": 3600
    }

This would mean that the start_time can just be the existing date as taken from the underlying host, and the same realization time could be reused for a recurring run either forwards or backwards in time over the available forcings.
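To make the proposal concrete, here is a rough sketch of how a relative "end_time" could be resolved into the absolute form the time block accepts today. This is not ngen code: the function name `resolve_time_block`, the offset grammar (sign, integer, unit such as `hrs`), the fallback to the host clock when "start_time" is omitted, and the choice to reorder the window for a negative offset are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical grammar for relative offsets: sign, integer, unit (e.g. "+12hrs", "-2days").
_RELATIVE = re.compile(r"^([+-])(\d+)\s*(s|sec|secs|m|min|mins|h|hr|hrs|d|day|days)$")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def resolve_time_block(time_block, now=None):
    """Return a copy of time_block with absolute start_time and end_time strings."""
    if "start_time" in time_block:
        start = datetime.strptime(time_block["start_time"], TIME_FORMAT)
    else:
        # Assumption: with no start_time, use the host clock truncated to the hour.
        now = now or datetime.now()
        start = now.replace(minute=0, second=0, microsecond=0)

    end_raw = time_block["end_time"]
    match = _RELATIVE.match(end_raw)
    if match:
        sign, amount, unit = match.groups()
        offset = timedelta(seconds=int(amount) * _UNIT_SECONDS[unit[0]])
        end = start + offset if sign == "+" else start - offset
    else:
        # Absolute end times keep working exactly as they do now.
        end = datetime.strptime(end_raw, TIME_FORMAT)

    # Assumption: a negative offset (hindcast) yields a window ending at the reference time.
    if end < start:
        start, end = end, start

    resolved = dict(time_block)
    resolved["start_time"] = start.strftime(TIME_FORMAT)
    resolved["end_time"] = end.strftime(TIME_FORMAT)
    return resolved
```

With the example block above, `"end_time": "+12hrs"` would resolve to `"2015-12-01 12:00:00"`, and other keys such as "output_interval" pass through unchanged.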

Currently, a script is needed to update both the start_time and end_time to move the modeled window forward or backward in time at each interval. This proposal aims to eliminate that extra software and make the framework easier to use.

As always I'm open to suggestions to make this better! Thanks, Zach

jameshalgren commented 1 year ago

Some comments. All thoughts are my own and corrections or alternate views are welcome.

Way back, we had a discussion about this with the routing. We were looking at the fact that the csv inputs that we allowed at that time did not necessarily have dates in the headers -- we could simply have the same number of inputs as basins and time steps and apply them to any given initial condition for interpretation according to the scenario: https://github.com/NOAA-OWP/t-route/issues/427#issuecomment-919302300

The restart from CSV capability permits but does not require date-centric headers for the CSV columns. The two places that the t0 is used directly are 1: for the Reservoir simulation; 2: for dealing with the restart file write out.

We could add a parameter to the yaml file to allow the user to directly specify a t0. That might be used to provide a value to accompany a csv file lacking the information otherwise. It might also, either intentionally or accidentally, override the value in the file.
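A small sketch of how such an override could be made explicit rather than silent. The helper `resolve_t0` and its arguments are hypothetical and not part of t-route; the idea is only that a user-supplied t0 wins, but a disagreement with the file's own value is reported.

```python
import warnings
from datetime import datetime


def resolve_t0(file_t0, config_t0):
    """Pick the t0 to use, given the value found in the file and the one in the config.

    Both arguments are datetime objects or None (hypothetical interface).
    """
    if config_t0 is None:
        if file_t0 is None:
            raise ValueError("no t0 available from either the file or the configuration")
        return file_t0
    if file_t0 is not None and file_t0 != config_t0:
        # The override may be intentional (scenario run) or accidental, so say so.
        warnings.warn(
            f"configured t0 {config_t0} overrides t0 {file_t0} found in the file"
        )
    return config_t0
```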

A couple of places where this feature might help:

Operational simulation

As mentioned by the O.P., in an operational situation, the use case calls for running exactly the same process many times but with files that are constantly updating underneath the operation. There should be some date-aware archiving, but the execution could be facilitated by being able to simply point to a set of files and say, "run 100 of these, whatever they are."

Error checking

If we assume that the time stamp of a particular restart file is accurate relative to the times of the forcings, we can use these files with confidence. In reality, it is up to the scientist/user/data producer to code these things correctly, and the code has no way of detecting incorrectly coded or date-stamped files. Time zone or daylight saving time mistakes are common. Accumulated values may also carry different assumptions (for instance, with an hourly precipitation value, is it the previous hour's total precipitation, the upcoming hour's, or a value that represents an average for the period centered on the labelled time?). In addition, when coding a new module, it is easy to make mistakes in time step management (related to the same questions of centering, in addition to things like improper header handling, etc.). To handle these issues, one of the first things to check is whether inputs are shifted by an hour one way or the other, and having the ability to force a relative date can be super handy for this.
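As an illustration of that one-hour check, here is a minimal sketch. It assumes a trusted reference series is available to compare against, and that both series are plain dicts mapping `datetime` to a value; the function names `shift_series` and `best_offset` are made up for this example.

```python
from datetime import datetime, timedelta


def shift_series(series, hours):
    """Return a copy of series (datetime -> value) with every label moved by `hours`."""
    delta = timedelta(hours=hours)
    return {t + delta: v for t, v in series.items()}


def best_offset(forcing, reference, candidates=(-1, 0, 1)):
    """Return the candidate shift (in hours) that best aligns forcing with reference.

    Alignment is scored by mean absolute difference over the shared time stamps.
    """
    def score(hours):
        shifted = shift_series(forcing, hours)
        common = shifted.keys() & reference.keys()
        if not common:
            return float("inf")
        return sum(abs(shifted[t] - reference[t]) for t in common) / len(common)

    return min(candidates, key=score)
```

A result other than 0 would suggest the forcing labels are off by that many hours relative to the reference, which is exactly the case where forcing a relative date makes the follow-up experiment cheap.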

Ensembles and Scenarios

Right now, ngen requires explicit date stamping of everything, but it should be possible to override that for producing hypothetical ensembles or scenario testing. If the user specifically wants to override the system to force a summer hydrology simulation to start with winter initial conditions (to consider a changed climate scenario, for instance) or to run a series of perturbed inputs generated by a date-agnostic stochastic process, this would be easier with this relative date capability.