Closed kasra-keshavarz closed 6 months ago
I do not seem to suffer from such a problem. I checked both the Yukon vector and grid-based setups and the MRB grid-based setup.
I think the issue may have to do with the start time of the simulation being at hour 7:00 instead of 0.00. I suspect you are using RDRS forcing which starts at 13:00 and with the time zone shift it became 7:00.
I remember seeing this confusion before in some of MESH outputs. Partial days cause issues when calculating daily output and Dan did fix a related issue for maybe streamflow (before moving to Github).
What does the acronym PR
mean?
I think I can explain what's happening. When you have the start time at 7:00, the day is 0, when it is 0:00 on the next day, not a full day has passed so the day is still 0, then the counter flips with the 3rd day. This will happen for daily output. It could also occur for monthly or annual output if we start in the middle of the month/year. I think (or hope) it still calculates things correctly. The daily mean will be based on the number of hours of simulation done for that day - even if less than 24, same for partial month/year (and partial month/year at the end of simulation). This may need verification though.
The fix could be to set the initial time of the time axis to 00:00 irrespective of the simulation start time, similarly for the month/year for monthly/annual output (use the 1st day of the month irrespective of the day we start and similar for the year). We also have seasonal output so a similar fix would be needed - it is annual but the start of the year is not Jan; it is taken as the first months of simulation - useful for hydrological years but can be handled via post-processing of monthly output.
For partial days/months/years at the end of the simulation, I implemented a fix to force MESH to write the output. Sometimes, it creates an additional empty field (e.g. when the simulation end time is 2000 01 00 which we use to ensure 1999 is run to the end, this adds zero values for 2000 in an annual file. The year counter flips but there are no results.
This only can cause issues when the output of MESH
is going to be read with other software, and double 0
values can potentially hinder usage.
I feel playing with the time:units
can minimize the headache; either using units
such as hours
, days
, or years
depending on the time-scale of the output file or, switching altogether to seconds
and following UNIX EPOCH standard. Just an idea.
P.S. pr
is for pull request
.
What I have seen is: hourly output gets units in 'hours since simulation start hour', daily & monthly get units in 'days since sim start day', and annual gets units in 'years since sim start year'. The time variable is int in all cases. So what you suggest is already implemented.
What's needed is a special treatment for partial days, months, or years. It can be automated via rounding down (truncating the partial time step) and then adding 1 to the time. Or truncating down when getting the start time to use for the units - this will make it always start at the beginning of the day/month/year. It won't harm if the counter starts at 7. I think the latter is what I hinted at in my previous comment.
7 hours since mid-night is the same as 0 hours since 7:00am
UNIX EPOCH would eliminate the need for special treatments. The time-steps are expressed in seconds, but the actual values of time could be a representation of hours, days, years, etc.
but you may need to use float for time instead of integer. A 100-year simulation will go beyond 3,153,600,000 which is out of the range for the default 4-byte integer.
and with floats, you may get inaccuracies for the time axis - a few seconds or minutes (if accumulated) can push a date to a different month or year - maybe not a problem but could be.
alternatively we could use 8-byte integers.
So, here is a test of how to create datetime values in UNIX EPOCH in a sample netCDF file:
import xarray as xr
import numpy as np
# Create a CFTimeIndex with time stamps going hourly from 1950-01-01T12:00:00 to 2100-01-01T12:00:00
time_range = xr.cftime_range(start='1950-01-01T12:00:00', end='2100-01-01T12:00:00', freq='H', calendar='gregorian')
# Create a dataset with a single variable 'pr' and time dimension
data = np.ones(len(time_range)) # Set all values to 1 for the 'pr' variable
ds = xr.Dataset({'pr': (['time'], data)}, coords={'time': time_range})
# Print the created dataset
print(ds)
<xarray.Dataset>
Dimensions: (time: 1314889)
Coordinates:
* time (time) object 1950-01-01 12:00:00 ... 2100-01-01 12:00:00
Data variables:
pr (time) float64 1.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0
ds.to_netcdf('./test.nc', encoding={"time": {"units": "seconds since 1970-01-01 00:00:00"}})
And, reading the values using ncdump
:
$ ncdump -h test.nc
netcdf test {
dimensions:
time = 1314889 ;
variables:
double pr(time) ;
pr:_FillValue = NaN ;
int64 time(time) ;
time:units = "seconds since 1970-01-01" ;
time:calendar = "standard" ;
}
$ ncdump -v time test.nc | tail
4102300800, 4102304400, 4102308000, 4102311600, 4102315200, 4102318800,
4102322400, 4102326000, 4102329600, 4102333200, 4102336800, 4102340400,
4102344000, 4102347600, 4102351200, 4102354800, 4102358400, 4102362000,
4102365600, 4102369200, 4102372800, 4102376400, 4102380000, 4102383600,
4102387200, 4102390800, 4102394400, 4102398000, 4102401600, 4102405200,
4102408800, 4102412400, 4102416000, 4102419600, 4102423200, 4102426800,
4102430400, 4102434000, 4102437600, 4102441200, 4102444800, 4102448400,
4102452000, 4102455600, 4102459200, 4102462800, 4102466400, 4102470000,
4102473600, 4102477200, 4102480800, 4102484400, 4102488000 ;
}
$ date --date='@4102488000' --utc # last time-step printed above ^
Fri 01 Jan 2100 12:00:00 PM UTC
This approach avoids using hours
, days
, etc. and follows a general standard of writing time-steps in a netCDF file. Hope this helps.
As noted this is a design decision to avoid time-steps exceeding the precision of the variable types. The units follow CF guidelines, that's the basis of what we try to implement by standard.
Regarding the issue of double-zero. That could relate to the starting time of the simulation, which is between days but isn't well represented in a daily output. My recommendation would be to change the simulation to start from 00:00 of a given day, and run to stop before 00:00 of the next day so each frame in your file is truly representative of a daily value.
In this case, start your simulation from 1980/01/02 00:00 and have it end on the last day that has a full 24-hours of record.
To capture the precision of the time within the day, create outputs of hourly or sub-daily increments using pts=
. The units of time will then capture the highest increment of seconds, hours, etc.., to capture the variable of time requiring the least precision to keep the size of data required in the file to be minimal.
Originally reported by @ShervanGharari:
The
time
dimension of NetCDF outputs ofMESH
(versions checked arer1860
andr1813
), has double0
in the first two time-steps. This hinders further usage ofMESH
outputs for other programs, as an examplemizuRoute
. Here is an example of thetime
dimension values produced usingMESH
for an example setup:This could be a quick fix, let me know if I can assist in submitting a
PR
for this.