MESH-Model / MESH-Dev

This repository contains the official MESH development code, which is the basis for the 'tags' listed under the MESH-Releases repository. The same tags are listed under this repository. Legacy branches and utilities have also been ported from the former SVN (Subversion) repository. Future developments must create 'forks' from this repository.
Other
2 stars 3 forks source link

`time` dimension of NetCDF outputs have double `0` values #26

Closed kasra-keshavarz closed 6 months ago

kasra-keshavarz commented 6 months ago

Originally reported by @ShervanGharari:

The time dimension of NetCDF outputs of MESH (versions checked are r1860 and r1813), has double 0 in the first two time-steps. This hinders further usage of MESH outputs for other programs, as an example mizuRoute. Here is an example of the time dimension values produced using MESH for an example setup:

foo@bar:~$ ncdump -v time RFF_D_GRD.nc 
netcdf RFF_D_GRD {
dimensions:
    subbasin = 197 ;
    time = UNLIMITED ; // (4 currently)
variables:
    int crs ;
        crs:grid_mapping_name = "latitude_longitude" ;
        crs:longitude_of_prime_meridian = 0.f ;
        crs:semi_major_axis = 6378137.f ;
        crs:inverse_flattening = 298.2572f ;
    int subbasin(subbasin) ;
        subbasin:_FillValue = -2147483647 ;
        subbasin:standard_name = "subbasin" ;
    float lon(subbasin) ;
        lon:_FillValue = 9.96921e+36f ;
        lon:standard_name = "lon" ;
    float lat(subbasin) ;
        lat:_FillValue = 9.96921e+36f ;
        lat:standard_name = "lat" ;
    int time(time) ;
        time:units = "days since 1980-01-01 07:00:00.000000" ;
        time:long_name = "time" ;
        time:standard_name = "time" ;
        time:axis = "T" ;
        time:calendar = "gregorian" ;
    float RFF(time, subbasin) ;
        RFF:_FillValue = -1.f ;
        RFF:standard_name = "RFF" ;
        RFF:coordinates = "lon lat" ;
        RFF:grid_mapping = "crs" ;

// global attributes:
        :title = "SA_MESH model outputs" ;
        :source = "SA_MESH" ;
        :history = "2023-12-15 12:01:00.000000 - Created." ;
        :references = "SA_MESH" ;
        :Conventions = "CF-1.6" ;
data:

 time = 0, 0, 1, 2 ;    <<<<<<<< PROBLEM
}

This could be a quick fix, let me know if I can assist in submitting a PR for this.

mee067 commented 6 months ago

I do not seem to suffer from such a problem. I checked both the Yukon vector and grid-based setups and the MRB grid-based setup.

I think the issue may have to do with the start time of the simulation being at hour 7:00 instead of 0.00. I suspect you are using RDRS forcing which starts at 13:00 and with the time zone shift it became 7:00.

I remember seeing this confusion before in some of MESH outputs. Partial days cause issues when calculating daily output and Dan did fix a related issue for maybe streamflow (before moving to Github).

What does the acronym PRmean?

mee067 commented 6 months ago

I think I can explain what's happening. When you have the start time at 7:00, the day is 0, when it is 0:00 on the next day, not a full day has passed so the day is still 0, then the counter flips with the 3rd day. This will happen for daily output. It could also occur for monthly or annual output if we start in the middle of the month/year. I think (or hope) it still calculates things correctly. The daily mean will be based on the number of hours of simulation done for that day - even if less than 24, same for partial month/year (and partial month/year at the end of simulation). This may need verification though.

The fix could be to set the initial time of the time axis to 00:00 irrespective of the simulation start time, similarly for the month/year for monthly/annual output (use the 1st day of the month irrespective of the day we start and similar for the year). We also have seasonal output so a similar fix would be needed - it is annual but the start of the year is not Jan; it is taken as the first months of simulation - useful for hydrological years but can be handled via post-processing of monthly output.

For partial days/months/years at the end of the simulation, I implemented a fix to force MESH to write the output. Sometimes, it creates an additional empty field (e.g. when the simulation end time is 2000 01 00 which we use to ensure 1999 is run to the end, this adds zero values for 2000 in an annual file. The year counter flips but there are no results.

kasra-keshavarz commented 6 months ago

This only can cause issues when the output of MESH is going to be read with other software, and double 0 values can potentially hinder usage.

I feel playing with the time:units can minimize the headache; either using units such as hours, days, or years depending on the time-scale of the output file or, switching altogether to seconds and following UNIX EPOCH standard. Just an idea.

P.S. pr is for pull request.

mee067 commented 6 months ago

What I have seen is: hourly output gets units in 'hours since simulation start hour', daily & monthly get units in 'days since sim start day', and annual gets units in 'years since sim start year'. The time variable is int in all cases. So what you suggest is already implemented.

What's needed is a special treatment for partial days, months, or years. It can be automated via rounding down (truncating the partial time step) and then adding 1 to the time. Or truncating down when getting the start time to use for the units - this will make it always start at the beginning of the day/month/year. It won't harm if the counter starts at 7. I think the latter is what I hinted at in my previous comment.

mee067 commented 6 months ago

7 hours since mid-night is the same as 0 hours since 7:00am

kasra-keshavarz commented 6 months ago

UNIX EPOCH would eliminate the need for special treatments. The time-steps are expressed in seconds, but the actual values of time could be a representation of hours, days, years, etc.

mee067 commented 6 months ago

but you may need to use float for time instead of integer. A 100-year simulation will go beyond 3,153,600,000 which is out of the range for the default 4-byte integer.

mee067 commented 6 months ago

and with floats, you may get inaccuracies for the time axis - a few seconds or minutes (if accumulated) can push a date to a different month or year - maybe not a problem but could be.

mee067 commented 6 months ago

alternatively we could use 8-byte integers.

kasra-keshavarz commented 6 months ago

So, here is a test of how to create datetime values in UNIX EPOCH in a sample netCDF file:

import xarray as xr
import numpy as np

# Create a CFTimeIndex with time stamps going hourly from 1950-01-01T12:00:00 to 2100-01-01T12:00:00
time_range = xr.cftime_range(start='1950-01-01T12:00:00', end='2100-01-01T12:00:00', freq='H', calendar='gregorian')

# Create a dataset with a single variable 'pr' and time dimension
data = np.ones(len(time_range))  # Set all values to 1 for the 'pr' variable
ds = xr.Dataset({'pr': (['time'], data)}, coords={'time': time_range})

# Print the created dataset
print(ds)
<xarray.Dataset>
Dimensions:  (time: 1314889)
Coordinates:
  * time     (time) object 1950-01-01 12:00:00 ... 2100-01-01 12:00:00
Data variables:
    pr       (time) float64 1.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0
ds.to_netcdf('./test.nc', encoding={"time": {"units": "seconds since 1970-01-01 00:00:00"}})

And, reading the values using ncdump:

$ ncdump -h test.nc 
netcdf test {
dimensions:
        time = 1314889 ;
variables:
        double pr(time) ;
                pr:_FillValue = NaN ;
        int64 time(time) ;
                time:units = "seconds since 1970-01-01" ;
                time:calendar = "standard" ;
}

$ ncdump -v time test.nc | tail 
    4102300800, 4102304400, 4102308000, 4102311600, 4102315200, 4102318800, 
    4102322400, 4102326000, 4102329600, 4102333200, 4102336800, 4102340400, 
    4102344000, 4102347600, 4102351200, 4102354800, 4102358400, 4102362000, 
    4102365600, 4102369200, 4102372800, 4102376400, 4102380000, 4102383600, 
    4102387200, 4102390800, 4102394400, 4102398000, 4102401600, 4102405200, 
    4102408800, 4102412400, 4102416000, 4102419600, 4102423200, 4102426800, 
    4102430400, 4102434000, 4102437600, 4102441200, 4102444800, 4102448400, 
    4102452000, 4102455600, 4102459200, 4102462800, 4102466400, 4102470000, 
    4102473600, 4102477200, 4102480800, 4102484400, 4102488000 ;
}

$ date --date='@4102488000' --utc # last time-step printed above ^
Fri 01 Jan 2100 12:00:00 PM UTC

This approach avoids using hours, days, etc. and follows a general standard of writing time-steps in a netCDF file. Hope this helps.

dprincz commented 6 months ago

As noted this is a design decision to avoid time-steps exceeding the precision of the variable types. The units follow CF guidelines, that's the basis of what we try to implement by standard.

dprincz commented 6 months ago

Regarding the issue of double-zero. That could relate to the starting time of the simulation, which is between days but isn't well represented in a daily output. My recommendation would be to change the simulation to start from 00:00 of a given day, and run to stop before 00:00 of the next day so each frame in your file is truly representative of a daily value.

In this case, start your simulation from 1980/01/02 00:00 and have it end on the last day that has a full 24-hours of record.

dprincz commented 6 months ago

To capture the precision of the time within the day, create outputs of hourly or sub-daily increments using pts=. The units of time will then capture the highest increment of seconds, hours, etc.., to capture the variable of time requiring the least precision to keep the size of data required in the file to be minimal.