MITgcm / xmitgcm

Read MITgcm mds binary files into xarray
http://xmitgcm.readthedocs.io
MIT License
56 stars 65 forks source link

Time read as iter number and not seconds in Matlab only #76

Closed mmazloff closed 6 years ago

mmazloff commented 6 years ago

Hello

I am writing variables to netcdf and finding that the time is being read in Matlab as the iteration number instead of time in seconds.

Here is what I do

from xmitgcm import open_mdsdataset
grid_dir = 'BIN2/'
data_dir = 'OUTPUT/'
ds = open_mdsdataset(data_dir, grid_dir, iters='all', delta_t=3600, ignore_unknown_vars=False, prefix=['diag_state'], ref_date="2008-1-1 0:0:0",geometry='sphericalpolar')

 print(ds)
<xarray.Dataset>
Dimensions:  (XC: 1080, XG: 1080, YC: 315, YG: 315, Z: 52, Zl: 52, Zp1: 53, Zu: 52, time: 60)
Coordinates:
    iter     (time) int64 dask.array<shape=(60,), chunksize=(1,)>
  * time     (time) datetime64[ns] 2008-01-31T10:00:00 2008-03-01T20:00:00 ...
  * XC       (XC) float32 0.166667 0.5 0.833333 1.16667 1.5 1.83333 2.16667 ...
  * YC       (YC) float32 -77.9652 -77.8956 -77.8255 -77.755 -77.6841 ...
...
    THETA    (time, Z, YC, XC) float32 dask.array<shape=(60, 52, 315, 1080), chunksize=(1, 52, 315, 1080)>

ds['THETA'].to_netcdf('Theta.nc')

Note time is correct here:

ds.time
<xarray.DataArray 'time' (time: 60)>
array(['2008-01-31T10:00:00.000000000', '2008-03-01T20:00:00.000000000',
       '2008-04-01T07:00:00.000000000', '2008-05-01T17:00:00.000000000',
       '2008-06-01T03:00:00.000000000', '2008-07-01T13:00:00.000000000',
       '2008-07-31T23:00:00.000000000', '2008-08-31T10:00:00.000000000',
       '2008-09-30T20:00:00.000000000', '2008-10-31T06:00:00.000000000',
...

But matlab shows ncdisp('Theta.nc') Source: Theta.nc Format: netcdf4 Global Attributes: _NCProperties = 'version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17' Dimensions: time = 609 XC = 1080 YC = 315 Z = 52 Variables: iter Size: 609x1 Dimensions: time Datatype: int64 Attributes: long_name = 'model timestep number' standard_name = 'timestep' time Size: 609x1 Dimensions: time Datatype: int64 Attributes: long_name = 'Time' standard_name = 'time' axis = 'T' units = 'days since 2008-01-04 00:00:00' calendar = 'proleptic_gregorian'

and when I read this time gives iter:

t = ncread('Theta.nc','time')'

t =

Columns 1 through 8

                0                  730                 1461                 2191          ....  

Any advice?

Thanks Matt

rabernat commented 6 years ago

Could you post the output of

ncdump -h

on the netcdf file that comes out of python?

On Mar 13, 2018, at 5:11 PM, Matt Mazloff notifications@github.com wrote:

Hello

I am writing variables to netcdf and finding that the time is being read in Matlab as the iteration number instead of time in seconds.

Here is what I do

from xmitgcm import open_mdsdataset grid_dir = 'BIN2/' data_dir = 'OUTPUT/' ds = open_mdsdataset(data_dir, grid_dir, iters='all', delta_t=3600, ignore_unknown_vars=False, prefix=['diag_state'], ref_date="2008-1-1 0:0:0",geometry='sphericalpolar')

print(ds)

Dimensions: (XC: 1080, XG: 1080, YC: 315, YG: 315, Z: 52, Zl: 52, Zp1: 53, Zu: 52, time: 60) Coordinates: iter (time) int64 dask.array time (time) datetime64[ns] 2008-01-31T10:00:00 2008-03-01T20:00:00 ... XC (XC) float32 0.166667 0.5 0.833333 1.16667 1.5 1.83333 2.16667 ... YC (YC) float32 -77.9652 -77.8956 -77.8255 -77.755 -77.6841 ... ... THETA (time, Z, YC, XC) float32 dask.array ds['THETA'].to_netcdf('Theta.nc') Note time is correct here: ds.time array(['2008-01-31T10:00:00.000000000', '2008-03-01T20:00:00.000000000', '2008-04-01T07:00:00.000000000', '2008-05-01T17:00:00.000000000', '2008-06-01T03:00:00.000000000', '2008-07-01T13:00:00.000000000', '2008-07-31T23:00:00.000000000', '2008-08-31T10:00:00.000000000', '2008-09-30T20:00:00.000000000', '2008-10-31T06:00:00.000000000', ... But matlab shows ncdisp('Theta.nc') Source: /data/SOSE/SOSE/SO3/ITER105/bsose_i105_2008to2012_3day_Theta.nc Format: netcdf4 Global Attributes: _NCProperties = 'version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17' Dimensions: time = 609 XC = 1080 YC = 315 Z = 52 Variables: iter Size: 609x1 Dimensions: time Datatype: int64 Attributes: long_name = 'model timestep number' standard_name = 'timestep' time Size: 609x1 Dimensions: time Datatype: int64 Attributes: long_name = 'Time' standard_name = 'time' axis = 'T' units = 'days since 2008-01-04 00:00:00' calendar = 'proleptic_gregorian' and when I read this time gives iter: t = ncread('Theta.nc','time')' t = Columns 1 through 8 0 730 1461 2191 .... Any advice? Thanks Matt — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
mmazloff commented 6 years ago

ncdump -h Theta.nc netcdf Theta { dimensions: time = 60 ; XC = 1080 ; YC = 315 ; Z = 52 ; variables: int64 iter(time) ; iter:long_name = "model timestep number" ; iter:standard_name = "timestep" ; int64 time(time) ; time:long_name = "Time" ; time:standard_name = "time" ; time:axis = "T" ; time:units = "hours since 2008-01-31 10:00:00" ; time:calendar = "proleptic_gregorian" ; float XC(XC) ; XC:_FillValue = NaNf ; XC:coordinate = "YC XC" ; XC:units = "degrees_east" ; XC:standard_name = "longitude" ; XC:long_name = "longitude" ; XC:axis = "X" ; float YC(YC) ; YC:_FillValue = NaNf ; YC:coordinate = "YC XC" ; YC:units = "degrees_north" ; YC:standard_name = "latitude" ; YC:long_name = "latitude" ; YC:axis = "Y" ; float Z(Z) ; Z:_FillValue = NaNf ; Z:units = "m" ; Z:positive = "down" ; Z:standard_name = "depth" ; Z:long_name = "vertical coordinate of cell center" ; Z:axis = "Z" ; float Depth(YC, XC) ; Depth:_FillValue = NaNf ; Depth:coordinate = "XC YC" ; Depth:units = "m" ; Depth:standard_name = "ocean_depth" ; Depth:long_name = "ocean depth" ; float rA(YC, XC) ; rA:_FillValue = NaNf ; rA:coordinate = "YC XC" ; rA:units = "m2" ; rA:standard_name = "cell_area" ; rA:long_name = "cell area" ; float drF(Z) ; drF:_FillValue = NaNf ; drF:units = "m" ; drF:long_name = "cell z size" ; drF:standard_name = "cell_z_size" ; float hFacC(Z, YC, XC) ; hFacC:_FillValue = NaNf ; hFacC:long_name = "vertical fraction of open cell" ; hFacC:standard_name = "cell_vertical_fraction" ; float THETA(time, Z, YC, XC) ; THETA:_FillValue = NaNf ; THETA:units = "degC" ; THETA:long_name = "Potential Temperature" ; THETA:standard_name = "THETA" ; THETA:coordinates = "hFacC Depth rA iter drF" ;

// global attributes: :_NCProperties = "version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17" ; }

mmazloff commented 6 years ago

And I'll note ncdump -v time Theta.nc time = 0, 730, 1461, 2191, 2921, 3651, 4381, 5112, 5842, 6572, 7302, 8032, 8763, 9493, 10223, 10953, 11683, 12414, 13144, 13874, 14604, 15334, 16065, 16795, 17525, 18255, 18985, 19716, 20446, 21176, 21906, 22636, 23367, 24097, 24827, 25557, 26287, 27018, 27748, 28478, 29208, 29938, 30669, 31399, 32129, 32859, 33589, 34320, 35050, 35780, 36510, 37240, 37971, 38701, 39431, 40161, 40891, 41622, 42352, 43082 ;

So it really is writing this....

mmazloff commented 6 years ago

A clue! If I only read one record it does write time as time!

ds = open_mdsdataset(data_dir, grid_dir, iters='all', delta_t=3600, ignore_unknown_vars=False, prefix=['diag_state'], ref_date="2008-1-1 0:0:0",geometry='sphericalpolar') ds['THETA'].to_netcdf('/data/mmazloff/TEST.nc') time = 0, 730, 1461, 2191, 2921, 3651, 4381, 5112, 5842, 6572, 7302, 8032, 8763, 9493, 10223, 10953, 11683, 12414, 13144, 13874, 14604, 15334, 16065, 16795, 17525, 18255, 18985, 19716, 20446, 21176, 21906, 22636, 23367, 24097, 24827, 25557, 26287, 27018, 27748, 28478, 29208, 29938, 30669, 31399, 32129, 32859, 33589, 34320, 35050, 35780, 36510, 37240, 37971, 38701, 39431, 40161, 40891, 41622, 42352, 43082 ;

BUT

ds = open_mdsdataset(data_dir, grid_dir, iters=1460, delta_t=3600, ignore_unknown_vars=False, prefix=['diag_state'], ref_date="2008-1-1 0:0:0",geometry='sphericalpolar') ds['THETA'].to_netcdf('/data/mmazloff/TEST.nc') time = 5256000 ;

So iters='all' fails but iters=1460 works

rabernat commented 6 years ago

Matt, sorry for the slow response. FYI, this is fundamentally an xarray issue, not an xmitgcm issue.

The key clue is in the metadata attributes. Xarray, via netcdf4-python, is writing the time variable as

int64 time(time) ;
time:long_name = "Time" ;
time:standard_name = "time" ;
time:axis = "T" ;
time:units = "hours since 2008-01-31 10:00:00" ;
time:calendar = "proleptic_gregorian" ;

Note the value of time:units.

This is a perfectly valid time encoding according to CF conventions. The numbers are correct, the units are just not what you wish (seconds). If you were to-reopen the netcdf file using xarray, it would see the correct time.

Xarray's choices about encoding are described in the encoding documentation:

By default, xarray uses the ‘proleptic_gregorian’ calendar and units of the smallest time difference between values, with a reference time of the first time value.

In your case this is evidently hours.

The problem is that you are opening it in MATLAB but neglecting the units attribute.

So you have two choices

If you want to have different units for time, you can explicitly set the encoding when you call to_netcdf. For example:

encoding = {'time': {'units': 'seconds since 2008-01-01'}}
ds['Theta'].to_netcdf('Theta.nc', encoding=encoding)
mmazloff commented 6 years ago

Excellent - this: encoding = {'time': {'units': 'seconds since 2008-01-01'}} ds['Theta'].to_netcdf('Theta.nc', encoding=encoding) worked! Now its giving seconds using both ncdump and matlab's ncread

Thanks! Matt

rabernat commented 6 years ago

Glad we could sort it out.

You don't have to sign your github comments. We can see who you are! 😝