aidanheerdegen / publish_cosima_data

Minor issues in ocean dates #11

Closed AndyHoggANU closed 5 years ago

AndyHoggANU commented 5 years ago

I have noticed that splitvar appears to save dates in the ocean files with an offset of 2 days. To see this, load any data file, like this:

darray = xr.open_dataset('/g/data/ua8/cosima-tmp/publish/access-om2-025/ocean/mld/mld_access-om2-025_225612_225712.nc')

The time array on this looks OK, it is:

array(['2257-01-14T12:00:00.000000000', '2257-02-13T00:00:00.000000000',
       '2257-03-14T12:00:00.000000000', '2257-04-14T00:00:00.000000000',
       '2257-05-14T12:00:00.000000000', '2257-06-14T00:00:00.000000000',
       '2257-07-14T12:00:00.000000000', '2257-08-14T12:00:00.000000000',
       '2257-09-14T00:00:00.000000000', '2257-10-14T12:00:00.000000000',
       '2257-11-14T00:00:00.000000000', '2257-12-14T12:00:00.000000000'],
      dtype='datetime64[ns]')

but the time_bounds, which I think is what is used to set the filenames, is:

array([['2256-12-30T00:00:00.000000000', '2257-01-30T00:00:00.000000000'],
       ['2257-01-30T00:00:00.000000000', '2257-02-27T00:00:00.000000000'],
       ['2257-02-27T00:00:00.000000000', '2257-03-30T00:00:00.000000000'],
       ['2257-03-30T00:00:00.000000000', '2257-04-29T00:00:00.000000000'],
       ['2257-04-29T00:00:00.000000000', '2257-05-30T00:00:00.000000000'],
       ['2257-05-30T00:00:00.000000000', '2257-06-29T00:00:00.000000000'],
       ['2257-06-29T00:00:00.000000000', '2257-07-30T00:00:00.000000000'],
       ['2257-07-30T00:00:00.000000000', '2257-08-30T00:00:00.000000000'],
       ['2257-08-30T00:00:00.000000000', '2257-09-29T00:00:00.000000000'],
       ['2257-09-29T00:00:00.000000000', '2257-10-30T00:00:00.000000000'],
       ['2257-10-30T00:00:00.000000000', '2257-11-29T00:00:00.000000000'],
       ['2257-11-29T00:00:00.000000000', '2257-12-30T00:00:00.000000000']],
      dtype='datetime64[ns]')


So, it would appear that the calendar years end two days early. 

Is this in any way related to the bug that Russ found about calendars in our IAF case?

aidanheerdegen commented 5 years ago

Yes, in this case it is the same issue: the files use the incorrect calendar, gregorian, rather than proleptic_gregorian, with time units of "days since 0001-01-01".
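A rough sketch of where a two-day shift can come from, using only the standard library. It assumes the day counts in the files were written on a proleptic Gregorian calendar, and that a reader treats the CF "gregorian" calendar as the mixed Julian/Gregorian calendar, so that the epoch 0001-01-01 is taken as a Julian date. The helper names are illustrative and are not part of splitvar or xarray.

```python
from datetime import date


def jdn_gregorian(year, month, day):
    """Julian Day Number of a proleptic Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def jdn_julian(year, month, day):
    """Julian Day Number of a Julian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# The label 0001-01-01 names two different days in the two calendars.
epoch_shift = jdn_gregorian(1, 1, 1) - jdn_julian(1, 1, 1)

# Day count a proleptic Gregorian writer would store for 2257-01-01
# (assumption: this is how the model output was produced).
stored = date(2257, 1, 1).toordinal() - date(1, 1, 1).toordinal()

# A mixed-calendar reader counts from the earlier Julian epoch, so the
# decoded date lands epoch_shift days before the intended one.
decoded_mixed = date.fromordinal(
    date(1, 1, 1).toordinal() + stored - epoch_shift
)

print(epoch_shift, decoded_mixed)  # 2 2256-12-30
```

Under these assumptions the decoded year boundary falls on 2256-12-30, which matches the time_bounds reported above.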

I have confirmed it works as it should if the data is input without decoding the times, the calendar changed to proleptic_gregorian and then the times decoded.

In a way this is a problem with the files themselves that needs to be changed. How many are affected like this?

aidanheerdegen commented 5 years ago

Ok, I've added a --calendar option to splitvar:

https://github.com/coecms/splitvar/commit/d6960f7b7e4b96e68da74b315cddf2ce6e718b65

I can either add the option to the README scripts and push, or you can add it and try it out. If the latter, add the following option:

--calendar proleptic_gregorian

and give it a burl (I have updated splitvar in conda/analysis3-unstable)

AndyHoggANU commented 5 years ago

Hmmm ... the plot thickens. This works for the 3D variables. If you compare

darray = xr.open_dataset('/g/data/ua8/cosima-tmp/publish/access-om2-025/ocean/salt/salt_access-om2-025_219801_219901.nc')
darray.time_bounds

which gives

<xarray.DataArray 'time_bounds' (time: 1, nv: 2)>
array([['2198-01-01T00:00:00.000000000', '2199-01-01T00:00:00.000000000']],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2198-07-02T12:00:00
  * nv       (nv) float64 1.0 2.0
Attributes:
    long_name:  time axis boundaries

to

darray = xr.open_dataset('/g/data/ua8/cosima-tmp/publish/access-om2-025-old/ocean/salt/salt_access-om2-025_219712_219812.nc')
darray.time_bounds

which gives

<xarray.DataArray 'time_bounds' (time: 1, nv: 2)>
array([['2197-12-30T00:00:00.000000000', '2198-12-30T00:00:00.000000000']],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2198-06-30T12:00:00
  * nv       (nv) float64 1.0 2.0
Attributes:
    long_name:  time axis boundaries

you will see that the time bounds have been fixed.

Foncusingly, it doesn't seem to work with ocean_month.nc:

darray = xr.open_dataset('/g/data/ua8/cosima-tmp/publish/access-om2-025/ocean/mld/mld_access-om2-025_219801_219901.nc')
darray.time_bounds

gives

<xarray.DataArray 'time_bounds' (time: 12, nv: 2)>
array([['2256-12-30T00:00:00.000000000', '2257-01-30T00:00:00.000000000'],
       ['2257-01-30T00:00:00.000000000', '2257-02-27T00:00:00.000000000'],
       ['2257-02-27T00:00:00.000000000', '2257-03-30T00:00:00.000000000'],
       ['2257-03-30T00:00:00.000000000', '2257-04-29T00:00:00.000000000'],
       ['2257-04-29T00:00:00.000000000', '2257-05-30T00:00:00.000000000'],
       ['2257-05-30T00:00:00.000000000', '2257-06-29T00:00:00.000000000'],
       ['2257-06-29T00:00:00.000000000', '2257-07-30T00:00:00.000000000'],
       ['2257-07-30T00:00:00.000000000', '2257-08-30T00:00:00.000000000'],
       ['2257-08-30T00:00:00.000000000', '2257-09-29T00:00:00.000000000'],
       ['2257-09-29T00:00:00.000000000', '2257-10-30T00:00:00.000000000'],
       ['2257-10-30T00:00:00.000000000', '2257-11-29T00:00:00.000000000'],
       ['2257-11-29T00:00:00.000000000', '2257-12-30T00:00:00.000000000']],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2257-01-14T12:00:00 ... 2257-12-14T12:00:00
  * nv       (nv) float64 1.0 2.0
Attributes:
    long_name:  time axis boundaries

which still has the two-day offset?

I have pushed my latest code for you to see or confirm you can reproduce...

aidanheerdegen commented 5 years ago

A couple of points:

  1. How does mld_access-om2-025_219801_219901.nc give dates like 2256-12-30T00:00:00.000000000?

  2. I can't reproduce. Tried this:

splitvar -cp -d title -d grid_type -d grid_tile -a ocean_grid.nc -o $OUTPATH --model-type ${SUBMODEL} --simname ${MODEL} --calendar proleptic_gregorian -v sea_level ${COSIMADIR}/${MODEL}/${EXPT}/output1[2-5]?/${SUBMODEL}/ocean_month.nc

got this:

>>> ds = xr.open_dataset('datadir/access-om2-025/ocean/sea-level/sea-level_access-om2-025_219801_219901.nc')
>>> ds.time_bounds
<xarray.DataArray 'time_bounds' (time: 12, nv: 2)>
array([['2198-01-01T00:00:00.000000000', '2198-02-01T00:00:00.000000000'],
       ['2198-02-01T00:00:00.000000000', '2198-03-01T00:00:00.000000000'],
       ['2198-03-01T00:00:00.000000000', '2198-04-01T00:00:00.000000000'],
       ['2198-04-01T00:00:00.000000000', '2198-05-01T00:00:00.000000000'],
       ['2198-05-01T00:00:00.000000000', '2198-06-01T00:00:00.000000000'],
       ['2198-06-01T00:00:00.000000000', '2198-07-01T00:00:00.000000000'],
       ['2198-07-01T00:00:00.000000000', '2198-08-01T00:00:00.000000000'],
       ['2198-08-01T00:00:00.000000000', '2198-09-01T00:00:00.000000000'],
       ['2198-09-01T00:00:00.000000000', '2198-10-01T00:00:00.000000000'],
       ['2198-10-01T00:00:00.000000000', '2198-11-01T00:00:00.000000000'],
       ['2198-11-01T00:00:00.000000000', '2198-12-01T00:00:00.000000000'],
       ['2198-12-01T00:00:00.000000000', '2199-01-01T00:00:00.000000000']],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2198-01-16T12:00:00 ... 2198-12-16T12:00:00
  * nv       (nv) float64 1.0 2.0
Attributes:
    long_name:  time axis boundaries
>>> 
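A mismatch like the one in point 1 can be caught mechanically. The sketch below assumes, based only on the filenames in this thread, that the trailing YYYYMM_YYYYMM stamp names the months of the first and last time bounds. The function names are hypothetical and are not part of splitvar.

```python
import re
from datetime import date

# Trailing stamp such as _219801_219901.nc (assumed naming convention).
STAMP = re.compile(r"_(\d{4})(\d{2})_(\d{4})(\d{2})\.nc$")


def filename_span(path):
    """Return ((year, month), (year, month)) parsed from the filename stamp."""
    match = STAMP.search(path)
    if match is None:
        raise ValueError(f"no YYYYMM_YYYYMM stamp in {path!r}")
    y0, m0, y1, m1 = map(int, match.groups())
    return (y0, m0), (y1, m1)


def bounds_match_filename(path, first_bound, last_bound):
    """True if the first and last time bounds fall in the months the name states."""
    start, end = filename_span(path)
    return (
        (first_bound.year, first_bound.month) == start
        and (last_bound.year, last_bound.month) == end
    )


# Values taken from the outputs quoted in this thread.
ok = bounds_match_filename(
    "sea-level_access-om2-025_219801_219901.nc",
    date(2198, 1, 1),
    date(2199, 1, 1),
)
suspect = bounds_match_filename(
    "mld_access-om2-025_219801_219901.nc",
    date(2256, 12, 30),
    date(2257, 12, 30),
)

print(ok, suspect)  # True False
```

A False result means either that the wrong file was opened or that the naming assumption does not hold for that file.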

AndyHoggANU commented 5 years ago

You are right - I must have read in the wrong file somehow. Sorry. It is now perfect.