CDAT / cdtime

climate calendar manipulation tools
0 stars 2 forks source link

cdtime bug on NERSC with 365 and 360 day calendars #46

Closed mfwehner closed 4 years ago

mfwehner commented 4 years ago

I have encountered a cdtime bug that seriously impacts analyses of daily data. Now perhaps the bug is only on the NERSC system, I don’t know. I tested it on an old installation that Hari did for me and on jupyter.nersc.gov which Charles probably made.

The script below demonstrates it. It should print out the calendar date of the first and last entry in a nc file, which for CMIP6 daily files is in the name of the file. They do not match for the 365 and 360 day calendars. But they do for the proleptic_gregorian calendar.

About half the CMIP6 models use a 365 day calendar. To test, pick any daily file in ESGF from a CESM model. To test the 360 day calendar, use a HadCM model.

Here is the script. import MV2 as MV, cdtime,os, cdms2 as cdms, sys, string var=sys.argv[1] f=cdms.open(sys.argv[2])

tim=f.dimensionarray('time') u=f.getdimensionunits('time')

cdtime.DefaultCalendar=cdtime.NoLeapCalendar tt=f.dimensionobject('time') if hasattr(tt, 'calendar'): if tt.calendar=='360_day':cdtime.DefaultCalendar=cdtime.Calendar360 if tt.calendar=='gregorian':cdtime.DefaultCalendar=cdtime.MixedCalendar if tt.calendar=='365_day':cdtime.DefaultCalendar=cdtime.NoLeapCalendar if tt.calendar=='noleap':cdtime.DefaultCalendar=cdtime.NoLeapCalendar if tt.calendar=='proleptic_gregorian':cdtime.DefaultCalendar=cdtime.GregorianCalendar if tt.calendar=='standard':cdtime.DefaultCalendar=cdtime.StandardCalendar

r = cdtime.reltime(tim[tim.shape[0]-1],u) last_day=r.tocomp() r = cdtime.reltime(tim[0],u) first_day=r.tocomp() print(first_day,last_day)

jasonb5 commented 4 years ago

I used the following source and observed this behavior.

http://esgf-data.ucar.edu/thredds/dodsC/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/amip/r2i1p1f1/Amon/pr/gn/v20190319/pr_Amon_CESM2_amip_r2i1p1f1_gn_195001-201412.nc

Observed behavior

1948-9-28 12:0:0.0 2013-8-12 12:0:0.0

Does changing tocomp() to tocomp(cdtime.DefaultCalendar) resolve the issue?

mfwehner commented 4 years ago

Jason Oddly enough it does.

But the real quantity that I want to get at is the range of indices to extract a season.

So for your file this set of commands does not work.

This should return the last value of time cdtime.comptime(2014,12,31).torel(u).value

This should return the first value of time cdtime.comptime(1950,1,1).torel(u).value

And these commands return the wrong year. time1=cdtime.reltime(tim3[0],u) time2=cdtime.reltime(time[time.shape[0]-1],u) y1=int(time1.torel('years since 1850-1-1').value)+1850 y2=int(time2.torel('years since 1850-1-1').value)+1850

So it would appear that the commands may not be picking up the value of cdtime.DefaultCalendar but instead are using cdtime.GregorianCalendar Which are different

cdtime.GregorianCalendar 4369L cdtime.DefaultCalendar 4113L cdtime.NoLeapCalendar 4113L

Thanks Michael

PS. (Note this last code fragment gets around the following bug (it used to work, but now crashes) y2=int(time2.torel('years since 0-1-1').value)

On May 29, 2020, at 1:06 PM, Jason Boutte notifications@github.com wrote:

I used the following source and observed this behavior.

http://esgf-data.ucar.edu/thredds/dodsC/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/amip/r2i1p1f1/Amon/pr/gn/v20190319/pr_Amon_CESM2_amip_r2i1p1f1_gn_195001-201412.nc http://esgf-data.ucar.edu/thredds/dodsC/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/amip/r2i1p1f1/Amon/pr/gn/v20190319/pr_Amon_CESM2_amip_r2i1p1f1_gn_195001-201412.nc Observed behavior

1948-9-28 12:0:0.0 2013-8-12 12:0:0.0 Does changing tocomp() to tocomp(cdtime.DefaultCalendar) resolve the issue?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/CDAT/cdtime/issues/46#issuecomment-636168921, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACSKCHEMTQHPMZ5JDQWL4V3RUAIVDANCNFSM4NNTEVMQ.

jasonb5 commented 4 years ago

Ok both those examples help confirm that cdtime functions are not using the DefaultCalendar. As soon as you explicitly pass the calendar, NoLeapCalendar in my example, you get the correct values.

I'll get into the code and figure out whats going on.

I think the reason y2=int(time2.torel('years since 0-1-1').value) doesn't work anymore can be explained here https://github.com/CDAT/cdms/issues/334, year 0 is no longer valid.

durack1 commented 4 years ago

@pochedls this thread may interest you, @mzelinka same for you

durack1 commented 4 years ago

@jasonb5 thanks for doing your homework and finding CDAT/cdms#334 in the process of fixing this issue, it'd be great to also implement the change described by @pochedls, as year = 0 is invalid following the CF (climate forecast) convention that the software is built on see http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch04s04.html

To be honest I would be amazed if cdms2 could deal with the paleo example on that page, with my particular query related to the calendar attribute:

Example 4.6. Paleoclimate time axis

double time(time) ;
  time:long_name = "time" ;
  time:units = "days since 1-1-1 0:0:0" ;
  time:calendar = "126 kyr B.P." ;
  time:month_lengths = 34, 31, 32, 30, 29, 27, 28, 28, 28, 32, 32, 34 ;
mfwehner commented 4 years ago

On May 29, 2020, at 2:16 PM, Jason Boutte notifications@github.com wrote:

I think the reason y2=int(time2.torel('years since 0-1-1').value)doesn't work anymore can be explained here CDAT/cdms#334 https://github.com/CDAT/cdms/issues/334, year 0 is no longer valid. Besides my codes, which I actually have a fix for, this one is important because a few cmip6 models actually use “days (hours) since 0-1-1" as their units.

jasonb5 commented 4 years ago

The c code is working correctly and using it's DefaultCalendar value.

The issue here is cdtime.DefaultCalendar=cdtime.NoLeapCalendar is actually setting the value in the python module and not the c module object. Which is why it has no effect. A reference to the c object is actually at cdtime._cdtime.

There's a couple paths forward. I think the preferred option would be the latter.

  1. Set the DefaultCalender on the c object. This would cause all subsequent calls to c functions to use the default calendar. Some of the python functions r2c, s2c, etc will actually override this becuase they use the DefaultCalender in the python module and pass it through to the c functions.
    cdtime._cdtime.DefaultCalendar = cdtime.NoLeapCalendar
  2. Explicilty pass the target calender to tocomp or any call that require a calendar.

Lets continue the year 0 conversation here. Could you descirbe the desired behavior and we can continue from there.

durack1 commented 4 years ago

@jasonb5 your suggestion RE: 2 above is exactly what I would recommend, in most CMIPx model cases the calendar is explicitly defined as an axis variable, so for e.g.:

$ ncdump -h /p/css03/esgf_publish/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20180803/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc | grep time
    time = UNLIMITED ; // (1980 currently)
    double time(time) ;
        time:axis = "T" ;
        time:standard_name = "time" ;
        time:long_name = "Time axis" ;
        time:calendar = "gregorian" ;
        time:units = "days since 1850-01-01 00:00:00" ;
        time:time_origin = "1850-01-01 00:00:00" ;
        time:bounds = "time_bounds" ;
    double time_bounds(time, axis_nbounds) ;
    float tas(time, lat, lon) ;
        tas:cell_methods = "area: time: mean" ;
        :parent_time_units = "days since 1850-01-01 00:00:00" ;
        :branch_time_in_parent = 21914. ;
        :branch_time_in_child = 0. ;
jasonb5 commented 4 years ago

@mfwehner, @durack1

Just to clarify, setting the default calendar e.g. cdtime.DefaultCalendar = cdtime.NoLeapCalendar will never be able to produce the desired behavior. This is an inherit flaw from how cdtime is setup and there is no straight forward way to change this without making major changes to the structure of the package.

With that said here are a list of solutions that will produce the desired behavior.

  1. From @mfwehner's original example. You can retrieve the FileAxis for time with getAxis('time'). Any conversion called on this object should honor the calendar attribute if it's set, as mention by @durack1 in his previous comment.
    t = f.getAxis('time')
    comptime = t.asComponentTime()
    print(t.attributes['calendar'], comptime[0], comptime[-1])
    # noleap 1950-1-15 12:0:0.0 2014-12-15 12:0:0.0

If it's not set you can still pass it for conversion.

comptime = t.asComponentTime(cdtime.NoLeapCalendar)
  1. Modifying the DefaultCalendar on the c module object. I cannot guarantee any behavior when proceeding this way as cdtime was not intended to be used in this fashion.
    
    import cdtime._cdtime as cdtime
    cdtime.DefaultCalendar = cdtime.NoLeapCalendar

print(cdtime.reltime(711399.5, 'days since 0001-01-01 00:00:00').tocomp()) print(cdtime.reltime(735093.5, 'days since 0001-01-01 00:00:00').tocomp())

1950-1-15 12:0:0.0

2014-12-15 12:0:0.0


3. Explicitly setting the calendar when converting from relative to component time. The calendar if set on the `FileAxis` is accessible from the `tt` variable by calling `tt.getCalendar()` and can be passed to `tocomp`.
```python
r = cdtime.reltime(tim[tim.shape[0]-1],u)
last_day=r.tocomp(tt.getCalendar())
r = cdtime.reltime(tim[0],u)
first_day=r.tocomp(tt.getCalendar())
print(first_day,last_day)
# 1950-1-15 12:0:0.0 2014-12-15 12:0:0.0
durack1 commented 4 years ago

@jasonb5 moving forward if there are obvious limitations with cdtime, then would it be possible to resolve these with a reformulation of the C library? It is 23 years old after all. Synchronizing this with the latest calendars and UDUNITs would certainly be welcomed (not that I can imagine much has changed with time). Not sure it's on the roadmap?

I have always found my use of cdtime requires me to jump through hoops that aren't always obvious to a new user, and the documentation is pretty bare, so the trial and error time investment is something that would benefit greatly from a refresh (which could make usage much more straight forward, and documented)

jasonb5 commented 4 years ago

@durack1 I just realized I never mentioned this on the roadmap, i'll have to update it, but we are looking into refreshing a lot of the existing libraries. We want the redesign to provide an up to date, comprehensive and feature rich package.

I completely agree and understand the lack of documentation and moving forward with the redesign, this is one of our highest priorities.

durack1 commented 4 years ago

@jasonb5 excellent news, more than happy to be a guinea pig in the process, I am sure many of us would be very willing participants

jasonb5 commented 4 years ago

Closing as a work around has been provided.