SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
629 stars 283 forks source link

BUG: iris pp treats standard calendar as a proleptic gregorian and ignores real proleptic gregorian calendars #3561

Open cpelley opened 4 years ago

cpelley commented 4 years ago

https://github.com/SciTools/iris/blob/75c7570bf4d509295e239aff47b70324cd68c22d/lib/iris/fileformats/pp_save_rules.py#L398-L404

Standard calendar cannot be treated as a Proleptic Gregorian for dates 1582-01-01 More context here: ancil/ticket #1159

UM F03 more recently clarified that the calendar it expects isn't Standard but a Proleptic Gregorian. Not only is iris pp save ignoring time coordinates with proleptic_gregorian calendars entirely, it also allows saving Standard dates prior to 1582-01-01 (as though they are proleptic gregorian dates when they are not).

github-actions[bot] commented 3 years ago

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

nhsavage commented 3 years ago

this is still important but I don't have any time to fix it myself

wjbenfold commented 2 years ago

If I understand this issue and the PP spec correctly, it isn't possible to save a PP file with a non-Proleptic Gregorian calendar. Possibly behaviours for Iris when asked to save a cube with a Standard calendar seems limited to

Are there other alternatives? Which of these would be preferred?

nhsavage commented 2 years ago

taking this back to basics. The problem is that the orginal UMDP F3 which was the basis of the pp load and save rules was somewhat lose in its teminology and referrred to the UM's calendar as a Standard calendar in one place and Proleptic Gregorian somewhere else. On closer inspection, I determined that it is in fact Proleptic Greogrian. However, the pp load and save rules are treating it as Standard

This means there is a symetrical error here: https://github.com/SciTools/iris/blob/main/lib/iris/fileformats/pp.py#L1064

I thefore think there are actually two changes needed here which should cancel out in all round trip pp tests:

in pp.py change

calendar = cf_units.CALENDAR_STANDARD to calendar = cf_units.CALENDAR_PROLEPTIC_GREGORIAN

and in

pp_save_rules.py

change

elif time_coord.units.calendar == 'standard':

to

elif time_coord.units.calendar == 'proleptic_gregorian':

nhsavage commented 2 years ago

assuming a round trip with pp data is unchanged, then there are two other cases:

  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results
  2. loading from other file formats which state that the data has Standard calendar - this is the hard one. a. we could throw an error here, which would be justified in that UM doesn't support proleptic gregorian calendars b. we could convert to proleptic gregorian. This would only affect data in Standard calendar a with a time base or a date before 1582. This is a real niche issue as almost all climate models use a Proleptic Gregorian not a Standard calendar as otherwise you need a lot of calendar code to cope with the transition from the Julian to the Gregorian calendar.
nhsavage commented 2 years ago

What I could probably do is make the above changes (except the conversion of standard calendar) and then see which, if any, of the tests break

wjbenfold commented 2 years ago

https://github.com/SciTools/cf-units/issues/203 currently needs parallel changes in Iris, which seem quite relevant here - notably that pp files don't handle cf's "standard" calendar

wjbenfold commented 2 years ago
  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results

I agree that this is a bug fix. Given the likelihood that downstream users have a workaround it seems like a breaking change. We could look at hiding it behind a FUTURE flag or something (though I'm not 100% on what that entails)

  1. loading from other file formats which state that the data is Standard calendar - this is the hard one. a. we could throw an error here, which would be justified in that UM doesn't support proleptic gregorian calendars b. we could convert to proleptic gregorian. This would only affect data in Standard calendar a with a time base or a date before 1582. This is a real niche issue as almost all climate models use a Proleptic Gregorian not a Standard calendar as otherwise you need a lot of calendar code to cope with the transition from the Julian to the Gregorian calendar.

I'd be happy converting to the right calendar and then doing the save without asking the user. I think this is breaking in the same way as the other effect - it invalidates workarounds / changes results that should be reproducible.

rcomer commented 2 years ago

Sorry I’ve not read this thread in detail, but https://github.com/SciTools/cf-units/issues/185 seems relevant, and I understand @larsbarring already has a branch for it.

rcomer commented 2 years ago
  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results

I agree that this is a bug fix. Given the likelihood that downstream users have a workaround it seems like a breaking change. We could look at hiding it behind a FUTURE flag or something (though I'm not 100% on what that entails)

I think this will break more than just saving to other formats. For example, if you plot a time series and your calendar is "standard", the values are converted to datetime.datetime and matplotlib handles them out of the box. For any other calendar, we rely on nc-time-axis, where features such as date-locators and -formatters are (probably?) not as mature. So we risk breaking a lot of plotting code. https://github.com/SciTools/iris/blob/75c7570bf4d509295e239aff47b70324cd68c22d/lib/iris/plot.py#L590-L608

For many (most?) UM applications, the data won't go back far enough for the standard/proleptic_gregorian distinction to be meaningful, so I think it would be good to permanently retain the option of cubes loaded from PP having the "standard" calendar.

nhsavage commented 2 years ago

it sounds like this is an issue we will have to live with. Trying to make this backward compatible seems very hard but there is no consensus to make it a breaking change, so we will have to live with it. I propose to close this issue as "won't be fixed"

rcomer commented 2 years ago

As I understand it, we are only just now starting to have climate configurations with a proper calendar, whereas historically they were 360_day. So the issue could start to become a genuine problem where it wasn't before (though I'm not sure how far back even the climate experiments go).

wjbenfold commented 2 years ago

We make breaking changes sometimes, and with good reasons. This will be a useful place to start if in future it becomes more important to match the actual meaning of pp dates.

In the meantime, would a warning be helpful, so that users this is affecting have a pointer towards where the bug is coming from in their code that assumes Iris does this correctly?

nhsavage commented 2 years ago

No one ever looks at warnings.

rcomer commented 2 years ago

I've just learned something...

from cf_units import Unit

u = Unit("days since 1970-01-01", calendar="standard")
gap_dates = range(-141430, -141420)
print(u.num2date(gap_dates))

shows Julian/Gregorian gap as expected

[cftime.DatetimeGregorian(1582, 10, 2, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 3, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 4, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 15, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 16, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 17, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 18, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 19, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 20, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 21, 0, 0, 0, 0, has_year_zero=False)]

but

print(u.num2pydate(gap_dates))

shows no gap, because the datetime library is actually Proleptic Gregorian.

datetime assumes the current Gregorian calendar extended in both directions

[real_datetime(1582, 10, 12, 0, 0) real_datetime(1582, 10, 13, 0, 0)
real_datetime(1582, 10, 14, 0, 0) real_datetime(1582, 10, 15, 0, 0)
real_datetime(1582, 10, 16, 0, 0) real_datetime(1582, 10, 17, 0, 0)
real_datetime(1582, 10, 18, 0, 0) real_datetime(1582, 10, 19, 0, 0)
real_datetime(1582, 10, 20, 0, 0) real_datetime(1582, 10, 21, 0, 0)]

which rather points to the plotting code being wrong. Possibly there are other implications but my brain hasn't got there yet...

rcomer commented 1 year ago

I wonder if we could actually have our cake and eat it here. Noting that

So, once we have converted our pp time headers into a coordinate with unit "hours since 1970-01-01 00:00:00", it is somewhat arbitrary whether we label it "proleptic_gregorian" or "standard". Would it therefore be reasonable to use the Proleptic Gregorian calendar to do the conversion from pp-header to coordinate, but still label the coordinate with the "standard" calendar? It would be a bit of a fudge, but would mean any change in behaviour would be limited to dates before 15th October 1582, and therefore a bugfix.

Based on https://github.com/SciTools/iris/issues/3561#issuecomment-1050997981, pp-saving could use num2pydate instead of num2date. Then the saved dates would be Proleptic Gregorian regardless of which real-world calendar was on the coordinate.

nhsavage commented 1 year ago

I have to admit, I am now getting a bit lost with all of this. It came about because I needed to convert some netCDF data from the ACCESS model to ancils. That model stored the date with a time of "days since 1-1-1 0:00" or similar and when it was saved to ancil, the time was shifted. I worked around this by converting the time units to be post 1582. I am not sure if this fix would still be needed with your proposed cake fix?

rcomer commented 1 year ago

Hmmm. It looks like num2pydate won't even work if you have "days since 1-1-1 0:00"

import cf_units
tunit = cf_units.Unit("days since 1-1-1 0:00")
tunit.num2pydate(42)
--> 573 dates = cftime.num2date(time_values, **num2date_kwargs)
    574 try:
    575     # We can assume all or none of the dates have a microsecond attribute
    576     microseconds = np.array([d.microsecond if d else 0 for d in dates])

File src/cftime/_cftime.pyx:504, in cftime._cftime.num2date()

ValueError: illegal calendar or reference date for python datetime

But since cf_units v3.1 we can do

new_unit = tunit.change_calendar("proleptic_gregorian")
print(new_unit.num2date(0))
0000-12-30 00:00:00

So maybe pp-saving should enforce a Proleptic Gregorian calendar before doing its conversions to headers.

I need more caffeine...

rcomer commented 1 year ago

Noting that the calendar previously called "gregorian" in the cf-conventions is now called "standard", I've updated the comments through this issue to hopefully reduce confusion.

trexfeathers commented 1 year ago

This no longer needs a team discussion of the principles, it needs a concrete proposal for people to discuss. So something to assign as a project for someone in future. If we can't find someone to assign, I will close this (unless it gets enough votes).

nhsavage commented 1 year ago

proposal

in pp.py change

calendar = cf_units.CALENDAR_STANDARD to calendar = cf_units.CALENDAR_PROLEPTIC_GREGORIAN

and in

pp_save_rules.py

change

elif time_coord.units.calendar == 'standard':

to

elif time_coord.units.calendar == 'proleptic_gregorian':

then we have to decide how to handle files in a standard a calendar

option 1: error option 2: convert calendar to proleptic greogrian before saving. Will only affect results for dates before 1582

nhsavage commented 1 year ago

option 2: convert calendar to proleptic greogrian before saving. Will only affect results for dates before 1582 I think this is the best solution here.

rcomer commented 1 year ago

I have stuck up a PR based on my comments above at #5138

github-actions[bot] commented 3 months ago

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.