Open cpelley opened 4 years ago
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.
If this issue is still important to you, then please comment on this issue and the stale label will be removed.
Otherwise this issue will be automatically closed in 28 days time.
this is still important but I don't have any time to fix it myself
If I understand this issue and the PP spec correctly, it isn't possible to save a PP file with a non-Proleptic Gregorian calendar. Possibly behaviours for Iris when asked to save a cube with a Standard calendar seems limited to
Are there other alternatives? Which of these would be preferred?
taking this back to basics. The problem is that the orginal UMDP F3 which was the basis of the pp load and save rules was somewhat lose in its teminology and referrred to the UM's calendar as a Standard calendar in one place and Proleptic Gregorian somewhere else. On closer inspection, I determined that it is in fact Proleptic Greogrian. However, the pp load and save rules are treating it as Standard
This means there is a symetrical error here: https://github.com/SciTools/iris/blob/main/lib/iris/fileformats/pp.py#L1064
I thefore think there are actually two changes needed here which should cancel out in all round trip pp tests:
in pp.py change
calendar = cf_units.CALENDAR_STANDARD
to
calendar = cf_units.CALENDAR_PROLEPTIC_GREGORIAN
and in
pp_save_rules.py
change
elif time_coord.units.calendar == 'standard':
to
elif time_coord.units.calendar == 'proleptic_gregorian':
assuming a round trip with pp data is unchanged, then there are two other cases:
What I could probably do is make the above changes (except the conversion of standard calendar) and then see which, if any, of the tests break
https://github.com/SciTools/cf-units/issues/203 currently needs parallel changes in Iris, which seem quite relevant here - notably that pp files don't handle cf's "standard" calendar
- loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results
I agree that this is a bug fix. Given the likelihood that downstream users have a workaround it seems like a breaking change. We could look at hiding it behind a FUTURE flag or something (though I'm not 100% on what that entails)
- loading from other file formats which state that the data is Standard calendar - this is the hard one. a. we could throw an error here, which would be justified in that UM doesn't support proleptic gregorian calendars b. we could convert to proleptic gregorian. This would only affect data in Standard calendar a with a time base or a date before 1582. This is a real niche issue as almost all climate models use a Proleptic Gregorian not a Standard calendar as otherwise you need a lot of calendar code to cope with the transition from the Julian to the Gregorian calendar.
I'd be happy converting to the right calendar and then doing the save without asking the user. I think this is breaking in the same way as the other effect - it invalidates workarounds / changes results that should be reproducible.
Sorry I’ve not read this thread in detail, but https://github.com/SciTools/cf-units/issues/185 seems relevant, and I understand @larsbarring already has a branch for it.
- loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results
I agree that this is a bug fix. Given the likelihood that downstream users have a workaround it seems like a breaking change. We could look at hiding it behind a FUTURE flag or something (though I'm not 100% on what that entails)
I think this will break more than just saving to other formats. For example, if you plot a time series and your calendar is "standard", the values are converted to datetime.datetime
and matplotlib handles them out of the box. For any other calendar, we rely on nc-time-axis
, where features such as date-locators and -formatters are (probably?) not as mature. So we risk breaking a lot of plotting code.
https://github.com/SciTools/iris/blob/75c7570bf4d509295e239aff47b70324cd68c22d/lib/iris/plot.py#L590-L608
For many (most?) UM applications, the data won't go back far enough for the standard/proleptic_gregorian distinction to be meaningful, so I think it would be good to permanently retain the option of cubes loaded from PP having the "standard" calendar.
it sounds like this is an issue we will have to live with. Trying to make this backward compatible seems very hard but there is no consensus to make it a breaking change, so we will have to live with it. I propose to close this issue as "won't be fixed"
As I understand it, we are only just now starting to have climate configurations with a proper calendar, whereas historically they were 360_day. So the issue could start to become a genuine problem where it wasn't before (though I'm not sure how far back even the climate experiments go).
We make breaking changes sometimes, and with good reasons. This will be a useful place to start if in future it becomes more important to match the actual meaning of pp dates.
In the meantime, would a warning be helpful, so that users this is affecting have a pointer towards where the bug is coming from in their code that assumes Iris does this correctly?
No one ever looks at warnings.
I've just learned something...
from cf_units import Unit
u = Unit("days since 1970-01-01", calendar="standard")
gap_dates = range(-141430, -141420)
print(u.num2date(gap_dates))
shows Julian/Gregorian gap as expected
[cftime.DatetimeGregorian(1582, 10, 2, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 3, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 4, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 15, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 16, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 17, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 18, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 19, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 20, 0, 0, 0, 0, has_year_zero=False)
cftime.DatetimeGregorian(1582, 10, 21, 0, 0, 0, 0, has_year_zero=False)]
but
print(u.num2pydate(gap_dates))
shows no gap, because the datetime
library is actually Proleptic Gregorian.
datetime assumes the current Gregorian calendar extended in both directions
[real_datetime(1582, 10, 12, 0, 0) real_datetime(1582, 10, 13, 0, 0) real_datetime(1582, 10, 14, 0, 0) real_datetime(1582, 10, 15, 0, 0) real_datetime(1582, 10, 16, 0, 0) real_datetime(1582, 10, 17, 0, 0) real_datetime(1582, 10, 18, 0, 0) real_datetime(1582, 10, 19, 0, 0) real_datetime(1582, 10, 20, 0, 0) real_datetime(1582, 10, 21, 0, 0)]
which rather points to the plotting code being wrong. Possibly there are other implications but my brain hasn't got there yet...
I wonder if we could actually have our cake and eat it here. Noting that
So, once we have converted our pp time headers into a coordinate with unit "hours since 1970-01-01 00:00:00", it is somewhat arbitrary whether we label it "proleptic_gregorian" or "standard". Would it therefore be reasonable to use the Proleptic Gregorian calendar to do the conversion from pp-header to coordinate, but still label the coordinate with the "standard" calendar? It would be a bit of a fudge, but would mean any change in behaviour would be limited to dates before 15th October 1582, and therefore a bugfix.
Based on https://github.com/SciTools/iris/issues/3561#issuecomment-1050997981, pp-saving could use num2pydate
instead of num2date
. Then the saved dates would be Proleptic Gregorian regardless of which real-world calendar was on the coordinate.
I have to admit, I am now getting a bit lost with all of this. It came about because I needed to convert some netCDF data from the ACCESS model to ancils. That model stored the date with a time of "days since 1-1-1 0:00" or similar and when it was saved to ancil, the time was shifted. I worked around this by converting the time units to be post 1582. I am not sure if this fix would still be needed with your proposed cake fix?
Hmmm. It looks like num2pydate
won't even work if you have "days since 1-1-1 0:00"
import cf_units
tunit = cf_units.Unit("days since 1-1-1 0:00")
tunit.num2pydate(42)
--> 573 dates = cftime.num2date(time_values, **num2date_kwargs)
574 try:
575 # We can assume all or none of the dates have a microsecond attribute
576 microseconds = np.array([d.microsecond if d else 0 for d in dates])
File src/cftime/_cftime.pyx:504, in cftime._cftime.num2date()
ValueError: illegal calendar or reference date for python datetime
But since cf_units v3.1 we can do
new_unit = tunit.change_calendar("proleptic_gregorian")
print(new_unit.num2date(0))
0000-12-30 00:00:00
So maybe pp-saving should enforce a Proleptic Gregorian calendar before doing its conversions to headers.
I need more caffeine...
Noting that the calendar previously called "gregorian" in the cf-conventions is now called "standard", I've updated the comments through this issue to hopefully reduce confusion.
This no longer needs a team discussion of the principles, it needs a concrete proposal for people to discuss. So something to assign as a project for someone in future. If we can't find someone to assign, I will close this (unless it gets enough votes).
proposal
in pp.py change
calendar = cf_units.CALENDAR_STANDARD to calendar = cf_units.CALENDAR_PROLEPTIC_GREGORIAN
and in
pp_save_rules.py
change
elif time_coord.units.calendar == 'standard':
to
elif time_coord.units.calendar == 'proleptic_gregorian':
then we have to decide how to handle files in a standard a calendar
option 1: error option 2: convert calendar to proleptic greogrian before saving. Will only affect results for dates before 1582
option 2: convert calendar to proleptic greogrian before saving. Will only affect results for dates before 1582 I think this is the best solution here.
I have stuck up a PR based on my comments above at #5138
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.
If this issue is still important to you, then please comment on this issue and the stale label will be removed.
Otherwise this issue will be automatically closed in 28 days time.
https://github.com/SciTools/iris/blob/75c7570bf4d509295e239aff47b70324cd68c22d/lib/iris/fileformats/pp_save_rules.py#L398-L404
Standard calendar cannot be treated as a Proleptic Gregorian for dates
1582-01-01
More context here: ancil/ticket #1159UM F03 more recently clarified that the calendar it expects isn't Standard but a Proleptic Gregorian. Not only is iris pp save ignoring time coordinates with
proleptic_gregorian
calendars entirely, it also allows saving Standard dates prior to1582-01-01
(as though they are proleptic gregorian dates when they are not).