collective / icalendar

icalendar parser library for Python
https://icalendar.readthedocs.io/en/latest/

[BUG] zoneinfo lost `localize` DST logic #649

Open thet opened 1 week ago

thet commented 1 week ago

I just realized that icalendar's reimplementation of the localize function most likely fails around daylight saving time (DST) transitions.

It simply replaces the naive datetime's tzinfo with our zoneinfo object, which provides the correct timezone information.

However, for datetimes that fall into a DST transition, we need logic like this:

https://github.com/stub42/pytz/blob/fb43f957c5149e750c3be3cfc72b22ad94db4886/src/pytz/tzinfo.py#L261

We could re-implement this.
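A minimal sketch of what such a re-implementation might look like on top of zoneinfo, assuming we keep a pytz-style `is_dst` argument. The helper name `localize_with_dst` and its signature are hypothetical and not part of icalendar, pytz, or zoneinfo. It tries both `fold` values and returns the one whose DST state matches:

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def localize_with_dst(dt, tz, is_dst=False):
    """Attach tz to a naive datetime, using is_dst to pick a side of a transition.

    Hypothetical helper for illustration, not an icalendar or zoneinfo API.
    """
    if dt.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    # fold=0 and fold=1 only differ for ambiguous or nonexistent local times.
    candidates = [dt.replace(tzinfo=tz, fold=fold) for fold in (0, 1)]
    for candidate in candidates:
        if bool(candidate.dst()) == is_dst:
            return candidate
    # No candidate matched (e.g. away from DST transitions): fall back to fold=0.
    return candidates[0]


la = ZoneInfo("America/Los_Angeles")
naive = datetime(2020, 11, 1, 1)
print(localize_with_dst(naive, la, is_dst=True))   # first 01:00, still PDT
print(localize_with_dst(naive, la, is_dst=False))  # second 01:00, already PST
```

Comparing `dst()` instead of hard-coding a fold value is a design choice: for repeated times the DST side is `fold=0`, while for skipped times it is `fold=1`.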

niccokunzmann commented 6 days ago

Hi @thet, as far as I understand (I read this somewhere about pytz), pytz has a localize function because of the limitations of datetime at the time pytz was created. From memory: back then, datetime could only handle static timezones. So pytz used the localize and normalize functions to set the timezone depending on the datetime, so that summer or winter time could be chosen. Since then, tzinfo and datetime have changed to allow much better handling, such as telling apart the two sides of a shift (e.g. 0:30 could be in summer time or in winter time, a 1h difference in UTC). As I understand it, zoneinfo handles this in the now-native way, which did not exist when pytz came to be. That is also one reason why we are switching over: pytz is hard to handle because you need to normalize after adding or subtracting a timedelta. zoneinfo does not require this.

Having said that, I got all of that off my mind, and now I wonder if I really understood the question. So, this is the code:
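To illustrate the last point, here is a small sketch using only the standard library. The date is chosen for illustration: it is the day before the 2020 transition from daylight to standard time in Los Angeles.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

la = ZoneInfo("America/Los_Angeles")

# Noon on the day before the transition (PDT, UTC-7).
before = datetime(2020, 10, 31, 12, tzinfo=la)

# Plain wall-clock arithmetic; no normalize() step as with pytz.
after = before + timedelta(days=1)

print(before.isoformat())
print(after.isoformat())

# The offset is recomputed for the new date (PST, UTC-8), so the elapsed
# time in UTC is 25 hours even though the wall clock moved by 24.
elapsed = after.astimezone(timezone.utc) - before.astimezone(timezone.utc)
print(elapsed)
```

Note that the addition is wall-clock arithmetic: the result is still 12:00 local time, only the UTC offset differs.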

# pytz
>>> loc_dt1 = amdam.localize(dt, is_dst=True)
>>> loc_dt2 = amdam.localize(dt, is_dst=False)

This is the equivalent now, see https://docs.python.org/3/library/zoneinfo.html#using-zoneinfo

These time zones also support the fold attribute introduced in PEP 495. During offset transitions which induce ambiguous times (such as a daylight saving time to standard time transition), the offset from before the transition is used when fold=0, and the offset after the transition is used when fold=1, for example:

from datetime import datetime
from zoneinfo import ZoneInfo
dt = datetime(2020, 11, 1, 1, tzinfo=ZoneInfo("America/Los_Angeles"))
print(dt)
print(dt.replace(fold=1))

This should be equivalent functionality.
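Building on the fold semantics quoted above, one way to find out whether a naive local time needs this treatment at all is to compare the UTC offsets for both fold values. This is only a sketch; `classify_local_time` is a hypothetical helper name, not part of zoneinfo or icalendar:

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def classify_local_time(naive, tz):
    """Return "ambiguous", "nonexistent" or "normal" for a naive local time.

    Hypothetical helper for illustration only.
    """
    first = naive.replace(tzinfo=tz, fold=0).utcoffset()
    second = naive.replace(tzinfo=tz, fold=1).utcoffset()
    if first > second:
        return "ambiguous"    # clocks were set back: the local time occurs twice
    if first < second:
        return "nonexistent"  # clocks were set forward: the local time is skipped
    return "normal"


la = ZoneInfo("America/Los_Angeles")
print(classify_local_time(datetime(2020, 11, 1, 1), la))
print(classify_local_time(datetime(2020, 3, 8, 2, 30), la))
print(classify_local_time(datetime(2020, 6, 1, 12), la))
```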

Questions that I have:

niccokunzmann commented 6 days ago

Will we get results that are off?

Yes.

Zoneinfo

>>> from datetime import datetime
>>> from zoneinfo import ZoneInfo  # Python >= 3.9
>>> # On older Python versions: from backports.zoneinfo import ZoneInfo
>>> dt = datetime(2020, 11, 1, 1, tzinfo=ZoneInfo("America/Los_Angeles")) # from example
>>> print(dt)
2020-11-01 01:00:00-07:00
>>> print(dt.replace(fold=1))
2020-11-01 01:00:00-08:00
>>> from icalendar import Event, vDatetime
>>> e = Event()
>>> dt.timestamp()
1604217600.0
>>> dt.replace(fold=1).timestamp()
1604221200.0
>>> dt.timestamp() - dt.replace(fold=1).timestamp()
-3600.0 # fold=1 is bigger -> later
>>> e = Event()
>>> e["dtstart"] = dt # wrong way to do it
>>> e["dtend"] = dt.replace(fold=1) # wrong way to do it
>>> print(e.to_ical().decode())
BEGIN:VEVENT
DTSTART:2020-11-01 01:00:00-07:00
DTEND:2020-11-01 01:00:00-08:00
END:VEVENT

>>> from icalendar import Event, vDatetime
>>> e = Event()
>>> e["dtstart"] = vDatetime(dt) # right way to do it
>>> e["dtend"] = vDatetime(dt.replace(fold=1)) # right way to do it
>>> print(e.to_ical().decode())
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20201101T010000
DTEND;TZID=America/Los_Angeles:20201101T010000
END:VEVENT

We can see that the fold is not represented in the values we write: DTSTART and DTEND serialize to the same local time even though they are one hour apart. This must have slipped through the test cases.
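To illustrate where the information gets lost, here is a standard-library-only sketch (it does not use icalendar's serializer). Formatting the local wall-clock time, as the TZID form does, yields the same string for both folds, while converting to UTC first keeps them apart. Emitting UTC is only one conceivable workaround, not something the library is confirmed to do:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

first = datetime(2020, 11, 1, 1, tzinfo=ZoneInfo("America/Los_Angeles"))
second = first.replace(fold=1)

LOCAL_FORMAT = "%Y%m%dT%H%M%S"  # local DATE-TIME form, as used with TZID
UTC_FORMAT = "%Y%m%dT%H%M%SZ"   # UTC DATE-TIME form

# Local form: the fold is dropped, both values look identical.
print(first.strftime(LOCAL_FORMAT), second.strftime(LOCAL_FORMAT))

# UTC form: the two instants stay one hour apart.
first_utc = first.astimezone(timezone.utc).strftime(UTC_FORMAT)
second_utc = second.astimezone(timezone.utc).strftime(UTC_FORMAT)
print(first_utc, second_utc)
```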

pytz

Checking pytz... Yes, results are off.

>>> from pytz import timezone
>>> dt1 = timezone("America/Los_Angeles").localize(datetime(2020, 11, 1, 1), is_dst=False)
>>> dt2 = timezone("America/Los_Angeles").localize(datetime(2020, 11, 1, 1), is_dst=True)
>>> dt1.timestamp() - dt2.timestamp()
3600.0
>>> e = Event()
>>> e["dtstart"] = dt2 # wrong way
>>> e["dtend"] = dt1 # wrong way
>>> print(e.to_ical().decode())
BEGIN:VEVENT
DTSTART:2020-11-01 01:00:00-07:00
DTEND:2020-11-01 01:00:00-08:00
END:VEVENT

>>> e = Event()
>>> e["dtstart"] = vDatetime(dt2) # right way
>>> e["dtend"] = vDatetime(dt1) # right way
>>> print(e.to_ical().decode())
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20201101T010000
DTEND;TZID=America/Los_Angeles:20201101T010000
END:VEVENT

Thunderbird

I will see what Thunderbird does. Creating events around that time is a bit buggy, see https://bugzilla.mozilla.org/show_bug.cgi?id=1904335

This is what I get with Thunderbird:

>>> from icalendar import Calendar
>>> from urllib.request import urlopen
>>> d = urlopen("https://bugzilla.mozilla.org/attachment.cgi?id=9409200")
>>> c = Calendar.from_ical(d.read())
>>> for event in c.walk("VEVENT"):
...  name = event["summary"]
...  print(name, "dtstart", event["dtstart"])
...  print(name, "dtend", event["dtend"])
... 
Link with URLs dtstart vDDDTypes(2024-04-14 09:00:00+01:00, Parameters({'TZID': 'Europe/London'}))
Link with URLs dtend vDDDTypes(2024-04-14 10:15:00+01:00, Parameters({'TZID': 'Europe/London'}))
1 dtstart vDDDTypes(2020-10-31 23:45:00-07:00, Parameters({'TZID': 'America/Los_Angeles'}))
1 dtend vDDDTypes(2020-10-31 23:45:00-07:00, Parameters({'TZID': 'America/Los_Angeles'}))
2 dtstart vDDDTypes(2020-11-01 00:15:00-07:00, Parameters({'TZID': 'America/Los_Angeles'}))
2 dtend vDDDTypes(2020-11-01 00:15:00-07:00, Parameters({'TZID': 'America/Los_Angeles'}))
3 dtstart vDDDTypes(2020-10-31 23:45:00-07:00, Parameters({'TZID': 'America/Los_Angeles'}))
3 dtend vDDDTypes(2020-11-01 01:15:00-08:00, Parameters({'TZID': 'America/Los_Angeles'}))
4 dtstart vDDDTypes(2020-11-01 01:00:00-08:00, Parameters({'TZID': 'America/Los_Angeles'}))
4 dtend vDDDTypes(2020-11-01 01:00:00-08:00, Parameters({'TZID': 'America/Los_Angeles'}))
5 dtstart vDDDTypes(2020-11-01 01:15:00-08:00, Parameters({'TZID': 'America/Los_Angeles'}))
5 dtend vDDDTypes(2020-11-01 01:15:00-08:00, Parameters({'TZID': 'America/Los_Angeles'}))
6 dtstart vDDDTypes(2020-11-01 01:45:00-08:00, Parameters({'TZID': 'America/Los_Angeles'}))
6 dtend vDDDTypes(2020-11-01 01:45:00-08:00, Parameters({'TZID': 'America/Los_Angeles'}))
7 dtstart vDDDTypes(2020-11-01 00:45:00-07:00, Parameters({'TZID': 'America/Los_Angeles'}))
7 dtend vDDDTypes(2020-11-01 00:45:00-07:00, Parameters({'TZID': 'America/Los_Angeles'}))
8 - London dtstart vDDDTypes(2020-11-01 08:00:00+00:00, Parameters({'TZID': 'Europe/London'}))
8 - London dtend vDDDTypes(2020-11-01 09:00:00+00:00, Parameters({'TZID': 'Europe/London'}))

How to go on

This does not seem to be a new issue. The question for me is how the standard deals with it. Do you have an idea?

thet commented 6 days ago

Hmm. I need to read up on that topic again. Thanks so far for your answer!

Will look into that later.

niccokunzmann commented 6 days ago

Stackoverflow: https://stackoverflow.com/questions/68643703/is-there-a-way-in-icalendar-to-specify-an-event-in-the-second-hour-of-a-dst-over

RFC 5545:

If, based on the definition of the referenced time zone, the local time described occurs more than once (when changing from daylight to standard time), the DATE-TIME value refers to the first occurrence of the referenced time. Thus, TZID=America/New_York:20071104T013000 indicates November 4, 2007 at 1:30 A.M. EDT (UTC-04:00). If the local time described does not occur (when changing from standard to daylight time), the DATE-TIME value is interpreted using the UTC offset before the gap in local times. Thus, TZID=America/New_York:20070311T023000 indicates March 11, 2007 at 3:30 A.M. EDT (UTC-04:00), one hour after 1:30 A.M. EST (UTC-05:00).

A time value MUST only specify the second 60 when specifying a positive leap second. For example:

 19970630T235960Z

Implementations that do not support leap seconds SHOULD interpret the second 60 as equivalent to the second 59.

It seems that the standard has no way to describe the second occurrence of a repeated local time with a TZID.
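The two RFC 5545 rules quoted above appear to match what `fold=0` does in zoneinfo: the offset from before the transition is used in both cases. A sketch under that assumption, using the RFC's own examples (`resolve_rfc5545` is a hypothetical helper name, not an icalendar API):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def resolve_rfc5545(naive, tzid):
    """Interpret a local DATE-TIME with TZID following the quoted RFC 5545 rules.

    Hypothetical helper for illustration, not an icalendar API.
    """
    tz = ZoneInfo(tzid)
    # fold=0 selects the offset in effect before the transition, which gives
    # the first occurrence of a repeated time and the pre-gap offset of a
    # skipped one.
    local = naive.replace(tzinfo=tz, fold=0)
    # Round-trip through UTC so a skipped time becomes a valid wall-clock time.
    return local.astimezone(timezone.utc).astimezone(tz)


# Repeated time: 1:30 A.M. on November 4, 2007 is read as EDT (UTC-04:00).
print(resolve_rfc5545(datetime(2007, 11, 4, 1, 30), "America/New_York"))
# Skipped time: 2:30 A.M. on March 11, 2007 becomes 3:30 A.M. EDT.
print(resolve_rfc5545(datetime(2007, 3, 11, 2, 30), "America/New_York"))
```

This only covers reading values. The second occurrence of a repeated time still cannot be written in the local TZID form.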