chronotope / chrono

Date and time library for Rust

A TZ string can be ambiguous #1153

Open pitdicker opened 1 year ago

pitdicker commented 1 year ago

Example problem

Consider this (extended) POSIX TZ string: CRAZY5SHORT,M12.5.0/50,0/2

The transition dates work out to:

| year | DST to STD      | STD to DST      |
|------|-----------------|-----------------|
| 2022 | 2022-01-01 2:00 | 2022-12-27 2:00 |
| 2023 | 2023-01-01 2:00 | 2024-01-02 2:00 |
| 2024 | 2024-01-01 2:00 | 2024-12-31 2:00 |

Daylight saving time would start on December 27th, 2022. The transition to standard time is on January 1st, 2023. The first transition after that is also from daylight saving time to standard time, on January 1st, 2024!

What is the offset from UTC during most of 2023? → The TZ string is ambiguous, we can't tell.
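To reproduce the table, here is a minimal std-only sketch of the date arithmetic. None of these helpers are chrono API; `days_from_civil`, `civil_from_days`, `last_weekday`, `dst_start`, `dst_end` and `format_local` are hypothetical names, and the rules are hard-coded for this one TZ string rather than parsed.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (month 1..=12).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: (year, month, day) for a day count.
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

/// Last `weekday` (0 = Sunday) of a month, as days since the epoch.
fn last_weekday(year: i64, month: i64, weekday: i64) -> i64 {
    let first_of_next = if month == 12 {
        days_from_civil(year + 1, 1, 1)
    } else {
        days_from_civil(year, month + 1, 1)
    };
    let last = first_of_next - 1;
    let wd = (last + 4).rem_euclid(7); // 1970-01-01 was a Thursday (4)
    last - (wd - weekday).rem_euclid(7)
}

/// `M12.5.0/50`: last Sunday of December plus 50 hours (local hours since epoch).
fn dst_start(year: i64) -> i64 {
    last_weekday(year, 12, 0) * 24 + 50
}

/// `0/2`: zero-based day 0 (January 1) at 02:00 (local hours since epoch).
fn dst_end(year: i64) -> i64 {
    days_from_civil(year, 1, 1) * 24 + 2
}

/// Format local hours since the epoch as `YYYY-MM-DD HH:00`.
fn format_local(hours: i64) -> String {
    let (y, m, d) = civil_from_days(hours.div_euclid(24));
    format!("{:04}-{:02}-{:02} {:02}:00", y, m, d, hours.rem_euclid(24))
}
```

Calling `format_local(dst_start(2023))` shows the start of DST for the 2023 rule landing in 2024, after the 2024 transition to standard time.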

Why can the time of day be more than 24 hours?

POSIX allows times from 00:00:00 to 24:59:59. RFC 8536 allows a TZ string in a TZif file to have times from -167:59:59 to 167:59:59, that is, up to a week before or after midnight of the transition date.

An example of when this can be needed to represent real-world transition rules comes from a man page:

Palestinian civil time, from 2012 onwards: EET-2EEST,M3.5.4/24,M9.3.6/145

2 hours ahead of UT in winter and 3 hours ahead in summer. Changes at the end (24:00 local time) of the last Thursday in March and 01:00 local time on the Friday following the third Saturday in September (that is, the Friday falling between September 21 and September 27 inclusive). The extended time-of-day "145", meaning 01:00 of the day six days after the nominal day, is only valid in the tzfile(5) variant of the System V syntax.
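To make the extended time-of-day concrete, here is a small sketch that splits an hour count such as "145" into a day offset and an hour of day. `split_extended_hours` and `MAX_HOURS` are hypothetical names (not chrono API), and the sketch only handles whole hours within the RFC 8536 range.

```rust
/// Largest hour count allowed in the extended (RFC 8536) form, in either direction.
const MAX_HOURS: i32 = 167;

/// Split an extended time-of-day, given in whole hours, into
/// (days relative to the nominal date, hour of that day).
/// Returns `None` when the value is outside -167..=167.
fn split_extended_hours(hours: i32) -> Option<(i32, i32)> {
    if hours.abs() > MAX_HOURS {
        return None;
    }
    // Euclidean division keeps the hour of day in 0..24 for negative inputs.
    Some((hours.div_euclid(24), hours.rem_euclid(24)))
}
```

For `M9.3.6/145` this gives six days and one hour: the third Saturday of September falls on the 15th to the 21st, so the transition lands on the following Friday (the 21st to the 27th) at 01:00.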

A year can end up with more than two transition dates

It is possible to write a date rule in 3 ways:

Two cases where it depends on the year whether a date falls in the current year or the next:

Problem 1: Our functions to map a local time to UTC don't expect to encounter more than two transition dates per year. But UTC to local time can handle it (I think?)

When can the transition dates switch order?

If the two dates (including time) are close together, within 1 week of each other.

That the time of the transition can be negative or more than 24 hours makes detecting a TZ string with this ambiguity extra difficult.

Problem 2: we assume a TZ string never causes ambiguous cases.

Possible solutions

Option 1: 'never look beyond the current year'.

This does not make all that much sense to me. What makes the year boundary so special? But as we are dealing with a nonsensical timezone specification, it is okay-ish to give bogus answers.

Option 2: detect weird TZ strings and return an error.

This is the approach in https://github.com/chronotope/chrono/pull/789. The validation in that PR is quite involved. An optimization there might be to resolve the two dates for some random year and test whether they are within a week of each other. Or more than 358 days apart (year boundary stuff :disappointed:).
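A rough sketch of that cheap check, assuming both transitions have already been resolved to seconds for the same nominal year. `looks_suspicious` is a hypothetical helper; this is not the validation from the PR, and I have not shown that the two thresholds are sufficient.

```rust
const DAY: i64 = 86_400;

/// Heuristic only: flag a pair of transitions resolved for the same nominal
/// year when they are less than a week apart, or more than 358 days apart
/// (so that they come within a week of each other across the year boundary).
fn looks_suspicious(dst_start_secs: i64, dst_end_secs: i64) -> bool {
    let gap = (dst_start_secs - dst_end_secs).abs();
    gap < 7 * DAY || gap > 358 * DAY
}
```

For the example TZ string in 2022, the end is January 1 at 2:00 and the start is December 27 at 2:00, which are 360 days apart, so the pair is flagged.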

Option 3: detect ambiguous cases during conversion.

During the conversion to/from local time we already have two transition dates for the current year. It is fast to check if they are less than a week apart, and then calculate the transition dates for the preceding or following year. Only if the datetime falls in an ambiguous period would this return LocalResult::Ambiguous.

This could work for local-to-utc, but we assume utc-to-local is never ambiguous. So not a solution.

Is this worth fixing?

At the moment this is just a dark, unspecified corner of chrono :smile:. It is used if the TZ environment variable sets a TZ string, or when it is included in the TZif file of the current timezone.

I am playing with the idea of exposing this functionality in a public type DstRule (or something like that). It would be a fourth choice besides Utc, Local and FixedOffset. I think it would be very useful for library users when writing unit tests to detect DST transition problems (and for us for the same reason). And having an easy way to specify timezones that are often good enough seems useful, especially on platforms that don't have Local or a timezone database.

Whatever we do, it should be consistent and mentioned in the documentation.

pitdicker commented 1 year ago

https://github.com/chronotope/chrono/pull/789 implements option 2, but in my opinion the description there doesn't do it justice.

cc @x-hgg-x You probably put a lot of thought into this already.

pitdicker commented 1 year ago

Option 4: detect ambiguous cases during conversion, return standard offset.

The description of a TZ string describes a 'standard timezone' with offset, and an optional 'alternative timezone' with offset (which is used during daylight saving time).

I propose to detect ambiguous cases during conversion like option 3, and in the rare ambiguous cases to assume the 'standard timezone'.

To phrase it more clearly: 'when transitions cause the period in between them to be ambiguous, assume that period to be in standard time'.
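One way to read that rule, as a sketch: a moment counts as DST only when the transition before it goes to DST and the transition after it goes back to standard time; any other combination is treated as standard time. `Kind`, `Transition` and `is_dst` are hypothetical names, `at` is any monotonic UTC timestamp, and the caller is assumed to pass the transitions of the adjacent years as well, already sorted.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind {
    ToDst,
    ToStd,
}

struct Transition {
    at: i64, // UTC timestamp of the transition
    kind: Kind,
}

/// `sorted` must be ordered by `at` and include the neighbouring years.
/// Returns true only for an unambiguous DST period; ambiguous periods
/// (and periods with a missing neighbour) fall back to standard time.
fn is_dst(sorted: &[Transition], t: i64) -> bool {
    // Index of the first transition strictly after `t`.
    let idx = sorted.partition_point(|tr| tr.at <= t);
    let prev = idx.checked_sub(1).map(|i| sorted[i].kind);
    let next = sorted.get(idx).map(|tr| tr.kind);
    prev == Some(Kind::ToDst) && next == Some(Kind::ToStd)
}
```

With the transitions from the example TZ string, the days between December 27th, 2022 and January 1st, 2023 are DST, while most of 2023 (between two transitions to standard time) resolves to standard time.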

x-hgg-x commented 1 year ago

Problem 1 is ok if the transition dates never switch order, since we also check previous and next year transitions when converting UTC to local time:

https://github.com/chronotope/chrono/blob/38b19bbe4e21c402f81edfa2932a43831e679a35/src/offset/local/tz_info/rule.rs#L169

In https://github.com/chronotope/chrono/pull/789, I have implemented an exhaustive validation check for the extra rule of a timezone, so that the assumptions I made in the other parts of the code are upheld (see https://github.com/chronotope/chrono/pull/789/files#diff-92d44e11f46c889256447f824f3c9fb4964ba2f3092727e8b0a4251a4e236ff5R196-R198 for example).

x-hgg-x commented 1 year ago

I think it is better to check the timezone once when loading it, rather than doing the check each time we need to do a UTC-to-local conversion.

pitdicker commented 1 year ago

@x-hgg-x Thank you for replying so quickly!

> Problem 1 is ok if the transition dates never switch order, since we also check previous and next year transitions when converting UTC to local time:

But we don't check it yet when converting from local time to UTC.

pitdicker commented 1 year ago

Problem 3: a transition date falls in a gap created by another transition date.

A third way to make a mess with transition dates :innocent: : Transition date 1 creates a gap in local time, for example by jumping the offset from UTC from +2:00 to +3:00. Transition date 2 has the same date, and a time right in the gap. In theory transition date 2 doesn't exist.

x-hgg-x commented 1 year ago

> But we don't check it yet when converting from local time to UTC.

This differs between chrono and tz-rs.

When converting local time to UTC, chrono doesn't keep the offset associated with the local time in the NaiveDateTime structure, so it must scan the whole timezone to retrieve the UTC time, and the resulting time can be ambiguous.

This is done in the TimeZoneRef::find_local_time_type_from_local() method, which corresponds to the find_date_time function in tz-rs. Since this method was written independently and was not taken from tz-rs like the other code in the tz_info module, it doesn't have any tests (unlike tz-rs), so I cannot guarantee that the implementation is correct.

x-hgg-x commented 1 year ago

> A third way to make a mess with transition dates :innocent: : Transition date 1 creates a gap in local time, for example by jumping the offset from UTC from +2:00 to +3:00. Transition date 2 has the same date, and a time right in the gap. In theory transition date 2 doesn't exist.

Yes there are many cases where the TZ string doesn't make any sense. This is why I chose to invalidate the timezone in these cases in tz-rs.

Note that we can still have this situation with normal transitions in a valid timezone, but since the transitions are specified with UTC timestamps, the corresponding UTC offset is never ambiguous if we know the time since epoch.

pitdicker commented 1 year ago

> Yes there are many cases where the TZ string doesn't make any sense. This is why I chose to invalidate the timezone in these cases in tz-rs.

I am interested, do you know more cases?

x-hgg-x commented 1 year ago

> I am interested, do you know more cases?

The invalid TZ strings are those that cause the transition dates to switch order for a particular year (your problem 2). Conversely, if we can exclude them, all remaining TZ strings are valid.

This is why my validation check in #789 is complex: it guarantees that a valid TZ string cannot switch the order of its transition dates for any year.

pitdicker commented 1 year ago

The solution to problem 3 turns out to be simple: it goes away when you switch the two transition dates. And if you convert the transition dates to UTC before sorting them there is no problem at all.

I'll make this a test case at some point.
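A sketch of what that could look like, assuming each transition time is given in the local time that is in effect just before the change (standard time for the start of DST, DST for the end). `transitions_utc` is a hypothetical helper and all values are in seconds, with offsets counted east of UTC.

```rust
/// Convert both transitions to UTC, then sort them.
/// The DST start is expressed in standard time, the DST end in DST.
fn transitions_utc(
    dst_start_local: i64,
    dst_end_local: i64,
    std_offset: i64,
    dst_offset: i64,
) -> [(i64, &'static str); 2] {
    let mut transitions = [
        (dst_start_local - std_offset, "to dst"),
        (dst_end_local - dst_offset, "to std"),
    ];
    // Sorting on the UTC instant sidesteps local-time gaps entirely.
    transitions.sort_by_key(|&(at, _)| at);
    transitions
}
```

With offsets +2:00 and +3:00, a DST start at 02:00 and a DST end at 02:30 on the same date, the end comes out at 23:30 UTC of the previous day and the start at 00:00 UTC, so the two simply swap places.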

pitdicker commented 1 year ago

RFC 8536 has an interesting TZ string: EST5EDT,0/0,J365/25:

  • DST is considered to be in effect all year if it starts January 1 at 00:00 and ends December 31 at 24:00 plus the difference between daylight saving and standard time, leaving no room for standard time in the calendar.

Example: EST5EDT,0/0,J365/25 This represents a time zone that observes daylight saving time all year. It is 4 hours west of UT and is abbreviated "EDT".

If a country is in daylight saving time the whole year, how do you specify that? Like this, with the start of daylight saving time on January 1 and the end on December 31 at 24:00.

The time at the end date of the example is wrong however. POSIX says "Each time field describes when, in current local time, the change to the other time is made." So the time should not be "24:00 plus the difference between daylight saving and standard time", but just 24 hours.

In this example if your implementation is of the type 'never look beyond the current year', the whole year is in DST. If you take transition dates in adjacent years into account, almost the whole year would be in standard time: daylight saving time would start January 1 at 00:00, and end January 1 at 01:00, 25 hours after December 31 at 00:00 of the previous year. The rest of the current year would be in standard time.

x-hgg-x commented 1 year ago

> The time at the end date of the example is wrong however. POSIX says "Each time field describes when, in current local time, the change to the other time is made." So the time should not be "24:00 plus the difference between daylight saving and standard time", but just 24 hours.

No, the example is correct. The "current local time, when the change to the other time is made" corresponds to the time before the transition.

Here are the transitions described in the example TZ string EST5EDT,0/0,J365/25 for two consecutive years:

We can see we spend no time in the EST UTC-5 timezone between the second transition of the year N and the first transition of the year N+1, so we have the EDT UTC-4 timezone all year.

This case works in chrono because we check for the previous and next year transitions: https://github.com/chronotope/chrono/blob/38b19bbe4e21c402f81edfa2932a43831e679a35/src/offset/local/tz_info/rule.rs#L169
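The arithmetic for EST5EDT,0/0,J365/25 can be checked with a short sketch. All values are seconds relative to local midnight on January 1 of year N; `est5edt_transitions_utc` is a hypothetical helper, and it assumes each transition time is read in the local time in effect before the change.

```rust
const HOUR: i64 = 3600;
const DAY: i64 = 24 * HOUR;
const EST: i64 = -5 * HOUR; // standard offset, east of UTC
const EDT: i64 = -4 * HOUR; // daylight saving offset, east of UTC

/// Returns (DST start of year N, DST end of year N, DST start of year N+1)
/// as UTC instants, for a year N that is `days_in_year` days long.
fn est5edt_transitions_utc(days_in_year: i64) -> (i64, i64, i64) {
    // `0/0`: zero-based day 0 (January 1) at 00:00, read in standard time.
    let start = 0 - EST;
    // `J365/25`: December 31 (J365 never counts February 29) at 25:00, read in DST.
    let end = (days_in_year - 1) * DAY + 25 * HOUR - EDT;
    // The same start rule applied to year N+1.
    let next_start = days_in_year * DAY - EST;
    (start, end, next_start)
}
```

The end of DST in year N and the start of DST in year N+1 come out as the same UTC instant (January 1 at 05:00 UTC), for both 365-day and 366-day years, so no time is spent in EST.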