Open pitdicker opened 1 year ago
https://github.com/chronotope/chrono/pull/789 implements option 2, but in my opinion the description there doesn't do it justice.
cc @x-hgg-x You probably put a lot of thought into this already.
The description of a TZ string describes a 'standard timezone' with offset, and an optional 'alternative timezone' with offset (which is used during daylight saving time).
I propose to detect ambiguous cases during conversion like option 3, and in the rare ambiguous cases to assume the 'standard timezone'.
To phrase it more clearly: 'when transitions cause the period in between them to be ambiguous, assume that period to be in standard time'.
Problem 1 is ok if the transition dates never switch order, since we also check previous and next year transitions when converting UTC to local time:
In https://github.com/chronotope/chrono/pull/789, I have implemented an exhaustive validation check for the extra rule of a timezone, so that the assumptions I made in the other parts of the code are upheld (see https://github.com/chronotope/chrono/pull/789/files#diff-92d44e11f46c889256447f824f3c9fb4964ba2f3092727e8b0a4251a4e236ff5R196-R198 for example).
I think it is better to check the timezone once when loading it, rather than doing the check each time we need to do a UTC-to-local conversion.
@x-hgg-x Thank you for replying this quickly!
> Problem 1 is ok if the transition dates never switch order, since we also check previous and next year transitions when converting UTC to local time:
But we don't check it yet when converting from local time to UTC.
Problem 3: a transition date falls in a gap created by another transition date.
A third way to make a mess with transition dates :innocent: : Transition date 1 creates a gap in local time, for example by jumping the offset from UTC from +2:00 to +3:00. Transition date 2 has the same date, and a time right in the gap. In theory transition date 2 doesn't exist.
> But we don't check it yet when converting from local time to UTC.
This differs between `chrono` and `tz-rs`.
When converting local time to UTC, since `chrono` doesn't keep the offset associated to the local time in the `NaiveDateTime` structure, it must scan the whole timezone to retrieve the UTC time, and so the resulting time can be ambiguous.
This is done in the `TimeZoneRef::find_local_time_type_from_local()` method, which corresponds to the `find_date_time` function in `tz-rs`. Since this method was written independently and was not taken from `tz-rs` like the other code in the `tz_info` module, it doesn't have any tests (unlike `tz-rs`), so I cannot guarantee that the implementation is correct.
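To illustrate why a scan by local time can yield zero, one, or two UTC candidates, here is a minimal sketch. It is not the code from `chrono` or `tz-rs`: the `Resolved` enum, the `local_to_utc` function and the `(utc_start, utc_end, offset)` tuple layout are assumptions made up for this example, with all values in seconds.

```rust
/// Outcome of mapping a local timestamp back to UTC (hypothetical type).
#[derive(Debug, PartialEq)]
enum Resolved {
    /// The local time falls in a gap and does not exist.
    Gap,
    /// Exactly one UTC timestamp matches.
    Single(i64),
    /// Two periods both contain the local time (a fold).
    Ambiguous(i64, i64),
}

/// Scans every period and keeps the UTC candidates that fall inside the
/// period they were computed for. Each period is a half-open UTC range
/// `[utc_start, utc_end)` together with its offset from UTC.
fn local_to_utc(local: i64, periods: &[(i64, i64, i64)]) -> Resolved {
    let mut hits = periods.iter().filter_map(|&(start, end, offset)| {
        // Candidate UTC time if the local time belonged to this period.
        let utc = local - offset;
        if start <= utc && utc < end {
            Some(utc)
        } else {
            None
        }
    });
    match (hits.next(), hits.next()) {
        (None, _) => Resolved::Gap,
        (Some(a), None) => Resolved::Single(a),
        (Some(a), Some(b)) => Resolved::Ambiguous(a, b),
    }
}
```

With an offset that jumps from +2:00 to +3:00 there is a one-hour range of local times that matches no period, and with the reverse jump the same range matches both.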
> A third way to make a mess with transition dates :innocent: : Transition date 1 creates a gap in local time, for example by jumping the offset from UTC from +2:00 to +3:00. Transition date 2 has the same date, and a time right in the gap. In theory transition date 2 doesn't exist.
Yes, there are many cases where the TZ string doesn't make any sense. This is why I chose to invalidate the timezone in these cases in `tz-rs`.
Note that we can still have this situation with normal transitions in a valid timezone, but since the transitions are specified with UTC timestamps, the corresponding UTC offset is never ambiguous if we know the time since epoch.
> Yes there are many cases where the TZ string doesn't make any sense. This is why I chose to invalidate the timezone in these cases in `tz-rs`.
I am interested, do you know more cases?
> I am interested, do you know more cases?
The invalid TZ strings are those that cause the transition dates to switch order for a particular year (your problem 2). Conversely, if we can exclude them, all remaining TZ strings are valid.
This is why my validation check in #789 is complex: it guarantees that a valid TZ string cannot switch the order of the transition dates for any year.
The solution to problem 3 turns out to be simple: it goes away when you switch the two transition dates. And if you convert the transition dates to UTC before sorting them there is no problem at all.
I'll make this a test case at some point.
RFC 8536 has an interesting TZ string, `EST5EDT,0/0,J365/25`:
- DST is considered to be in effect all year if it starts January 1 at 00:00 and ends December 31 at 24:00 plus the difference between daylight saving and standard time, leaving no room for standard time in the calendar.
Example: EST5EDT,0/0,J365/25 This represents a time zone that observes daylight saving time all year. It is 4 hours west of UT and is abbreviated "EDT".
If a country is in daylight saving time the whole year, how do you specify that? Like this, with the start of daylight saving time on January 1 and the end on December 31 at 24:00.
The time at the end date of the example is wrong however. POSIX says "Each time field describes when, in current local time, the change to the other time is made." So the time should not be "24:00 plus the difference between daylight saving and standard time", but just 24 hours.
In this example if your implementation is of the type 'never look beyond the current year', the whole year is in DST. If you take transition dates in adjacent years into account, almost the whole year would be in standard time: daylight saving time would start January 1 at 00:00, and end January 1 at 01:00, 25 hours after December 31 at 00:00 of the previous year. The rest of the current year would be in standard time.
> The time at the end date of the example is wrong however. POSIX says "Each time field describes when, in current local time, the change to the other time is made." So the time should not be "24:00 plus the difference between daylight saving and standard time", but just 24 hours.
No, the example is correct. The "current local time, when the change to the other time is made" corresponds to the time before the transition.
Here are the transitions described in the example TZ string `EST5EDT,0/0,J365/25` for two consecutive years:
- `EST UTC-5` to `EDT UTC-4` at `Year N, January 1 00:00, EST UTC-5`.
- `EDT UTC-4` to `EST UTC-5` at `Year N, December 31 25:00, EDT UTC-4`, corresponding to `Year N+1, January 1 01:00, EDT UTC-4` or `Year N+1, January 1 00:00, EST UTC-5`.
- `EST UTC-5` to `EDT UTC-4` at `Year N+1, January 1 00:00, EST UTC-5`.
- `EDT UTC-4` to `EST UTC-5` at `Year N+1, December 31 25:00, EDT UTC-4`, corresponding to `Year N+2, January 1 01:00, EDT UTC-4` or `Year N+2, January 1 00:00, EST UTC-5`.

We can see we spend no time in the `EST UTC-5` timezone between the second transition of the year N and the first transition of the year N+1, so we have the `EDT UTC-4` timezone all year.
This case works in `chrono` because we check for the previous and next year transitions: https://github.com/chronotope/chrono/blob/38b19bbe4e21c402f81edfa2932a43831e679a35/src/offset/local/tz_info/rule.rs#L169
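The arithmetic behind the transitions listed above can be checked with a small sketch. It hard-codes the rule `EST5EDT,0/0,J365/25` rather than parsing it, and the function names (`days_from_civil`, `dst_start_utc`, `dst_end_utc`) are hypothetical helpers for this example, not part of `chrono` or `tz-rs`.

```rust
const HOUR: i64 = 3_600;
const DAY: i64 = 86_400;
const STD_OFFSET: i64 = -5 * HOUR; // EST, UTC-5
const DST_OFFSET: i64 = -4 * HOUR; // EDT, UTC-4

/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// UTC timestamp of the DST start: day `0` (January 1) at 00:00,
/// expressed in the time in effect before the transition (standard time).
fn dst_start_utc(year: i64) -> i64 {
    days_from_civil(year, 1, 1) * DAY - STD_OFFSET
}

/// UTC timestamp of the DST end: `J365` (December 31) at `hours`:00,
/// expressed in the time in effect before the transition (DST).
fn dst_end_utc(year: i64, hours: i64) -> i64 {
    days_from_civil(year, 12, 31) * DAY + hours * HOUR - DST_OFFSET
}
```

With `/25`, `dst_end_utc(2023, 25)` equals `dst_start_utc(2024)`, so no time is spent in standard time. With `/24` the end would fall one hour before the next start, leaving one hour of EST per year.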
Example problem
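Consider this (extended) POSIX TZ string:
`CRAZY5SHORT,M12.5.0/50,0/2`
- Standard time is named `CRAZY`.
- Its offset from UTC is `-05:00`.
- Daylight saving time is named `SHORT`.
- Its offset from UTC is `-04:00`.
- Daylight saving time starts on the last (`5`) Sunday (`0`) of December (`12`).
- The transition happens `50` hours later, so two days later at `02:00`.
- Daylight saving time ends on day `0` of the year, January 1st. The time is `02:00`.

The transition dates work out to: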
Daylight saving time would start on December 27th, 2022. The transition to standard time is on January 1st, 2023. The first transition after that is also from daylight saving time to standard time, on January 1st, 2024!
What is the offset from UTC during most of 2023? → The TZ string is ambiguous, we can't tell.
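The dates above can be reproduced with a short sketch. It hard-codes the two rules of `CRAZY5SHORT,M12.5.0/50,0/2` instead of parsing the string, and all function names are hypothetical helpers for this example.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let (era, yoe) = (y.div_euclid(400), y.rem_euclid(400));
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1;
    era * 146_097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719_468
}

/// Inverse of `days_from_civil`: returns (year, month, day).
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let (era, doe) = (z.div_euclid(146_097), z.rem_euclid(146_097));
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

/// Local (year, month, day, hour) of the DST start resolved for `year`:
/// the last Sunday of December at 00:00, plus 50 hours (`M12.5.0/50`).
fn dst_start(year: i64) -> (i64, i64, i64, i64) {
    let dec31 = days_from_civil(year, 12, 31);
    // 1970-01-01 was a Thursday; with Sunday = 0 that is weekday 4.
    let weekday = (dec31 + 4).rem_euclid(7);
    let last_sunday = dec31 - weekday;
    let secs = last_sunday * 86_400 + 50 * 3_600;
    let (y, m, d) = civil_from_days(secs.div_euclid(86_400));
    (y, m, d, secs.rem_euclid(86_400) / 3_600)
}

/// Local (year, month, day, hour) of the DST end resolved for `year`:
/// day 0 of the year at 02:00 (`0/2`).
fn dst_end(year: i64) -> (i64, i64, i64, i64) {
    (year, 1, 1, 2)
}
```

Under these assumptions `dst_start(2022)` is December 27th, 2022 at 02:00, while `dst_start(2023)` lands on January 2nd, 2024 at 02:00, one day after `dst_end(2024)`. No start of daylight saving time falls inside 2023 at all.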
Why can the time of day be more than 24 hours?
POSIX allows times from 00:00:00 to 24:59:59. RFC 8536 allows a TZ string in a TZif file to have times from -167:59:59 to 167:59:59, so up to a week before and after midnight of the transition date.
An example of when this can be needed to represent real-world transition rules comes from a man page:
A year can end up with more than two transition dates
It is possible to write a date rule in 3 ways:
- `Jn`: Ordinal which skips February 29th (a 'Julian day').
- `n`: Ordinal in the range `[0, 365]` (a 'zero-based Julian day').
- `Mm.n.d`: Month `m`, and the `n`'th occurrence in that month of day of the week `d` (where `n = 5` means the last occurrence of that day of the week in the month).

Two cases where it depends on the year whether a date falls in the current year or the next:
- `365` maps to December 31st in leap years, and in non-leap years to January 1st of the next year.
- `M12.5.d` will result in a date in the last week of December, potentially on the last day. Combining it with a time `>= 24` hours (allowed by POSIX) may push the date to the next year.

Problem 1: Our functions to map a local time to UTC don't expect to encounter more than two transition dates per year. But UTC to local time can handle it (I think?)
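A small sketch of the two ordinal forms shows how the zero-based form can spill into the next year. The `DateRule` enum and the helper functions are hypothetical names for this example only, and the `Mm.n.d` form is left out to keep it short.

```rust
/// The two ordinal date rules of a POSIX TZ string (hypothetical type).
#[derive(Clone, Copy)]
enum DateRule {
    /// `Jn` with n in 1..=365; February 29th is never counted.
    Julian1(i64),
    /// `n` with n in 0..=365; February 29th is counted in leap years.
    Julian0(i64),
}

fn is_leap(year: i64) -> bool {
    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)
}

/// Zero-based day offset from January 1st of `year`.
fn day_offset(rule: DateRule, year: i64) -> i64 {
    match rule {
        // `J60` is always March 1st, so from there on a leap day
        // shifts the offset by one.
        DateRule::Julian1(n) if is_leap(year) && n >= 60 => n,
        DateRule::Julian1(n) => n - 1,
        DateRule::Julian0(n) => n,
    }
}

/// True if the rule resolves to a date after December 31st of `year`.
fn falls_in_next_year(rule: DateRule, year: i64) -> bool {
    let year_len = if is_leap(year) { 366 } else { 365 };
    day_offset(rule, year) >= year_len
}
```

Here `falls_in_next_year(DateRule::Julian0(365), 2023)` is true while the same rule stays inside the leap year 2024, and `DateRule::Julian1(365)` is December 31st in both.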
When can the transition dates switch order?
If the two dates (including time) are close together, within 1 week of each other.
A date written as `Mm.n.d` can resolve to 7 different dates. If the other date is a fixed ordinal it is easy to make a TZ string where the dates switch order depending on the year.
That the time of the transition can be negative or more than 24 hours makes detecting a TZ string with this ambiguity extra difficult.
Problem 2: we assume a TZ string never causes ambiguous cases.
Possible solutions
Option 1: 'never look beyond the current year'.
This does not make all that much sense to me. What makes the year boundary so special? But as we are dealing with a nonsensical timezone specification, it is okay-ish to give bogus answers.
Option 2: detect weird TZ strings and return an error.
This is the approach in https://github.com/chronotope/chrono/pull/789. The validation in that PR is quite involved. An optimization there might be to resolve the two dates for some random year and test whether they are within a week of each other. Or more than 358 days apart (year boundary stuff :disappointed:).
Option 3: detect ambiguous cases during conversion.
During the conversion to/from local time we already have two transition dates for the current year. It is fast to check if they are less than a week apart, and then calculate the transition dates for the preceding or following year. Only if the datetime falls in an ambiguous period would this return `LocalResult::Ambiguous`.
This could work for local-to-utc, but we assume utc-to-local is never ambiguous. So not a solution.
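The cheap pre-check mentioned in options 2 and 3 could look roughly like this. It is only a sketch: `needs_neighbour_years` is a hypothetical name, and it assumes both transitions are given as local seconds since January 1st 00:00 of the same year.

```rust
const DAY: i64 = 86_400;

/// Returns true when the two transitions resolved for one year are less
/// than a week apart, either directly or across the year boundary
/// (more than 358 days apart). Only then would the transitions of the
/// preceding or following year need to be calculated as well.
fn needs_neighbour_years(dst_start: i64, dst_end: i64) -> bool {
    let gap = (dst_start - dst_end).abs();
    gap < 7 * DAY || gap > 358 * DAY
}
```

For a typical rule with transitions in March and November this returns false, so the common case pays for one subtraction and two comparisons. For the `CRAZY5SHORT` example in 2023 (start at day offset 364 plus 50 hours, end at day offset 0 plus 2 hours) it returns true.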
Is this worth fixing?
At the moment this is just a dark, unspecified corner of chrono :smile:. It is used if the `TZ` environment variable sets a TZ string, or when it is included in the TZif file of the current timezone.
I am playing with the idea of exposing this functionality in a public type `DstRule` (or something like that). It would be a fourth choice besides `Utc`, `Local` and `FixedOffset`. I think it would be very useful for library users when writing unit tests to detect DST transition problems (and for us for the same reason). And having an easy way to specify timezones that are often good enough seems useful, especially on platforms that don't have `Local` or a timezone database.
Whatever we do, it should be consistent and mentioned in the documentation.