Closed robpike closed 8 months ago
- If
Parse
orLoadLocation
is given a location/time zone that is not loadable, instead of incorrectly using a zero offset, we look up the name in that table. If it exists, we build a nonceLocation
with the provided name and fixed offset.
An alternative would be:
Add a new function LoadFixedZone(abbrev string) (*Location, error)
that returns time.FixedZone(abbrev, offset)
with the determined offset from that table.
LoadLocation
continues to return what it returns today.
Parse
attempts to use the new LoadFixedZone
when encountering a zone string, as it's the most specific option and corresponds to what is produced by Time.Format
. If that fails, it falls back to using LoadLocation
.
This approach keeps the concepts of 'time zone location' and 'time zone' separate within the API, at least at the function level. In the IETF database, some strings (such as "CET") serve as both a time zone location and a time zone. By providing two functions, users can decide whether they want to look up "CET" as a time zone location or "CET" as an actual time zone.
I'm concerned about the ambiguities. For example, CST
is both +0800 (China Standard Time) and -0600 (Central Standard Time in the U.S.). PST
is both -0800 (Pacific Standard Time in the U.S.) and +0800 (Philippine Standard Time).
Running a little program over the timezone database turns up these duplicate identifiers:
AMT appears in [Europe/Athens Africa/Asmara America/Asuncion Europe/Amsterdam]
APT appears in [US/Alaska Canada/Atlantic]
AST appears in [US/Alaska Canada/Atlantic]
AWT appears in [US/Alaska Canada/Atlantic]
BMT appears in [Europe/Brussels Europe/Tiraspol Europe/Zurich Africa/Banjul America/Bogota Asia/Baghdad Asia/Bangkok Asia/Jakarta Atlantic/Bermuda]
BST appears in [US/Aleutian America/La_Paz Atlantic/Bermuda GB-Eire]
CDT appears in [US/Indiana-Starke Cuba ROC]
CMT appears in [America/Rosario America/Caracas America/Panama America/La_Paz America/St_Lucia Europe/Tiraspol Europe/Copenhagen]
CST appears in [Cuba ROC US/Michigan]
FMT appears in [Africa/Freetown Atlantic/Madeira]
HDT appears in [US/Aleutian US/Hawaii]
HMT appears in [Cuba Asia/Kolkata Atlantic/Azores Europe/Mariehamn]
IMT appears in [Asia/Irkutsk Turkey]
IST appears in [Europe/Dublin Asia/Kolkata Israel]
JMT appears in [Israel Atlantic/St_Helena]
KMT appears in [Jamaica America/St_Vincent Europe/Zaporozhye Europe/Vilnius]
MMT appears in [America/Montevideo Asia/Colombo W-SU Indian/Maldives Africa/Monrovia America/Managua Asia/Kolkata Asia/Ujung_Pandang Europe/Minsk]
MST appears in [W-SU US/Mountain]
NPT appears in [US/Aleutian Canada/Newfoundland]
NST appears in [US/Aleutian Canada/Newfoundland Europe/Amsterdam]
NWT appears in [US/Aleutian Canada/Newfoundland]
PDT appears in [US/Pacific Asia/Manila]
PMT appears in [Asia/Pontianak Asia/Yekaterinburg Europe/Prague Europe/Paris America/Paramaribo]
PST appears in [US/Pacific Asia/Manila]
RMT appears in [Asia/Yangon Europe/Riga Europe/Vatican]
SAST appears in [Africa/Windhoek Africa/Johannesburg]
SMT appears in [Chile/Continental Singapore Atlantic/Stanley Europe/Simferopol]
TMT appears in [Iran Europe/Tallinn]
@ianlancetaylor As I said in the proposal text, the offset would only be created for unique names, although there are more duplicates than I suspected.
@fzipp That does seem like a clean proposal, but in practice I suspect the pattern would be to call LoadLocation and if that fails, LoadFixedZone. Putting the functionality into LoadLocation seems worthwhile.
@robpike Yeah, I understood the intended handling of duplicate names, but it troubles me that PST
and PDT
, which I would expect to be among the more common timezone identifiers that people use, are ambiguous.
I'm not sure what the use case is for this kind of string, but it seems like the caller is the only one in a position to resolve the ambiguity. To that end, maybe there should be another flavor of time.Parse
that takes something like func(string) (*time.Location, error)
, where the parameter to the callback is the timezone from the string being parsed (e.g., "EST").
(Of course, in the simplest case, they could just parse the timezone themselves beforehand and use time.ParseInLocation
.)
It's also possible to disambiguate the names using the installed location, as hinted by @rittneje but more automatically.
First, though, there needs to be some acknowledgement that there is a problem that needs addressing. To me, that's clear, but maybe not to others, or maybe not of sufficient importance. But that typescript above is not a happy thing.
I, at least, agree that there is a problem here. Possibly related: #19694 #34913 #56528
It's also possible to disambiguate the names using the installed location
@robpike can you elaborate on what you mean by this?
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
@robpike
I would suggest that Go handles the TZ environment variable like libc/tzcode does, so that it's consistent with the rest of the system. When TZ is set to Europe/Berlin
, date(1) prints the timezone abbreviation as CET
or CEST
, thus I would say that in the third example in your original comment Go is behaving correctly.
$ TZ=Europe/Berlin date
Mon Nov 20 21:40:18 CET 2023
What is currently lacking is parsing POSIX-style TZ strings, as described in the newtzset(3) man page (e.g., CET-1CEST,M3.5.0,M10.5.0/3
). I added a call to tzset
to initLocal
in src/time/zoneinfo_unix.go
(and made some other improvements related to handling of type rule
) and intend to submit patches.
However, time.tzset
parses the string strictly, whilst the parser in tzcode (used by glibc and others) parses as much of the string as it can and discards the rest, thus TZ=CEST
has the same effect as TZ=CEST0
(a static timezone named CEST at UTC+0):
$ TZ=UTC date ; TZ=Etc/GMT+0 date ; TZ=CEST date
Mon Nov 20 21:12:33 UTC 2023
Mon Nov 20 21:12:33 GMT 2023
Mon Nov 20 21:12:33 CEST 2023
I kinda understand the desire to handle timezone abbreviations in a best-effort manner in Parse
and ParseInLocation
, but in my opinion the pitfalls (like "CEST" being a timezone abbreviation but "CET" a location that is in CEST most of the time) are too big for LoadLocation
.
My suggestion would be for LoadLocation
to do either of:
There's a comment by @rsc in src/time/zoneinfo.go
, before LoadLocation
:
// NOTE(rsc): Eventually we will need to accept the POSIX TZ environment
// syntax too, but I don't feel like implementing it today.
There are a couple of fine points about handling of TZ that I'm not sure about, like whether POSIX-style TZ strings should be handled on OSes other than Unix at all. Should I submit another proposal or keep the discussion here?
@fzipp
Parse
attempts to use the newLoadFixedZone
when encountering a zone string, as it's the most specific option and corresponds to what is produced byTime.Format
. If that fails, it falls back to usingLoadLocation
.
A timezone abbreviation may also be useful in ParseInLocation
to disambiguate a timestamp around an autumn transition, e.g., 2023-10-29 02:30 +0200 CEST and 02:30 +0100 CET.
However, if there's no time corresponding to such string exactly, this may be tricky. E.g., Europe/Dublin
has three time types named IST
, two of which differ only by the value of the DST flag (at UTC+1) and one that was used from 1916-05-21 to 1916-10-01 (at GMT+00:34:39).
I tried implementing this proposal, scanning the database to get all the timezone initialisms, and there are far more duplicates than I imagined. Therefore although I'd love to find a solution to the problem, which I think is a problem, my naive suggestion is certainly not it.
Based on the discussion above, this proposal seems like a likely decline. — rsc for the proposal review group
Consider this unedited typescript running on my Mac with macOS and a recent Go distribution:
There are a few mysteries here for the uninitiated.
First, the time zone in the result - which is printed by
time.Time.String
so is actually what is held in the time value - always reads CEST. That seems good for starters.Yet second, only the last of the three runs prints a correct time zone offset for CEST. The first two have it at zero, which is always wrong for CEST.
Third, even when I explicitly set the time zone through the TZ variable, it still doesn't get the offset right.
Fourth, even in the last run, the time zone prints as CEST although one might expect Europe/Berlin.
Fifth, perhaps strangest of all, this is working correctly according to the documentation in the time package.
This is a proposal to address, if not completely clear up, these mysteries.
Allow me to explain. (If you don't know much about time zones and how they work in modern computing, I suggest reading this IETF document first.)
The concept of time zone is not a first-class citizen in the time package. Instead, "location" is paramount. To get the correct rendering of a time, we must in effect ask for the time in Berlin, not in Central European Standard Time. In library terms, we call
LoadLocation
, notLoadTimeZone
. There are a number of reasons for this, and it is confusing, but the fundamental one is that is how the IETF time zone (ha!) database works. That is why the only run above that gets everything right is the one where we set the location, even though that is ironically, incorrectly and confusingly done through the legacy TZ environment variable, whose initials of course stand for "time zone".What happens in that third run is that because our location is Berlin, we have loaded the data for Berlin, and in that data is a description of the time zone called CEST. There is no time zone file for CEST itself. The only description of CEST is inside the data for Berlin. (Have a look for yourself; the data is in
/usr/share/zoneinfo/europe/Berlin
on Unix, although it's in binary; you need a program called zdump to examine it.)In the first run, my location is not in central Europe, the data for CEST is never loaded, and
time.Parse
humors me but delivers an unsatisfactory result. I'll explain why in a minute. On the other hand, if you happen to live in central Europe, that first run will in fact work properly for you.In the second run, a reasonable attempt to address this issue through the time-zone-looking TZ variable still fails, because CEST is not a location and therefore cannot be
LoadLocationed
.You might ask, "Why doesn't
time.Parse
give an error then, if CEST is not a valid location?" It's because of this paragraph, the last in the doc comment fortime.Parse
:I claim that this is an unfortunate design decision, albeit one made in good faith with a tinge of ignorance back in the early days of Go's development.
(You might also ask, "Why does
time.Parse
accept time zone strings and not locations?" There the answer is simpler: That's what time values look like, confusing though it is in this context.)Compatibility concerns might argue we can't fix this behavior, but I believe it's confusing and results in apparently correct but actually wrong results, and should be fixed. But we can't just switch the whole thing to time zones for the very reason that locations were chosen as the fundamental unit. But we can do something about time zones.
Here is the proposal.
If you look in the time zone database manually, you'll find that all the time zone abbreviations like EST and AEST and CEST are unique, or at least nearly all are. In the data we can see the offset, which is a fixed value. For instance, EST is always offset by 5 hours from UTC, while EDT is always offset by 4 hours. I propose:
Parse
orLoadLocation
is given a location/time zone that is not loadable, instead of incorrectly using a zero offset, we look up the name in that table. If it exists, we build a nonceLocation
with the provided name and fixed offset.The properties of these nonce locations are worth enumerating:
I believe this approach is easy, reasonable, less confusing, and worth doing.
(Developed in conversation with @rsc).