golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.47k stars 17.59k forks source link

time: timezone name heuristic in Parse rejects MSK and MSD #3790

Closed gopherbot closed 9 years ago

gopherbot commented 12 years ago

by k.shutemov:

I found that that current heuristic for parsing stdTZ is too strict:

go/src/pkg/time/format.go
 809                         if len(value) >= 3 && value[2] == 'T' {
 810                                 p, value = value[0:3], value[3:]
 811                         } else if len(value) >= 4 && value[3] == 'T' {
 812                                 p, value = value[0:4], value[4:]
 813                         } else {
 814                                 err = errBad
 815                                 break
 816                         }

As you can see it only accepts timezone names with 'T' at end. It's not
correct: "MSK" is a valid timezone name.
patrickmn commented 12 years ago

Comment 1:

For reference, MSK is now (since March 2011) MSD, which is UTC+4.
rsc commented 12 years ago

Comment 2:

Labels changed: added priority-later, removed priority-triage.

Status changed to Accepted.

rsc commented 12 years ago

Comment 3:

Labels changed: added go1.1maybe.

rsc commented 11 years ago

Comment 4:

An experiment to run. Load every location in the tz directory. Convert
Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec 1 00:00:00 2012 UTC to local time and
record time zone abbreviation.
What's that list? Are MSK and MSD the only exceptions to the [A-Z]{2,3}T pattern?
gopherbot commented 11 years ago

Comment 5 by k.shutemov:

Here's results:
AZOST
CHADT
CHAST
ChST
EASST
GMT-1
GMT+1
GMT-10
GMT+10
GMT-11
GMT+11
GMT-12
GMT+12
GMT-13
GMT-14
GMT-2
GMT+2
GMT-3
GMT+3
GMT-4
GMT+4
GMT-5
GMT+5
GMT-6
GMT+6
GMT-7
GMT+7
GMT-8
GMT+8
GMT-9
GMT+9
MeST
MSK
UTC
WARST
zzz
The script I've used and raw results: https://gist.github.com/4250471
gopherbot commented 11 years ago

Comment 6 by k.shutemov:

The same, but I've included historical timezones for 1850 -- 2012 years
AKTST
ALMST
ANAST
AQTST
ASHST
AZOMT
AZOST
BAKST
BEAUT
BORTST
CHADT
CHAST
CHOST
ChST
CKHST
DUSST
EASST
FRUST
GMT-1
GMT+1
GMT-10
GMT+10
GMT-11
GMT+11
GMT-12
GMT+12
GMT-13
GMT-14
GMT-2
GMT+2
GMT-3
GMT+3
GMT-4
GMT+4
GMT-5
GMT+5
GMT-6
GMT+6
GMT-7
GMT+7
GMT-8
GMT+8
GMT-9
GMT+9
HOVST
IRKST
KIZST
KRAST
KUYST
MADMT
MADST
MAGST
MALST
MeST
MSD
MSK
NOVST
OMSST
ORAST
PETST
QYZST
SAKST
SAMST
SHEST
SVEST
TASST
TBIST
ULAST
URAST
UTC
UYHST
VLASST
VLAST
VOLST
WARST
YAKST
YEKST
YERST
zzz
Updated script and raw results: https://gist.github.com/4251094
rsc commented 11 years ago

Comment 7:

Thanks very much. Looks like the allowe patterns should be:
MSK
MSD
UTC
GMT
GMT[+-][0-9]+ w/ nonzero number between -14 and +12
[A-Z][A-Za-z]?[A-Z]?[DMSU]T
(I'm not suggesting to use package regexp, just using that notation to
express what should be allowed.)
rsc commented 11 years ago

Comment 8:

Windows uses UTC[+-][0-9]+ sometimes. Perhaps we should allow those in addition to
GMT[+-][0-9]+.
See notes on issue #4838.
rsc commented 11 years ago

Comment 9:

[The time for maybe has passed.]

Labels changed: removed go1.1maybe.

minux commented 11 years ago

Comment 10:

Issue #5672 has been merged into this issue.

matrixik commented 11 years ago

Comment 11:

I found one more:
MST
--- FAIL: TestLocalZoneAbbr (0.00 seconds)
        zoneinfo_windows_test.go:18: Parse failed: parsing time "Sat, 22 Jun 2013 13:42:36 +0200" as "Mon, 02 Jan 2006 15:04:05 MST": cannot parse "+0200" as "MST"
Windows 7 x32
go version devel +56909cb770fe Fri Jun 21 18:07:57 2013 -0700 windows/386
Best regards,
Dobrosław Żybort
alexbrainman commented 11 years ago

Comment 12:

matrixik,
Do you mind running this program http://play.golang.org/p/vCsioBDpMk on your computer
and reporting results back here? I suspect your syscall.Timezoneinformation is different
from what Go supports at this moment. And I am wonering why and what to do about it.
Thank you.
Alex
matrixik commented 11 years ago

Comment 13:

Alex,
Results:
```
i=syscall.Timezoneinformation{Bias:-60, StandardName:[32]uint16{0x15a, 0x72, 0x6f, 0x64,
0x6b, 0x6f, 0x77, 0x6f, 0x65, 0x75, 0x72, 0x6f, 0x70, 0x65, 0x6a, 0x73, 0x6b, 0x69,
0x20, 0x63, 0x7a, 0x61, 0x73, 0x20, 0x73, 0x74, 0x61, 0x6e, 0x64, 0x2e, 0x0, 0x0},
StandardDate:syscall.Systemtime{Year:0x0, Month:0xa, DayOfWeek:0x0, Day:0x5, Hour:0x3,
Minute:0x0, Second:0x0, Milliseconds:0x0}, StandardBias:0,
DaylightName:[32]uint16{0x15a, 0x72, 0x6f, 0x64, 0x6b, 0x6f, 0x77, 0x6f, 0x65, 0x75,
0x72, 0x6f, 0x70, 0x65, 0x6a, 0x73, 0x6b, 0x69, 0x20, 0x63, 0x7a, 0x61, 0x73, 0x20,
0x6c, 0x65, 0x74, 0x6e, 0x69, 0x0, 0x0, 0x0}, DaylightDate:syscall.Systemtime{Year:0x0,
Month:0x3, DayOfWeek:0x0, Day:0x5, Hour:0x2, Minute:0x0, Second:0x0, Milliseconds:0x0},
DaylightBias:-60}
i.StandardName="Środkowoeuropejski czas stand."
i.DaylightName="Środkowoeuropejski czas letni"
```
Where:
Środkowoeuropejski czas stand. = Central European Standard Time.
Środkowoeuropejski czas letni = Central European Summer Time
Best regards,
Dobrosław Żybort
alexbrainman commented 11 years ago

Comment 14:

matrixik,
Thank you for the report. I think your issue is different, so I am starting new issue
https://golang.org/issue/5783. Lets continue our conversation there.
Alex
rsc commented 11 years ago

Comment 15:

I don't fully understand what remains here but let's finish it for Go 1.2.

Labels changed: added go1.2.

robpike commented 11 years ago

Comment 16:

Owner changed to @robpike.

snaury commented 11 years ago

Comment 17:

Is there a CL for this issue? What about something like this?
diff -r da11b2a1cbc1 src/pkg/time/format.go
--- a/src/pkg/time/format.go    Tue Aug 06 12:04:08 2013 -0700
+++ b/src/pkg/time/format.go    Wed Aug 07 01:18:31 2013 +0400
@@ -939,8 +939,17 @@
            } else if len(value) >= 4 && value[3] == 'T' {
                p, value = value[0:4], value[4:]
            } else {
-               err = errBad
-               break
+               var i int
+               for i = 0; i < len(value); i++ {
+                   if value[i] < 'A' || 'Z' < value[i] {
+                       break
+                   }
+               }
+               if i != 3 && i != 4 {
+                   err = errBad
+                   break
+               }
+               p, value = value[0:i], value[i:]
            }
            for i := 0; i < len(p); i++ {
                if p[i] < 'A' || 'Z' < p[i] {
Making any 3 or 4 uppercase letters a timezone.
snaury commented 11 years ago

Comment 18:

On the other hand it might not be enough. This shows:
http://www.timeanddate.com/library/abbreviations/timezones/
That timezones may be as short as 1 character, and there's one timezone with a lowercase
letter in it: ChST.
robpike commented 11 years ago

Comment 19:

Russ's pattern is
MSK
MSD
UTC
GMT
GMT[+-][0-9]+ w/ nonzero number between -14 and +12
[A-Z][A-Za-z]?[A-Z]?[DMSU]T
but the only lower-case letter apperas to be 'h', so I suggest
[A-Z][A-Zh]?[A-Z]?[DMSU]T
robpike commented 11 years ago

Comment 20:

The full list at http://www.timeanddate.com/library/abbreviations/timezones/ shows that
the [DMSU] check is wrong. there are many time zones that don't have that property, PWT
for one.  Thank you QuickCheck.
robpike commented 11 years ago

Comment 21:

I have an idea. Details to follow.
nigeltao commented 11 years ago

Comment 22:

If you need to scrape that page, this program:
--------
package main
import (
    "fmt"
    "log"
    "net/http"
    "code.google.com/p/go.net/html"
)
func walk(root *html.Node, f func(*html.Node)) {
    f(root)
    for n := root.FirstChild; n != nil; n = n.NextSibling {
        walk(n, f)
    }
}
func main() {
    res, err := http.Get("http://www.timeanddate.com/library/abbreviations/timezones/")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    doc, err := html.Parse(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    walk(doc, func(n *html.Node) {
        if n.Type != html.ElementNode || n.Data != "tr" {
            return
        }
        n = n.FirstChild
        if n == nil || n.Type != html.ElementNode || n.Data != "td" {
            return
        }
        n = n.FirstChild
        if n == nil || n.Type != html.ElementNode || n.Data != "a" {
            return
        }
        n = n.FirstChild
        if n == nil || n.Type != html.TextNode {
            return
        }
        fmt.Println(n.Data)
    })
}
--------
prints:
--------
A
ACDT
ACST
ADT
ADT
AEDT
AEST
AFT
...
YAPT
YEKST
YEKT
Z
--------
Note that some rows (e.g. ADT) are repeated.
robpike commented 11 years ago

Comment 23:

This issue was updated by revision 727b2b6f7dbec2bed608e3c97129c3bb7bab054.

Handle time zones like GMT-8.
The more general time zone-matching problem is not yet resolved.
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/12922043
robpike commented 11 years ago

Comment 24:

Much to my disappointment, that URL doesn't have all the three-letter abbreviations even
listed on my own machine. I think it's hopeless to have a definitive list, especially as
it's sure to change as local governments decide to feel important by inventing their own
time zones (I'm looking at you, Eucla). So it's sad but the right answer is probably
just some simple sanity checks. CL coming.
robpike commented 11 years ago

Comment 25:

This issue was closed by revision a454d2fd2e6a60648728ca6c959a7c0b24119fe.

Status changed to Fixed.

gopherbot commented 11 years ago

Comment 26 by diogomfranco:

That commit doesn't handle the *many* 5-letter abbreviations in the lists above --
they're almost all rejected since they don't have a T as the 4th character, but as the
last. The lists above also happen to not include my local timezone, ESAST (according to
windows).
gopherbot commented 11 years ago

Comment 27 by diogomfranco:

(or treated as 3-letter timezones, which I assume breaks parsing later on due to the
leftover 2-letter garbage (why can't googlecode let me edit my comment))
alexbrainman commented 11 years ago

Comment 28:

diogomfranco, please run this program http://play.golang.org/p/vCsioBDpMk on your
computer and post results here, so we know what your time zone is. Thank you.
Alex
robpike commented 11 years ago

Comment 29:

This issue was closed by revision 4c855f3830f88c0d11dbc19ff0a34cfa1beecc6.

gopherbot commented 11 years ago

Comment 30 by diogomfranco:

re #28:
i=syscall.Timezoneinformation{Bias:180, StandardName:[32]uint16{0x45, 0x2e, 0x20, 0x53,
0x6f, 0x75, 0x74, 0x68, 0x20, 0x41, 0x6d, 0x65, 0x72, 0x69, 0x63, 0x6}
i.StandardName="E. South America Standard Time"
i.DaylightName="E. South America Daylight Time"
alexbrainman commented 11 years ago

Comment 31:

diogomfranco, I just changed my timezone to "(GMT-03:00) Brasilia" and run time test:
C:\go\root\src>go test time
ok      time    11.141s
C:\go\root\src>hg id
9fb09f7edb82 tip
The test passes. So, I suspect, you shouldn't have any problems too.
Alex