golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124k stars 17.67k forks source link

time: Parse behaves inconsistently when parsing numerical timezones with an "MST" format string #30780

Open nickmooney opened 5 years ago

nickmooney commented 5 years ago

What version of Go are you using (go version)?

Local machine:

$ go version
go version go1.12 darwin/amd64

Docker container:

$ go version
go version go1.12 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/nmooney/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/nmooney/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.12/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.12/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/vv/t3z4pbwn3pd36hm89fg1m8jc0000gn/T/go-build118701068=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Go Playground example here.

I attempted to use the time.Parse format string "Mon, 2 Jan 2006 15:04:05 MST" to parse the date "Tue, 12 Mar 2019 15:34:39 -0000".

What did you expect to see?

I would expect time.Parse to fail to parse the string, since it's suffixed with -0000 rather than an alphabetical time zone designator.

This behaves correctly when I try to parse a date with a non-zero offset, such as "Tue, 12 Mar 2019 15:34:39 -0700" (i.e. parsing fails).

What did you see instead?

time.Parse successfully parses date strings with TZ offsets of zero given a format string that ends with MST, when it should likely fail if there is no alphabetical timezone designator.

titanous commented 5 years ago

This looks like a bug to me, Parse shouldn't behave inconsistently with inputs that only differ based on the offset, I think it should definitely error in all cases where the MST token matches an incompatible input value. Note that leaving off -0700 should work, as MST will parse as zero based on the documentation:

Elements omitted from the value are assumed to be zero or, when zero is impossible, one, so parsing "3:04pm" returns the time corresponding to Jan 1, year 0, 15:04:00 UTC (note that because the year is 0, this time is before the zero Time). Years must be in the range 0000..9999. The day of the week is checked for syntax but it is otherwise ignored.

agnivade commented 5 years ago

/cc @ianlancetaylor

gopherbot commented 5 years ago

Change https://golang.org/cl/167381 mentions this issue: time: fail when Parse is given a stdNumTZ as stdTZ

ALTree commented 5 years ago

I would expect time.Parse to fail to parse the string, since it's suffixed with -0000 rather than an alphabetical time zone designator.

This expectation is not correct. Not every timezone has a three-letter name, so it was decided to allow numeric timezone names (like +07) to match MST. This was implemented in commit 9f2c611fa7cccfacd317b88eff14d3ba64e855ff. This allows Parse to keep working when the format specifier ask for a timezone names (using MST), but the function is given a correct time string for a timezone with no numeric name.

nickmooney commented 5 years ago

This expectation is not correct. Not every timezone has a three-letter name, so it was decided to allow numeric timezone names (like +07) to match MST.

Yes, timezones of +07 or GMT+3 should be parsed correctly as a token of type stdTZ, but -0000 is a stdNumTZ and shouldn't be parsed as a standin for the MST token.

ALTree commented 5 years ago

Yes, timezones of +07 or GMT+3 should be parsed as a token of type stdTZ, but -0000 is a stdNumTZ

But how do you differentiate between a numeric timezone name and an stdNumTZ? What's the rule?

nickmooney commented 5 years ago

In the layout string, a value of "MST" encodes to a stdTZ (which accepts values like "MST", "PDT", "GMT+3", "+07".

A value of "-0700" in the layout string encodes to a stdNumTZ, which should be in a form that looks like "+0000", "-0700", etc.

This behavior is already part of time.Parse. This issue and the associated pull request just fixes the case where -0000 is parsed as a stdTZ (since 0 is between -23 and 23), but -0700 is not (since 700 is greater than 23). In fact, neither of those values are acceptable values for a stdTZ. This issue and pull request do not change the behavior of string parsing where the layout string contains -0700 (signaling that a stdNumTZ is expected).

ALTree commented 5 years ago

but -0700 is not (since 700 is greater than 23)

My point is that this is wrong. It's also the current behaviour, but now your CL is aligning Parse, for 0000, to a behaviour that is probably not correct. In fact, I argue that both -0000 and -0700 are valid numeric timezones names. Currently, Parse rejects the second, and after your patch it'll also reject the first. But the correct behaviour is to accept both, since the criteria "let's parse NNNN as an integer and compare that with 23" is not correct.

nickmooney commented 5 years ago

They are valid numeric timezone names, but not valid non-numeric timezone names. If you use -0700 in your layout string for Parse, both -0000 and -0700 are accepted as you would expect. Parse recognizes a difference between stdTZ and stdNumTZ in the layout string ("MST" vs "-0700").

Are you suggesting that they should be interchangeable? Because that also solves this bug, but seems to me to be non-obvious behavior. My initial reaction is that I would be surprised if I was able to hand a non-numeric timezone designator to a function that uses a layout string specifying a 4-digit offset. Otherwise, why recognize a difference between these two types in the layout string?

ALTree commented 5 years ago

Here's how I understand it:

MST should cause Parse to eat anything that could potentially be an abbreviated timezone name. This includes the following: EST, EEST, +07, -05, +00, +0000, +0500, +0330.

-0700 should cause Parse to exclusively accept numeric timezones.

This is how I read the documentation, where it says:

stdTZ                                // "MST"
...
stdNumTZ                             // "-0700"  // always numeric

so no, I don't think they are interchangeable, since with -0700 Parse would reject e.g. EEST.

nickmooney commented 5 years ago

Ah, I understand where you're coming from. That would be a fine solution as well, as long as behavior is consistent when parsing strings that are suffixed with both -0000 and -0700.

rsc commented 5 years ago

/cc @robpike

// parseSignedOffset parses a signed timezone offset (e.g. "+03" or "-04"). // The function checks for a signed number in the range -23 through +23 excluding zero. // Returns length of the found offset string or 0 otherwise

This function is expecting "+03" and accepting "-0000" (unexpectedly!) but rejecting "-0700" (because 700 is well out of range). It's unclear what the right answer is - should parseSignedOffset allow 4-digit offsets in order to deal with fractional-hour time zones?

ALTree commented 5 years ago

should parseSignedOffset allow 4-digit offsets in order to deal with fractional-hour time zones?

This will be necessary anyway to fix #26032. For example, if you zdump Asia/Kabul you get:

Sun Aug 5 16:25:10 2018 +0430

which time.Parse mistakenly rejects because it doesn't like the +0430 part.

nickmooney commented 5 years ago

What are the next steps required for this?

ianlancetaylor commented 5 years ago

@nickmooney We need to decide on the right thing to do.

nickmooney commented 5 years ago

Gotcha, @ianlancetaylor. I haven't contributed to Go before, and I'm not super familiar with who is in charge of making those decisions. It seems like we have two options -- is there a group of people who are the "right people" to make that decision?

ianlancetaylor commented 5 years ago

Yes, there is a group of people who periodically review NeedsDecision issues.

In this case it would be interesting to hear whether @robpike has an opinion.

robpike commented 5 years ago

Perhaps parseSignedOffset should allow 4-digit offsets. I'm not sure.