golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.5k stars 17.6k forks source link

net/http: TestClientTimeout* failures with "missing timeout substring" on Windows #43120

Open bcmills opened 3 years ago

bcmills commented 3 years ago
#!watchflakes
post <- goos == "windows" && pkg == "net/http" && test ~ `^TestClientTimeout.*` && `missing timeout substring`

2020-12-09T01:34:53-ae9b442/windows-arm-zx2c4 2020-12-07T17:19:41-7f9a2bc/windows-arm-zx2c4 2020-12-03T02:35:36-da54dfb/windows-arm-zx2c4 2020-12-02T20:17:57-2d0258d/windows-arm-zx2c4

--- FAIL: TestClientTimeout_h1 (0.20s)
    client_test.go:1277: error string = "context deadline exceeded"; missing timeout substring
--- FAIL: TestClientTimeout_h2 (0.20s)
    client_test.go:1277: error string = "context deadline exceeded"; missing timeout substring
FAIL
FAIL    net/http    43.328s

CC @zx2c4 @bradfitz @neild @empijei

zx2c4 commented 3 years ago

We increased the timeout here: https://github.com/golang/build/commit/b6220de1adc000cad31eadf59558e1d3a8e32205

But I don't know if that change has been deployed yet. @dmitshur - do you know?

zx2c4 commented 3 years ago

Oh, the latest log shows "GO_TEST_TIMEOUT_SCALE=3", so looks like it has been deployed.

So I'm not quite sure...

dmitshur commented 3 years ago

Yes, it has. You can find out the deployed version by visiting https://farmer.golang.org, it tells the version at the top:

[...] Uptime 10h21m44s. Version 90d5e7c418c099f6d88cfc0e6269a94fa4da7f0e.
bcmills commented 3 years ago

Looks like it has started occurring on other builders too.

greplogs --dashboard -md -l -E 'missing timeout substring' --since=2020-12-10

2021-08-30T22:07:43-437362c/openbsd-amd64-68 2021-06-30T16:44:19-0fa3265/windows-arm64-10 2021-06-25T21:04:07-5160896/windows-arm64-aws 2021-06-18T16:56:48-57aaa19/windows-arm-zx2c4 2021-06-16T20:38:07-0e67ce3/windows-arm64-aws 2021-06-15T18:42:11-219fe9d/windows-arm64-aws 2021-06-11T16:09:15-e2dc6dd/windows-arm-zx2c4 2021-05-10T15:47:52-a9edda3/openbsd-amd64-68 2021-05-04T23:35:34-137be77/windows-arm-zx2c4 2021-04-30T19:59:10-faff49a/windows-arm-zx2c4 2021-04-13T21:13:12-c19759a/windows-arm-zx2c4 2021-04-12T18:08:47-849dba0/openbsd-amd64-68 2021-04-09T14:33:44-a690a5d/windows-arm-zx2c4 2021-04-07T13:24:10-4520da4/windows-arm-zx2c4 2021-04-07T02:05:55-8d77e45/windows-arm-zx2c4 2021-04-06T22:55:58-b084073/windows-arm-zx2c4 2021-04-01T00:51:23-3304b22/windows-arm-zx2c4 2021-03-30T16:13:36-89b141c/windows-arm-zx2c4 2021-03-29T08:11:13-565e70f/windows-arm-zx2c4 2021-03-29T06:21:31-d10241f/windows-arm-zx2c4 2021-03-22T20:20:58-d9691ff/windows-arm-zx2c4 2021-03-10T19:33:23-ccf9ace/windows-arm-zx2c4 2021-03-08T20:03:01-bd37284/windows-arm-zx2c4 2021-03-05T23:32:34-7205a4f/windows-arm-zx2c4 2021-03-04T23:37:01-96a96a9/windows-arm-zx2c4 2021-03-04T21:47:26-9d3718e/windows-arm-zx2c4 2021-02-24T18:49:08-b7f62da/windows-arm-zx2c4 2021-02-24T12:37:13-26001d1/windows-arm-zx2c4 2021-02-23T18:39:53-c584f42/windows-arm-zx2c4 2021-02-05T22:35:11-b54cd94/windows-arm-zx2c4 2021-01-06T15:02:50-d213170/windows-arm-zx2c4 2020-12-30T22:39:48-0ae2e03/windows-arm-zx2c4 2020-12-16T09:01:04-75e16f5/windows-arm-zx2c4

CC @neild

bcmills commented 2 years ago

Curiously, this has still only ever occurred on windows-arm variants and openbsd-amd64-68. I don't know whether we can really discount the two OpenBSD failures, but the apparent specificity to windows-arm is puzzling. (Maybe this is a memory-model synchronization issue, like #48072?)

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2021-10-01

2021-11-03T18:37:22-cfd016d/windows-arm64-10 2021-10-22T21:26:30-5d414d1/windows-arm64-10 2021-10-14T18:43:59-765c911/windows-arm64-10 2021-10-14T04:18:44-9e4dc6f/windows-arm64-10 2021-10-05T15:47:57-017ffcd/windows-arm64-10

gopherbot commented 2 years ago

Change https://golang.org/cl/361275 mentions this issue: net/http: reduce TestClientTimeout_h{1,2} latency

bcmills commented 2 years ago

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2021-11-05

2021-12-29T19:23:11-8a306e2/windows-arm64-10 2021-11-12T23:34:09-fdee1b2/windows-arm64-10

bcmills commented 2 years ago

windows/arm64 is not currently listed as a first class port, so — given the consistency of the failing platform — I think a skip may be in order to reduce windows/arm64 builder flakiness.

I honestly don't know what to make of the couple of openbsd-amd64-68 failures, though.

(CC @bufflig)

gopherbot commented 2 years ago

Change https://golang.org/cl/375635 mentions this issue: net/http: skip TestClientTimeout_h{1,2} on windows/arm and windows/arm64

bcmills commented 2 years ago

Marking as release-blocker until the skip is added, at which point this can be moved back to the backlog.

bcmills commented 2 years ago

The skip is merged. Back to the Backlog. 😩

bcmills commented 2 years ago

Apparently TestClientTimeout_Headers_h[12] are also affected by the same underlying bug:

--- FAIL: TestClientTimeout_Headers_h2 (0.01s)
    client_test.go:1342: error string = "Get \"https://127.0.0.1:53626\": context deadline exceeded"; missing timeout substring
--- FAIL: TestClientTimeout_Headers_h1 (0.01s)
    client_test.go:1342: error string = "Get \"http://127.0.0.1:53627\": context deadline exceeded"; missing timeout substring
2022/01/17 01:29:00 http: TLS handshake error from 127.0.0.1:53639: read tcp 127.0.0.1:53638->127.0.0.1:53639: wsarecv: An existing connection was forcibly closed by the remote host.
FAIL
FAIL    net/http    6.571s

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2022-01-06

2022-01-17T09:23:25-897b3da/windows-arm64-10

gopherbot commented 2 years ago

Change https://golang.org/cl/379156 mentions this issue: net/http: skip TestClientTimeout_Headers_h{1,2} on windows/arm and windows/arm64

bcmills commented 2 years ago

Looks like the skip isn't quite wide enough — this can also manifest during the initial Get call if the deadline happens to fire before the headers are processed.

2022-08-08T16:40:58-e730703/windows-arm64-10:

--- FAIL: TestClientTimeout_h2 (0.01s)
    client_test.go:1260: attempting test with timeout 10ms
    client_test.go:1275: Get "https://127.0.0.1:51129/?nonce=0": context deadline exceeded
FAIL
FAIL    net/http    6.996s
gopherbot commented 2 years ago

Change https://go.dev/cl/425096 mentions this issue: net/http: skip Get flakes in TestClientTimeout tests on windows/arm4

bcmills commented 1 year ago

This test is still skipped at HEAD, but there have been a fair number of net/http fixes since it was added.

@neild, do you think it's worth unskipping at this point?

bcmills commented 1 year ago

Apparently not fixed: it reproduced on windows/amd64 in https://github.com/golang/go/issues/63365#issuecomment-1746068877.

(CC @golang/windows)

gopherbot commented 11 months ago

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "windows" && pkg == "net/http" && test ~ `^TestClientTimeout.*` && `missing timeout substring`
2023-10-04 02:55 windows-amd64-2016 go@9018601e net/http.TestClientTimeout_Headers (log) --- FAIL: TestClientTimeout_Headers (0.00s) --- FAIL: TestClientTimeout_Headers/h1 (0.01s) client_test.go:1313: error string = "Get \"http://127.0.0.1:51755\": context deadline exceeded"; missing timeout substring

watchflakes

gopherbot commented 2 months ago

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "windows" && pkg == "net/http" && test ~ `^TestClientTimeout.*` && `missing timeout substring`
2024-08-07 17:23 gotip-windows-amd64 go@b696250e net/http.TestClientTimeout/h2 (log) === RUN TestClientTimeout/h2 === PAUSE TestClientTimeout/h2 === CONT TestClientTimeout/h2 client_test.go:1212: attempting test with timeout 10ms client_test.go:1259: error string = "context deadline exceeded"; missing timeout substring --- FAIL: TestClientTimeout/h2 (0.01s)

watchflakes