golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.01k stars 17.68k forks source link

net: TestDialParallelSpuriousConnection occasionally times out #34495

Closed bcmills closed 2 years ago

bcmills commented 5 years ago

From the darwin-amd64-race builder (https://build.golang.org/log/96aa6d382799b872cbedeb56e3b9d2076acc6546):

``` panic: test timed out after 3m0s goroutine 15 [running]: testing.(*M).startAlarm.func1() /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:1377 +0x11c created by time.goFunc /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/time/sleep.go:168 +0x52 goroutine 1 [chan receive]: testing.(*T).Run(0xc000102000, 0x13ceb09, 0x22, 0x13d56f0, 0x1) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:961 +0x68a testing.runTests.func1(0xc000102000) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:1202 +0xa7 testing.tRunner(0xc000102000, 0xc000095d08) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:909 +0x19a testing.runTests(0xc0000d6120, 0x1609be0, 0xe4, 0xe4, 0x0) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:1200 +0x522 testing.(*M).Run(0xc0000e6000, 0x0) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:1117 +0x300 net.TestMain(0xc0000e6000) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/net/main_test.go:52 +0x47 main.main() _testmain.go:564 +0x224 goroutine 29 [semacquire]: sync.runtime_Semacquire(0xc0000b1908) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/runtime/sema.go:56 +0x42 sync.(*WaitGroup).Wait(0xc0000b1900) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/sync/waitgroup.go:130 +0xb1 net.TestDialParallelSpuriousConnection(0xc000102400) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/net/dial_test.go:502 +0x605 testing.tRunner(0xc000102400, 0x13d56f0) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:909 +0x19a created by testing.(*T).Run /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:960 +0x652 goroutine 31 [IO wait]: internal/poll.runtime_pollWait(0x2495f08, 0x72, 0x0) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/runtime/netpoll.go:184 +0x55 internal/poll.(*pollDesc).wait(0xc0000e6598, 0x72, 0x0, 0x0, 0x13c32f2) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/internal/poll/fd_poll_runtime.go:87 +0xe4 internal/poll.(*pollDesc).waitRead(...) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Accept(0xc0000e6580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/internal/poll/fd_unix.go:384 +0x2cc net.(*netFD).accept(0xc0000e6580, 0xc0000b1830, 0xf, 0xf) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/net/fd_unix.go:238 +0x56 net.(*TCPListener).accept(0xc0000d65e0, 0x5, 0xf, 0x5) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/net/tcpsock_posix.go:139 +0x50 net.(*TCPListener).Accept(0xc0000d65e0, 0xbf5a84fb15435818, 0x12b4d9c54, 0x160d240, 0xc0001e37b0) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/net/tcpsock.go:261 +0x50 net.TestDialParallelSpuriousConnection.func1(0xc000256000, 0x1432d80, 0xc0000d65e0) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/net/dial_test.go:439 +0x7c net.(*dualStackServer).buildup.func1(0xc0000d65a0, 0xc000256000, 0x1) /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/net/mockserver_test.go:148 +0xb3 created by net.(*dualStackServer).buildup /var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/net/mockserver_test.go:147 +0x86 FAIL net 180.026s ```

It's not obvious to me whether the test was deadlocked or just ran out of time.

If the latter, perhaps some of its tests should be skipped in short mode (and/or when the race detector is enabled)?

CC @mikioh @bradfitz @ianlancetaylor

bcmills commented 5 years ago

Compare #32534, #32919.

bcmills commented 5 years ago

Same failure mode on solaris-amd64-oraclerel: https://build.golang.org/log/05176237c5a06f48100a31a8b05997879c9f8ea1

bcmills commented 5 years ago

darwin-amd64-10_12: https://build.golang.org/log/5a6b1d2da89c39f0aa344ce5d9996aa44b8133a8

bcmills commented 5 years ago

darwin-amd64-10_12: https://build.golang.org/log/22a57819568de78654e834c54ee909a50826230b

bcmills commented 4 years ago

A different (but likely related) failure mode:

--- FAIL: TestDialParallelSpuriousConnection (1.49s)
    dial_test.go:456: got read tcp6 [::1]:59891->[::1]:59893: i/o timeout; want EOF
FAIL
FAIL    net 4.495s

2020-02-29T17:02:40-74f8983/darwin-arm64-corellium 2020-02-24T16:39:52-3093959/darwin-arm64-corellium 2020-01-31T20:18:54-1b7fefc/darwin-arm64-corellium 2020-01-23T21:01:12-ace25f8/darwin-arm-mg912baios 2020-01-19T14:04:09-8e0be05/darwin-arm-mg912baios

bcmills commented 4 years ago

More of the new failure mode: 2020-03-10T00:24:30-08dee51/darwin-arm64-corellium 2020-03-02T15:39:23-12d02e7/darwin-arm64-corellium

bcmills commented 4 years ago

Looks like the old failure mode really did stop in October.

2019-10-31T23:37:57-ce49f95/aix-ppc64 2019-10-30T14:06:04-17190de/darwin-amd64-10_12 2019-10-23T07:43:18-ab3f1a2/darwin-amd64-10_12 2019-10-11T16:34:01-ba108c9/darwin-amd64-race 2019-10-08T17:30:14-868de9a/darwin-amd64-10_12 2019-10-02T17:24:11-0000f0b/solaris-amd64-oraclerel 2019-09-24T04:19:31-e29d276/darwin-amd64-race 2019-09-18T20:32:43-bcf6f9f/darwin-amd64-10_12 2019-09-16T22:14:49-115e4c9/darwin-amd64-race 2019-09-06T21:22:26-8a8cf5b/darwin-amd64-race 2019-06-25T17:51:25-a6ad626/darwin-amd64-race 2019-05-13T17:01:25-db2bf15/darwin-amd64-race 2019-04-30T18:29:07-888bac1/darwin-amd64-race

bcmills commented 4 years ago

Actually, those new failures seem Corellium-specific. Opening a new issue.

bcmills commented 2 years ago

Looks like the hiatus from October 2019 to March 2020 was only temporary.

greplogs --dashboard -md -l -e '^panic: test timed out.*\n(?:.*\n)*net\..*\.Accept.*\n\t.*\nnet\.TestDialParallelSpuriousConnection' --since=2020-01-01

2021-12-21T22:26:48-60f2c12/freebsd-arm64-dmgk 2021-12-02T20:21:16-469f030/netbsd-amd64-9_0-n2d 2021-10-27T08:50:27-bdefb77/openbsd-arm64-jsing 2021-08-26T02:06:43-a6ff433/netbsd-arm-bsiegert 2021-08-20T18:45:25-0f25251/openbsd-arm64-jsing 2021-08-13T20:45:17-98f3d7f/solaris-amd64-oraclerel 2021-06-24T18:58:27-cce6214/openbsd-arm64-jsing 2021-06-18T16:56:48-57aaa19/openbsd-arm64-jsing 2021-05-21T18:55:18-e4d7525/illumos-amd64 2021-05-13T14:30:42-fd4631e/openbsd-arm64-jsing 2021-05-11T02:46:21-326a792/illumos-amd64 2021-05-01T11:42:49-a9db5a7/openbsd-arm64-jsing 2021-04-19T23:02:15-7252e1e/darwin-amd64-10_12 2021-04-14T00:25:33-8dcc071/freebsd-arm64-dmgk 2021-04-09T13:10:36-c432917/netbsd-386-9_0 2021-03-29T08:11:13-565e70f/openbsd-arm64-jsing 2021-01-15T20:37:59-ec94701/darwin-amd64-10_12 2020-12-07T21:01:46-7ad6596/openbsd-arm64-jsing 2020-11-10T09:01:41-1c7650a/darwin-amd64-10_12 2020-06-26T17:07:58-d1015f3/solaris-amd64-oraclerel 2020-04-30T14:41:10-ecdbffd/solaris-amd64-oraclerel

gopherbot commented 2 years ago

Change https://golang.org/cl/375694 mentions this issue: net: synchronize instead of sleeping in TestDialParallelspuriousconnection

bcmills commented 2 years ago

This is a recurring test failure and does not seem to be port-specific, so marking as release-blocker. I've mailed what I believe to be a fix.