Open bcmills opened 10 months ago
The "darwin-amd64-13" builder on the old dashboard doesn't have GO_TEST_TIMEOUT_SCALE in its configuration, so this isn't a case of that not being ported over. Perhaps our macOS amd64 13 builder configuration is less powerful? If we know by how much, it makes sense to set a corresponding GO_TEST_TIMEOUT_SCALE to compensate and avoid timeouts.
Another possibility is that we're running more tests on Mac builders in LUCI than we are on the old dashboard. For one, the old dashboard uses the macTestPolicy test policy, which attempts to skip some expensive portable tests.
CC @prattmic, @mknyszek.
We discussed and looked into this a bit.
This turns out not to be a consistent slowness: looking at https://ci.chromium.org/ui/p/golang/builders/ci/gotip-darwin-amd64_13 briefly shows many builds complete successfully in 15-20 min, with some taking 10 or 30 min, but the failures are taking an hour or longer and reaching timeout.
This builder at this time has two different providers. The two recent timeouts happened on the same darwin-amd64-13--ac2a2248-1530-... bot.
Update on this. I took a look at median, worst case, etc build times for the main repo builder by bot hostname and there definitely seems to be a lot of variance between different hosts: nearly 4x difference in median runtime between the fastest and slowest hosts.
My first thought was slow network on some hosts, so I eyeballed the difference between a run on the fastest and slowest hosts and found that the slowdown seems to impact pretty much all packages. e.g.,
cmd/go 139s -> 540s
cmd/compile/internal/types2 15s -> 55s (pretty sure there is no network here)
Even small packages, like
cmd/gofmt 0.064s -> 0.79s
Some are similar though
crypto/md5 0.021s -> 0.024s
So it seems unrelated to network.
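For a quick sanity check of the numbers above, the per-package slowdown can be computed as the ratio of the slow-host time to the fast-host time. This is only a sketch over the four timings quoted in this comment; the timing type and slowdown helper are names made up for the example.

```go
package main

import "fmt"

// timing holds one package's test time, in seconds, on the fastest
// and slowest hosts, as quoted in the comment above.
type timing struct {
	pkg        string
	fast, slow float64
}

// slowdown returns how many times longer the slow host took.
func slowdown(t timing) float64 {
	return t.slow / t.fast
}

func main() {
	timings := []timing{
		{"cmd/go", 139, 540},
		{"cmd/compile/internal/types2", 15, 55},
		{"cmd/gofmt", 0.064, 0.79},
		{"crypto/md5", 0.021, 0.024},
	}
	for _, t := range timings {
		fmt.Printf("%-28s %.1fx\n", t.pkg, slowdown(t))
	}
}
```

The ratios come out to roughly 3.9x for cmd/go and 3.7x for types2, in line with the nearly 4x difference in median runtime, while crypto/md5 is close to 1x.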
That slowest machine is https://chromium-swarm.appspot.com/bot?id=darwin-amd64-13--ac2a2248-1530-4adb-bb91-ae854aa7c79c.golang.ca.macservice.goog. Interestingly, it seems to be much faster today. Previously x/tools was timing out, now it is ~15m. x/build is faster, etc.
This definitely still needs more investigation.
Found new dashboard test flakes for:
#!watchflakes
post <- builder ~ `gotip-darwin-amd64` && (`test timed out` || `SIGQUIT` || `context deadline exceeded`)
There seem to be timeouts on other LUCI darwin builders (other than 13). I put them all here for now. Feel free to split if this is really specific to 13. Thanks.
Found new dashboard test flakes for:
#!watchflakes
post <- builder ~ `gotip-darwin-amd64` && (`test timed out` || `SIGQUIT` || `context deadline exceeded` || status == "ABORT")
Found new dashboard test flakes for:
#!watchflakes
post <- builder ~ `(gotip|go1\.\d\d)-darwin-amd64` && (`test timed out` || `SIGQUIT` || `context deadline exceeded` || status == "ABORT")
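As a rough illustration of what the widened builder pattern covers, the sketch below runs the same regular expression against a few builder names using Go's regexp package. It assumes the watchflakes `~` operator behaves like an unanchored regexp match; apart from gotip-darwin-amd64_13, the builder names are hypothetical examples.

```go
package main

import (
	"fmt"
	"regexp"
)

// builderRE is the builder pattern copied from the watchflakes rule above.
var builderRE = regexp.MustCompile(`(gotip|go1\.\d\d)-darwin-amd64`)

// matchesBuilder reports whether the pattern matches anywhere in name.
// Assumption: watchflakes applies the pattern unanchored.
func matchesBuilder(name string) bool {
	return builderRE.MatchString(name)
}

func main() {
	names := []string{
		"gotip-darwin-amd64_13",  // builder named in this issue
		"go1.21-darwin-amd64_13", // hypothetical release-branch builder
		"gotip-darwin-arm64_13",  // hypothetical, different architecture
		"gotip-linux-amd64",      // hypothetical, different OS
	}
	for _, name := range names {
		fmt.Printf("%-24s %v\n", name, matchesBuilder(name))
	}
}
```

Under that assumption, only the first two names match, so the rule picks up release-branch darwin-amd64 builders in addition to gotip.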
Filed https://github.com/golang/go/issues/65468 for TestVMInfo.
The TestVMInfo failure is #62352. It seems that it matches this pattern because the test keeps retrying until it times out, so "test timed out" is printed.
I changed the pattern to "default", so the other issue should take priority.
Found new dashboard test flakes for:
#!watchflakes
default <- builder ~ `(gotip|go1\.\d\d)-darwin-amd64` && (`test timed out` || `SIGQUIT` || `context deadline exceeded` || status == "ABORT")
Found new dashboard test flakes for:
#!watchflakes
default <- builder ~ `(gotip|go1\.\d\d)-darwin-amd64` && (`test timed out` || `SIGQUIT` || `context deadline exceeded` || `command exceeded time limit` || status == "ABORT")
Go version
aba18d5b6785d501996b475d58a05cc26707d370
Output of go env in your module/workspace:
What did you do?
Check status of https://ci.chromium.org/p/golang/g/go-gotip/console.
What did you see happen?
Multiple failures involving timeouts on the darwin-amd64_13 builder:
What did you expect to see?
No timeouts: go test -short cmd/go takes <20s to run locally, so either the builders should be faster, or they should set a GO_TEST_TIMEOUT_SCALE that is long enough to reliably run the tests.
The only timeouts on the builders should be for (a) true deadlocks, and (b) tests that take a similarly long time (approaching 3 minutes) when run locally.