gopherbot opened 1 month ago
Found new dashboard test flakes for:
#!watchflakes
default <- pkg == "cmd/cgo/internal/testcarchive" && test == "TestManyCalls"
There have been more flaky failures on this test since June 20 (https://ci.chromium.org/ui/test/golang/cmd%2Fcgo%2Finternal%2Ftestcarchive.TestManyCalls?q=V%3Abuilder%3Dgotip-linux-amd64-longtest-test_only+V%3Ago_branch%3Dmaster+V%3Agoarch%3Damd64+V%3Agoos%3Dlinux+V%3Ahost_goarch%3Damd64+V%3Ahost_goos%3Dlinux), which were matched to #61069.
Tentatively marking this as a release blocker, since the high-rate flaky failures seem to have started recently.
I pulled the test timing data from LUCI.
The test timing was pretty steady at under 20 seconds, then started rising at d881ed63, reaching 50+ seconds by 180ea455. At afbbc289 it dropped back to mostly under 30 seconds (with a few exceptions), until it rose again at d79c3509.
For the first elevated period: since afbbc289 is a revert of d881ed63, it is likely that d881ed63 caused the slowdown and the revert fixed it.
For the second elevation, d79c3509 is unrelated, and so are the few commits nearby, so it is probably something else. Maybe a builder change?
Then I pulled the test timing data for the 1.22 release branch, which doesn't have any of those commits.
It seems the timing also rose starting June 24, at 3560cf0a. That commit might be related, but the corresponding master branch CL landed on May 15 and did not cause a timing increase there. Since the rise on the release branch happened at about the same time as on the master branch, I'm leaning more towards a builder configuration change. @mknyszek, was there any builder configuration change around June 24?
Lastly, the test has a hard-coded 1-minute timeout. I don't think it makes much sense to say that 55 seconds is a pass but 65 seconds is a fail, so the 1-minute timeout is a bit arbitrary. We probably want to replace it with the test deadline (t.Deadline); see the sketch below.
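For illustration, here is a minimal sketch of that replacement, assuming a watchdog-timer pattern and a hypothetical test binary `./testp`; the actual change (CL 599056) may look different:

```go
package carchive_test

import (
	"os/exec"
	"testing"
	"time"
)

// TestManyCallsSketch shows the idea: derive the watchdog timeout from
// the test's own deadline (set by go test -timeout) instead of a
// hard-coded minute. "./testp" is a hypothetical test binary.
func TestManyCallsSketch(t *testing.T) {
	cmd := exec.Command("./testp")

	// Fall back to a generous timeout if no deadline is set
	// (e.g. go test -timeout=0).
	timeout := time.Hour
	if deadline, ok := t.Deadline(); ok {
		// Leave some slack so the test can report the failure and
		// clean up before the harness kills the whole process.
		timeout = time.Until(deadline) - 5*time.Second
	}

	if err := cmd.Start(); err != nil {
		t.Fatal(err)
	}

	// A deadlocked child still fails (it hits the derived timeout),
	// but a merely slow run no longer does.
	timer := time.AfterFunc(timeout, func() {
		t.Error("test program timed out")
		cmd.Process.Kill()
	})
	defer timer.Stop()

	if err := cmd.Wait(); err != nil {
		t.Errorf("test program failed: %v", err)
	}
}
```

Deriving the watchdog from t.Deadline ties the failure threshold to go test -timeout rather than to an arbitrary constant, so slower builders simply get proportionally more room.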
Change https://go.dev/cl/599056 mentions this issue: cmd/cgo/internal/testcarchive: remove 1-minute timeout
CL https://go.dev/cl/599056 replaced the 1-minute timeout with the test deadline. Hopefully this will stop the failures on the builders. At this point I think the test slowdown is more likely due to a builder infrastructure change than to a code change in the main repo (which we're going to release). Also, the test was written to catch deadlocks, not slowness: a deadlocked run would never finish, so the longer deadline still catches it, while a merely slow run now passes. So at this point I don't think it is a release blocker.
It would be great if we could understand the slowdown, though, so let's keep the issue open.