Open bcmills opened 2 years ago
Is it viable to get a higher SLA on go.googlesource.com?
We should also check to make sure these errors are accurately reflected in the monitoring for these services. (It isn't obvious to me that they are even showing up.)
For some reason the failure rate to this service on the linux-386-longtest builder seems to be higher than on linux-amd64-longtest.
greplogs --dashboard -md -l -e 'https://go\.googlesource\.com.* 5\d\d\b' --since=2022-01-11
2022-02-04T23:42:50-f1903fd/linux-386-longtest
2022-01-24T19:02:43-48ec6df/linux-386-longtest
2022-01-20T19:24:26-2c2e081/linux-386-longtest
2022-01-18T23:48:55-fa4df65/linux-386-longtest
2022-01-14T21:54:39-3b5eec9/linux-386-longtest
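For readers unfamiliar with the query, here is a minimal sketch of what the pattern above selects, using Go's standard regexp package. The log lines are hypothetical, written only for illustration, and the helper name matches5xx is not part of greplogs; how greplogs itself compiles the pattern is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

// serverError is the pattern from the greplogs query, compiled as-is.
var serverError = regexp.MustCompile(`https://go\.googlesource\.com.* 5\d\d\b`)

// matches5xx (hypothetical helper) reports whether a log line mentions
// go.googlesource.com followed later on the same line by a 5xx status code.
func matches5xx(line string) bool {
	return serverError.MatchString(line)
}

func main() {
	// Hypothetical log lines, not taken from any real builder log.
	lines := []string{
		"reading https://go.googlesource.com/tools/info/refs: 502 Bad Gateway",
		"reading https://go.googlesource.com/tools/info/refs: 404 Not Found",
		"reading https://example.com/tools/info/refs: 502 Bad Gateway",
	}
	for _, l := range lines {
		fmt.Printf("%v\t%s\n", matches5xx(l), l)
	}
}
```

Only the first line matches: the second is a 4xx response and the third is a different host.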
Another instance of what appears to be a “googlesource server flakiness” failure mode:
> go list -m gopkg.in/src-d/go-git.v4
[stderr]
go: gopkg.in/src-d/go-git.v4@v4.13.1 requires
golang.org/x/tools@v0.0.0-20190729092621-ff9f1409240a: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in $WORK/gopath/pkg/mod/cache/vcs/7d9b3b49b55db5b40e68a94007f21a05905d3fda866f685220de88f9c9bad98a: exit status 128:
error: 24736 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: index-pack failed
[exit status 1]
greplogs --dashboard -md -l -e 'unexpected disconnect while reading sideband packet' --since=2021-01-01
2022-04-11T16:31:42-910a33a/linux-amd64-longtest
2022-04-11T16:31:40-6130b88/linux-386-longtest
2022-03-25T18:19:09-f25631b/linux-386-longtest
And more of the conventional flakiness too.
greplogs --dashboard -md -l -e 'https://go\.googlesource\.com.* 5\d\d\b' --since=2022-02-07
2022-04-11T16:31:43-036b615/linux-386-longtest
2022-02-22T15:23:59-d17b65f/linux-amd64-longtest
2022-02-21T21:28:40-c9fe126/linux-amd64-longtest
2022-02-15T19:04:00-b5af5c0/linux-amd64-longtest
2022-02-07T18:19:38-8f374aa/linux-amd64-longtest
greplogs -l -e 'https://go\.googlesource\.com.* 5\d\d\b' --since=2022-04-12
2022-04-27T17:20:34-f0ee7fd/linux-386-longtest
And I've noticed another log pattern that seems to be the same failure mode, just through the git binary instead of go:
greplogs -l -e 'golang\.org/x.*: git fetch .*: exit status 128:\n\s+.*The requested URL returned error: 502' --since=2021-01-01
2022-05-02T17:05:14-64369c3/linux-386-longtest
2022-04-28T15:11:42-4d35071/linux-amd64-longtest
2022-01-21T16:59:19-9eba5ff/linux-amd64-longtest
2022-01-20T19:24:26-2c2e081/linux-386-longtest
2022-01-10T22:48:40-6019a52/linux-amd64-longtest
2022-01-05T21:22:03-2b39d86/linux-386-longtest
2021-12-02T22:03:54-25f06cb/linux-386-longtest
2021-11-30T18:09:02-931d80e/linux-amd64-longtest
2021-11-10T18:24:14-6406e09/linux-amd64-longtest
2021-11-09T19:01:20-526b2ef/linux-386-longtest
2021-11-04T21:52:36-8ad0a7e/linux-386-longtest
2021-11-01T22:55:50-02e5913/linux-amd64-longtest
2021-10-29T18:56:29-6afdf01/linux-386-longtest
2021-10-28T16:54:29-5fce1d9/linux-386-longtest
2021-10-28T16:08:36-6f0185b/linux-386-longtest
2021-10-27T21:37:54-749f6e9/linux-386-longtest
2021-10-27T20:03:17-68bd512/linux-386-longtest
2021-10-26T21:17:38-091db63/linux-amd64-longtest
2021-10-22T00:57:18-9ff91b9/linux-amd64-longtest
2021-10-12T20:20:41-d032b2b/linux-386-longtest
2021-09-15T17:32:52-6196979/linux-amd64-longtest
2021-09-15T16:32:27-72bb818/linux-386-longtest
2021-09-14T23:07:15-137543b/linux-amd64-longtest
2021-09-14T23:03:28-3a72175/linux-386-longtest
2021-09-08T16:19:36-409434d/linux-386-longtest
2021-06-28T23:31:05-1519271/linux-amd64-longtest
2021-05-26T22:43:54-1d5298d/linux-amd64-longtest
greplogs -l -e 'golang\.org/x.* git .*\n(\s+.*)*The requested URL returned error: 502' --since=2022-05-03
2022-05-05T14:26:30-1926fa5/linux-amd64-longtest
@golang/release, I've filed this internally as b/231704574.
greplogs -l -e 'golang\.org/x.* git .*\n(\s+.*)*The requested URL returned error: 5\d\d' --since=2022-05-05
2022-05-09T20:24:02-13bda0e/linux-amd64-longtest
2022-05-09T18:06:51-0d410d6/linux-amd64-longtest
2022-05-09T16:02:28-9ae7dc3/linux-amd64-longtest
2022-05-05T14:26:30-1926fa5/linux-amd64-longtest
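The queries in this thread were broadened over time: first to any git subcommand with any number of indented output lines, then from 502 to any 5xx status. The following sketch compares the narrowest and broadest forms using Go's regexp package. The sample logs are hypothetical, and whether greplogs applies extra regexp flags is not assumed here.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// narrow: "git fetch" exits with status 128 and the very next line reports a 502.
	narrow = regexp.MustCompile(`golang\.org/x.*: git fetch .*: exit status 128:\n\s+.*The requested URL returned error: 502`)
	// broad: any git command, any number of indented lines, any 5xx status.
	broad = regexp.MustCompile(`golang\.org/x.* git .*\n(\s+.*)*The requested URL returned error: 5\d\d`)
)

// Hypothetical log excerpts, written only for illustration.
const (
	fetch502 = "go: golang.org/x/tools@v0.1.0: git fetch -f origin in /tmp/vcs: exit status 128:\n" +
		"\tfatal: unable to access 'https://go.googlesource.com/tools/': The requested URL returned error: 502\n"
	lsRemote503 = "go: golang.org/x/sys@v0.1.0: git ls-remote -q origin in /tmp/vcs: exit status 128:\n" +
		"\tremote: warning\n" +
		"\tfatal: unable to access 'https://go.googlesource.com/sys/': The requested URL returned error: 503\n"
	notFound404 = "go: golang.org/x/sys@v0.1.0: git fetch -f origin in /tmp/vcs: exit status 128:\n" +
		"\tfatal: unable to access 'https://go.googlesource.com/sys/': The requested URL returned error: 404\n"
)

func main() {
	for name, log := range map[string]string{
		"fetch502": fetch502, "lsRemote503": lsRemote503, "notFound404": notFound404,
	} {
		fmt.Printf("%-12s narrow=%v broad=%v\n", name, narrow.MatchString(log), broad.MatchString(log))
	}
}
```

In this sketch the broad pattern catches the ls-remote failure with an intervening output line and a 503, which the narrow pattern misses; neither matches the 404.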
Change https://go.dev/cl/405714 mentions this issue: cmd/go: add timestamps to script test output
Found new dashboard test flakes for:
#!watchflakes
post <- `golang\.org/x.* git .*\n(\s+.*)*The requested URL returned error: 5\d\d`
Found new dashboard test flakes for:
#!watchflakes
post <- `https://go\.googlesource\.com.*: The requested URL returned error: 5\d\d`
Converting mod_invalid_version to use the local vcweb would help a lot with these; it seems to be the one that hits this most often.
Some of the cmd/go long-mode tests end up cloning repos controlled by the Go project, such as golang.org/x/sys.
Other long-mode tests send requests to a dedicated test server (vcs-test.golang.org), various GitHub repos, packages hosted on rsc.io, and gopkg.in. The failure rate against vcs-test.golang.org is, as far as I can tell, negligible. The GitHub and gopkg.in tests for the most part check integration with services outside the control of the Go project. The rsc.io tests have a low but nontrivial failure rate (filed as #49954), and could perhaps be migrated to vcs-test.golang.org if need be.
However, the dependency on go.googlesource.com for the x repos is not easy to avoid, and as far as I can tell its failure rate since September has dwarfed all of those other services. Many of these failures already result in long (2m+) hangs that could otherwise push the test over its deadline, so I don't think it would be viable to just add retries (#28194), especially given that if the problem could be fixed by retrying, the server could presumably do that retry internally itself.
I would rather not lose coverage or add complexity to the tests by fetching the Go repos only through proxy.golang.org. @golang/release, is it viable to get a higher SLA on go.googlesource.com?
greplogs --dashboard -md -l -e 'https://go\.googlesource\.com.* 5\d\d\b' --since=2021-01-01