Open zanderso opened 5 days ago
808 and 812 also have an abnormally high job failure rate:
builder | red | purple | total | rate |
---|---|---|---|---|
build806-m9 | 26 | 10 | 90 | 40.0% |
build808-m9 | 13 | 4 | 90 | 18.9% |
build812-m9 | 13 | 1 | 90 | 15.6% |
In the try pool, the outliers I'm seeing are:
builder | red | purple | total | rate |
---|---|---|---|---|
build857-m9 | 43 | 0 | 90 | 47.8% |
build862-m9 | 41 | 0 | 90 | 45.6% |
build858-m9 | 27 | 0 | 90 | 30.0% |
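
For reference, the rate column in both tables is consistent with (red + purple) / total. Here is a minimal sketch of that arithmetic, assuming that really is the definition. The `BotStats` class, the `outliers` helper, and the 15% cutoff are hypothetical names and values chosen for illustration, not part of any LUCI tooling. The counts are copied from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotStats:
    """Per-bot task counts, as listed in the tables above."""
    name: str
    red: int      # failed tasks
    purple: int   # infra-failure tasks
    total: int    # tasks sampled

    @property
    def failure_rate(self) -> float:
        # Assumption: rate = (red + purple) / total, which matches the tables.
        return (self.red + self.purple) / self.total


BOTS = [
    BotStats("build806-m9", 26, 10, 90),
    BotStats("build808-m9", 13, 4, 90),
    BotStats("build812-m9", 13, 1, 90),
    BotStats("build857-m9", 43, 0, 90),
    BotStats("build862-m9", 41, 0, 90),
    BotStats("build858-m9", 27, 0, 90),
]


def outliers(bots, threshold=0.15):
    """Return bots at or above a (hypothetical) failure-rate cutoff, worst first."""
    flagged = [b for b in bots if b.failure_rate >= threshold]
    return sorted(flagged, key=lambda b: b.failure_rate, reverse=True)


if __name__ == "__main__":
    for bot in outliers(BOTS):
        print(f"{bot.name}: {bot.failure_rate:.1%}")
```

The cutoff is arbitrary. In practice it would be set relative to the pool's baseline failure rate, which is not given here.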
Filed internal bug https://b/376081455 to investigate the bots.
Many of the task failures are due to a failure to install Xcode, so my suspicion is that we should follow the flow to delete the caches on the machines before requesting that they be re-imaged or concluding that the hardware is bad: see go/flutter-luci-playbook under "Delete corrupted Mac caches".
From https://chromium-swarm.appspot.com/bot?id=build806-m9
Xcode installs are failing due to data corruption, as in:
https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8733087603218910657/+/u/install_xcode/install_xcode_from_cipd/stdout
There are other failure modes as well. I suspect a bad disk or bad memory. The automatic reboots on task failure don't seem to be helping.
I put up this CL to remove it from the prod pool, but I'm not sure whether this is the right thing to do: https://chrome-internal-review.googlesource.com/c/infradata/config/+/7775927
cc @christopherfujino I'm wondering if bad machines like this are the underlying cause of https://github.com/flutter/flutter/issues/156614 ?