golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/build: solaris-amd64-oraclerel failures with "no space left on device" #46362

Open bcmills opened 3 years ago

bcmills commented 3 years ago

The solaris-amd64-oraclerel builder seems to be failing moderately frequently with "no space left on device" errors:

2021-05-24T20:15:56-15d9d4a/solaris-amd64-oraclerel
2021-05-10T15:11:50-ecb7392/solaris-amd64-oraclerel
2021-05-03T16:42:22-169155d/solaris-amd64-oraclerel
2021-04-28T19:13:50-ad989c7/solaris-amd64-oraclerel
2021-04-26T21:27:41-9f60169/solaris-amd64-oraclerel
2021-04-08T20:55:59-0243799/solaris-amd64-oraclerel
2021-04-08T02:08:45-b261fe9/solaris-amd64-oraclerel
2021-03-30T21:06:17-4fbd30e/solaris-amd64-oraclerel

It's not obvious to me whether the buildlet script is failing to clean something up, the device's disk is getting too full for other reasons, or perhaps the builder is just configured to run too many builds in parallel.

CC @golang/release @rorth

rorth commented 3 years ago

"Bryan C. Mills" writes:

> The solaris-amd64-oraclerel builder seems to be failing moderately frequently with "no space left on device" errors:
> [...]
> It's not obvious to me whether the buildlet script is failing to clean something up, the device's disk is getting too full for other reasons, or perhaps the builder is just configured to run too many builds in parallel.

I've looked around and there may be several issues:

For remedy, there are several options:

I'll look into either of those.

bcmills commented 2 years ago

This has started occurring intermittently again.

greplogs --dashboard -md -l -e '(?ms)\Asolaris-amd64-oraclerel.* no space left on device' --since=2021-03-26

2022-04-23T05:38:56-9717e8f/solaris-amd64-oraclerel
2022-04-19T17:05:22-4804c43-689dc17/solaris-amd64-oraclerel
[note 11-month gap!]
2021-05-24T20:15:56-15d9d4a/solaris-amd64-oraclerel
2021-05-10T18:10:43-ecb7392-73d5aef/solaris-amd64-oraclerel
2021-05-03T16:42:22-169155d/solaris-amd64-oraclerel
2021-04-28T19:13:50-ad989c7/solaris-amd64-oraclerel
2021-04-26T21:27:41-9f60169/solaris-amd64-oraclerel
2021-04-08T21:58:35-0243799-d67e739/solaris-amd64-oraclerel
2021-04-08T07:33:58-b261fe9-a7e16ab/solaris-amd64-oraclerel
2021-03-31T14:26:53-4fbd30e-2940614/solaris-amd64-oraclerel

rorth commented 2 years ago

"Bryan C. Mills" writes:

> This has started occurring intermittently again.
>
> greplogs --dashboard -md -l -e '(?ms)\Asolaris-amd64-oraclerel.* no space left on device' --since=2021-03-26
>
> 2022-04-23T05:38:56-9717e8f/solaris-amd64-oraclerel
> 2022-04-19T17:05:22-4804c43-689dc17/solaris-amd64-oraclerel
> [note 11-month gap!]
> [...]

I was recently forced to migrate the zone hosting the builder to a different machine. In the process, swap was inadvertently reduced from 32 GB to 4 GB. With WORKDIR residing in /tmp (tmpfs), VM shortage could lead to those errors.

I've now restored the previous swap size, which should make the problem vanish, like it did for the last year.

bcmills commented 2 years ago

greplogs -l -e '(?ms)\Asolaris-amd64-oraclerel.* no space left on device' --since=2022-04-24

2022-05-03T19:58:15-7c404d5/solaris-amd64-oraclerel
2022-04-25T15:49:44-12763d1/solaris-amd64-oraclerel
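
For readers unfamiliar with the `-e` pattern used in these queries, the sketch below shows how it behaves under Go's `regexp` package. This assumes greplogs hands the pattern to `regexp` unchanged. The sample log text is hypothetical and only illustrates the shape the pattern expects: the builder name at the very start of the log, with the error anywhere later. The helper name `matchesNoSpace` is also hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// noSpace is the pattern from the greplogs invocations in this thread.
//   (?s) lets "." match newlines, so ".*" can span the whole log.
//   \A   anchors the builder name to the very start of the text.
//   (?m) only changes ^ and $, which this pattern does not use.
var noSpace = regexp.MustCompile(`(?ms)\Asolaris-amd64-oraclerel.* no space left on device`)

// matchesNoSpace reports whether a log starts with the builder name
// and contains the error message somewhere after it.
func matchesNoSpace(log string) bool {
	return noSpace.MatchString(log)
}

func main() {
	// Hypothetical log contents, not copied from a real build.
	hit := "solaris-amd64-oraclerel at <commit>\n...\nwrite <path>: no space left on device\n"
	otherBuilder := "linux-amd64 at <commit>\n...\nwrite <path>: no space left on device\n"
	noError := "solaris-amd64-oraclerel at <commit>\n...\nok\n"

	fmt.Println(matchesNoSpace(hit))
	fmt.Println(matchesNoSpace(otherBuilder))
	fmt.Println(matchesNoSpace(noError))
}
```

Only the first sample matches: the second starts with a different builder name, and the third never contains the error text.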