Closed: murog closed this 3 years ago.
Looks like disk space is being eaten up. I was hoping that docker system prune -a would also clear what is in the overlay dir (/var/lib/docker/overlay2), but it only freed up ~9.5GB. This temporarily frees up enough space for CI to run, but usage is still high (80%).
First runner:
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   46G  1.3G  98% /
overlay          50G   46G  1.3G  98% /var/lib/docker/overlay2/a21c942...6a33/merged
$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          53        3         9.325GB   3.235GB (34%)
Containers      38        1         128.7MB   125.7MB (97%)
Local Volumes   37        1         6.474GB   0B (0%)
Build Cache     0         0         0B        0B
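The first runner only started failing once the root filesystem was nearly full (98% above). A minimal sketch of a pre-flight check a CI job could run before doing any work, assuming GNU coreutils df (for --output=pcent); the function names and the 80% threshold are hypothetical, not something the workflow currently does:

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight disk check for a self-hosted runner.
# Assumptions (not from the issue): GNU coreutils df (for --output),
# hypothetical helper names, and a hypothetical default threshold of 80%.

# Print the Use% of the filesystem holding PATH as a bare integer, e.g. 98.
disk_use_pct() {
  df --output=pcent "${1:-/}" | tail -n 1 | tr -dc '0-9'
}

# Succeed (exit 0) when usage of PATH is at or above THRESHOLD percent.
needs_cleanup() {
  local path="${1:-/}" threshold="${2:-80}"
  [ "$(disk_use_pct "$path")" -ge "$threshold" ]
}
```

For example, a job could call needs_cleanup / 80 and, when it succeeds, run the cleanup steps (or fail early with a clear message) instead of hitting No space left on device partway through a build.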
Second runner: isn't maxed out yet, but is trending that way:
$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          130       2         7.648GB   4.018GB (52%)
Containers      9         1         31.14MB   28.07MB (90%)
Local Volumes   14        1         5.927GB   0B (0%)
Build Cache     0         0         0B        0B
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   40G  7.9G  84% /
overlay          50G   40G  7.9G  84% /var/lib/docker/overlay2/304fef...dc83b/merged
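On the first runner, docker system df accounts for only about 16GB (9.325GB of images, 128.7MB of containers, 6.474GB of volumes), while df -h reports 46G used, so a per-directory breakdown is a quick way to find space used outside Docker. A sketch, assuming GNU du and sort (sort -h understands human-readable sizes); largest_dirs is a hypothetical helper name. du -x stays on one filesystem, so the per-container overlay mounts under /var/lib/docker/overlay2/*/merged are not counted a second time:

```shell
#!/usr/bin/env bash
# Sketch: list the largest top-level directories under DIR, biggest first.
# Assumptions: GNU du/sort (for -h); largest_dirs is a hypothetical name.
largest_dirs() {
  local dir="${1:-/}" count="${2:-10}"
  # -x: don't cross into other filesystems (e.g. overlay mounts, /proc)
  # --max-depth=1: one line per immediate subdirectory, plus DIR itself
  # Permission errors are discarded; run as root for a complete picture.
  du -xh --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
```

As a one-off on a runner, the equivalent one-liner is sudo du -xh --max-depth=1 / | sort -rh | head -n 10.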
Steps I took to clean up on actions runner 1:
docker ps --> saw that the kind control plane was still running (no longer needed, since we're not using kind in the deploy tests anymore). I removed the image and the volumes with docker volume rm $(docker volume ls -q), then re-ran df -h. Usage dropped by ~15 percentage points, to 83%:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   39G  8.4G  83% /
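A sketch of those two steps as a reusable function. It assumes the leftover cluster was created with kind's default name (kind, whose node container is kind-control-plane); cleanup_kind_and_volumes, run and the DRY_RUN=1 switch (print commands instead of running them) are hypothetical. kind delete cluster removes the cluster's node containers but leaves the node image for docker system prune -a (or docker rmi). The volume step is the same as docker volume rm $(docker volume ls -q), except that xargs -r skips the call when there are no volumes, since docker volume rm with no arguments exits with an error:

```shell
#!/usr/bin/env bash
# Sketch: remove a leftover kind cluster and every Docker volume not in use.
# Assumptions: kind's default cluster name "kind"; DRY_RUN=1 is a
# hypothetical switch that prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cleanup_kind_and_volumes() {
  local cluster="${1:-kind}"
  # Deletes the cluster's node containers (e.g. kind-control-plane).
  run kind delete cluster --name "$cluster"
  # Same as: docker volume rm $(docker volume ls -q)
  # xargs -r (GNU) skips the rm entirely when no volumes exist.
  # Docker refuses to remove volumes that a container still uses.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ docker volume ls -q | xargs -r docker volume rm"
  else
    docker volume ls -q | xargs -r docker volume rm
  fi
}
```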
Then re-ran docker system prune -a, but added the -f (force) flag, which skips the confirmation prompt:
Total reclaimed space: 5.887GB
/dev/sda1 50G 33G 15G 70% /
Did the same steps on actions runner 2:
docker system prune -a -f --> Total reclaimed space: 10.15GB
/dev/sda1 50G 32G 16G 68% /
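Since the prune step is repeated on each runner, it can be wrapped so that every run also logs root filesystem usage before and after. A sketch, assuming GNU coreutils df; prune_and_report, use_pct and the DRY_RUN=1 switch are hypothetical names. By default docker system prune leaves volumes alone (removing unused volumes needs --volumes), so this does not replace the separate volume cleanup:

```shell
#!/usr/bin/env bash
# Sketch: run the prune step on a runner and log root usage before/after.
# Assumptions: GNU coreutils df (--output); DRY_RUN=1 is a hypothetical
# switch that prints the prune command instead of running it.

# Print the Use% of the filesystem holding PATH as a bare integer.
use_pct() {
  df --output=pcent "${1:-/}" | tail -n 1 | tr -dc '0-9'
}

prune_and_report() {
  local before after
  before=$(use_pct /)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ docker system prune -a -f"
  else
    # -a: also remove unused images, not just dangling ones
    # -f: skip the confirmation prompt
    docker system prune -a -f
  fi
  after=$(use_pct /)
  echo "Use% on /: ${before}% -> ${after}%"
}
```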
Then ran sudo du -h --max-depth=1 and saw that the /tmp directory was taking up 15G, due to a set of 1GB image-tar files created from March to present. Not sure where these are coming from; maybe a tool install / upgrade? I opened a new issue to try to figure out where these .tars are coming from, so we can auto-remove them in our scripts in the future.
Now, actions runner 1:
/dev/sda1 50G 19G 29G 40% /
Runner 2:
/dev/sda1 50G 19G 29G 39% /
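Once it is clear where the /tmp image tarballs come from and that they are safe to delete, auto-removing them could look like the sketch below. It assumes the files match *.tar and sit directly in /tmp, and uses hypothetical cut-offs (larger than 500 MiB, last modified more than 7 days ago) so that a tarball a running job may still need is left alone; cleanup_old_tars and DRY_RUN=1 (list matches, delete nothing) are hypothetical names. It relies on GNU find for -delete and the M size suffix:

```shell
#!/usr/bin/env bash
# Sketch: delete large, old *.tar files from a directory (default /tmp).
# Assumptions: the stray image tarballs match *.tar; the 500 MiB size and
# 7-day age cut-offs are hypothetical; DRY_RUN=1 only lists matches.

cleanup_old_tars() {
  local dir="${1:-/tmp}" min_mb="${2:-500}" min_age_days="${3:-7}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # List only: regular files directly in DIR, bigger than MIN_MB MiB,
    # last modified more than MIN_AGE_DAYS days ago.
    find "$dir" -maxdepth 1 -type f -name '*.tar' \
      -size +"${min_mb}M" -mtime +"$min_age_days" -print
  else
    # Same match, but print and then delete each file.
    find "$dir" -maxdepth 1 -type f -name '*.tar' \
      -size +"${min_mb}M" -mtime +"$min_age_days" -print -delete
  fi
}
```

For example, DRY_RUN=1 cleanup_old_tars /tmp shows what would be removed; running it without DRY_RUN deletes those files (as root, if the tarballs are owned by another user).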
Closing this in favor of #351 (automate cleaning up the runner disks), to be addressed at a later date.
Description
CI is intermittently failing when run on the first actions runner with a "No space left on device" error.
Current Behavior
https://github.com/GoogleCloudPlatform/bank-of-anthos/actions/runs/238157194
Expected Behavior
CI should have enough available storage to run