Closed dsotirho-ucsc closed 7 months ago
As for the GitLab registry, there are a few administrative actions to be taken:
1) On older GitLab instances, the cleanup needs to be enabled globally.
2) On all instances, a clean-up policy needs to be set up for each project (90 days is the longest possible retention).
3) The clean-up policy only deletes tags; the resulting dangling images will need to be removed using garbage collection, which we'll need to add as a timer unit to systemd.
Only the system administrator can perform 1 and 2. Anyone can do 3, but it requires a PR.
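As a rough illustration of item 3, the sketch below renders a systemd service and timer pair that would run registry garbage collection on a schedule. The unit names, the container name `gitlab`, the `/usr/bin/docker` path and the weekly schedule are all assumptions made for this example, not values taken from the deployment. The `-m` flag is, to my understanding, what makes `gitlab-ctl registry-garbage-collect` also remove untagged manifests; please verify against the GitLab documentation before relying on it.

```python
def render_gc_units(container: str = 'gitlab',
                    schedule: str = 'weekly') -> dict[str, str]:
    """
    Render a oneshot service and a timer that triggers it.

    All names here are hypothetical; adapt them to the real deployment.
    """
    service = '\n'.join([
        '[Unit]',
        'Description=GitLab container registry garbage collection',
        '',
        '[Service]',
        'Type=oneshot',
        # Assumed container name and docker path
        f'ExecStart=/usr/bin/docker exec {container} '
        'gitlab-ctl registry-garbage-collect -m',
        '',
    ])
    timer = '\n'.join([
        '[Unit]',
        'Description=Scheduled GitLab container registry garbage collection',
        '',
        '[Timer]',
        f'OnCalendar={schedule}',
        # Run a missed job after the machine was down at the scheduled time
        'Persistent=true',
        '',
        '[Install]',
        'WantedBy=timers.target',
        '',
    ])
    return {
        'registry-gc.service': service,
        'registry-gc.timer': timer,
    }
```

The timer activates the service of the same base name by default, which is why the two units share the `registry-gc` prefix.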
Our weekly purge job should actually have removed many of the images listed above. Spike to determine why that's not working.
Typo in the --filter option of the prune images command.
https://docs.docker.com/engine/reference/commandline/image_prune/
The until filter can be Unix timestamps, date formatted timestamps, or Go duration strings (e.g. 10m, 1h30m) computed relative to the daemon machine’s time. Supported formats for date formatted time stamps include RFC3339Nano, RFC3339, 2006-01-02T15:04:05, 2006-01-02T15:04:05.999999999, 2006-01-02Z07:00, and 2006-01-02.
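To make the consequence of the missing unit concrete, here is a deliberately simplified model of how an `until` value could be turned into a cutoff time. It is not Docker's parser: it assumes that a bare integer is read as a Unix timestamp and it only understands integer `h`/`m`/`s` durations. Under that assumption, `720` means twelve minutes past the epoch in 1970, so no image is old enough to be pruned, whereas `720h` means thirty days before now.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified: integer hours, minutes and seconds only, e.g. '720h' or '1h30m'
_DURATION = re.compile(r'^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$')


def until_cutoff(value: str, now: datetime) -> datetime:
    """
    Return the time before which images would be eligible for pruning.
    """
    if value.isdigit():
        # Assumption: a bare integer is interpreted as seconds since the epoch
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    match = _DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f'Unsupported until value: {value!r}')
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    # A duration is relative to the current time of the daemon machine
    return now - timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

Calling `until_cutoff('720', now)` and `until_cutoff('720h', now)` side by side shows the two very different cutoffs that the one-character typo produces.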
Index: terraform/gitlab/gitlab.tf.json.template.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/terraform/gitlab/gitlab.tf.json.template.py b/terraform/gitlab/gitlab.tf.json.template.py
--- a/terraform/gitlab/gitlab.tf.json.template.py (revision 4fedfaa30ef3105122875bd3af97f01e18c7bea8)
+++ b/terraform/gitlab/gitlab.tf.json.template.py (date 1701304390761)
@@ -1836,7 +1836,7 @@
'prune', # … to delete, …
'--force', # … without prompting for confirmation, …
'--all', # … all images …
- f'--filter "until={30 * 24}"', # … except those from more recent builds.
+ f'--filter "until={30 * 24}h"', # … except those from more recent builds.
#
# If we deleted more recent images, we
# would risk failing the requirements
Tested manually on GitLab dev; however, my shell timed out before the command completed.
[ec2-user@ip-172-71-0-215 log]$ sudo docker exec -it gitlab-dind /bin/sh
/ #
/ # docker image prune --all --filter "until=720"
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Total reclaimed space: 0B
/ #
/ # docker image prune --all --filter "until=720h"
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Connection to ssh.gitlab.dev.singlecell.gi.ucsc.edu closed by remote host.
Connection to ssh.gitlab.dev.singlecell.gi.ucsc.edu closed.
Upon reconnection I was able to confirm a bunch of images had been pruned
[ec2-user@ip-172-71-0-215 ~]$ sudo docker exec -it gitlab-dind docker image ls --format=json | jq -r '[.CreatedAt, .ID, .Size, .Repository, .Tag] | join("\t")' | sort -u -k 1,1
2020-10-09 18:00:33 +0000 UTC d1226e1554f8 60.4MB gitlab/gitlab-runner-helper x86_64-264446b2
2020-12-29 02:47:00 +0000 UTC 97396fa3d959 1.42GB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/dev 999
2021-03-25 22:21:58 +0000 UTC a9275369af8c 70.2MB gitlab/gitlab-runner-helper x86_64-54944146
2021-07-16 19:12:58 +0000 UTC ae0f95f06fc6 1.38GB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/dev 6638
2022-01-24 19:33:44 +0000 UTC 40d5e650e7cc 66.9MB registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper x86_64-98daeee0
2022-01-31 18:37:35 +0000 UTC fb1b155d1740 1.34GB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/dev 12319
2022-03-16 20:15:09 +0000 UTC b083053d75df 1.39GB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/dev 12425
2022-07-24 23:03:12 +0000 UTC c526698c011d 67MB registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper x86_64-76984217
2022-07-25 21:19:00 +0000 UTC f1148a65c439 1.37GB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/dev 16191
2023-10-31 16:46:55 +0000 UTC 484fe096818c 1.66GB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/dev-deps 27906
2023-11-07 04:05:05 +0000 UTC 21d53c42b0b3 1.66GB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/dev-deps 28072
2023-11-08 02:25:56 +0000 UTC 506bba297b95 2.2GB 122796619775.dkr.ecr.us-east-1.amazonaws.com/docker.io/ucscgi/azul-pycharm 2023.2.3-5
2023-11-18 07:55:38 +0000 UTC 3d233bd976d5 685MB 122796619775.dkr.ecr.us-east-1.amazonaws.com/docker.io/ucscgi/azul-elasticsearch 7.17.15-5
2023-11-20 04:19:27 +0000 UTC b9a38e6e7b3d 350MB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/runner <none>
2023-11-26 20:41:13 +0000 UTC 4ea739ce84ee 2.05GB docker.gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/dev-deps 28593
[ec2-user@ip-172-71-0-215 ~]$
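For anyone who prefers not to use jq, the listing above can be approximated in Python. This is only a sketch: it assumes that `docker image ls --format=json` emits one JSON object per line with the same five keys used in the jq expression, and it sorts whole rows instead of deduplicating on the first field the way `sort -u -k 1,1` does.

```python
import json

# The same keys the jq expression selects, in the same order
FIELDS = ('CreatedAt', 'ID', 'Size', 'Repository', 'Tag')


def image_rows(json_lines):
    """
    Turn JSON lines describing images into sorted, tab-separated rows.
    """
    rows = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        image = json.loads(line)
        rows.append('\t'.join(str(image.get(field, '')) for field in FIELDS))
    # CreatedAt comes first and starts with an ISO-style date, so a plain
    # string sort orders the rows from oldest to newest
    return sorted(rows)
```

Passing `sys.stdin` as `json_lines` and printing the returned rows gives output comparable to the table above.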
Data disk usage for GitLab dev is now at ~58.2%:
[ec2-user@ip-172-71-0-215 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 652K 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/nvme0n1p1 20G 7.5G 13G 38% /
tmpfs 7.8G 0 7.8G 0% /tmp
/dev/nvme1n1 197G 109G 79G 59% /mnt/gitlab
tmpfs 1.6G 0 1.6G 0% /run/user/1000
@hannes-ucsc: "Assignee to implement the fix found by @dsotirho-ucsc in a PR but also increase the image retention from 30 days to 90. The PR checklist should contain items for the administrative tasks necessary to set up the registry cleanup policy for GitLab."
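One way to keep the unit from being dropped again when the retention changes is to derive the filter from a number of days. The helper below is a hypothetical sketch, not the code in `gitlab.tf.json.template.py`; it builds an argument list of the kind `subprocess.run` accepts, so the filter value needs no shell quoting.

```python
def prune_args(retention_days: int) -> list[str]:
    """
    Build the arguments for pruning images older than the given retention.
    """
    if retention_days <= 0:
        raise ValueError('Retention must be a positive number of days')
    hours = retention_days * 24
    return [
        'docker', 'image', 'prune',
        '--force',  # do not prompt for confirmation
        '--all',  # consider all unused images, not just dangling ones
        # The trailing 'h' makes this a duration instead of a bare number
        '--filter', f'until={hours}h',
    ]
```

With the 90 day retention requested above, the last argument becomes `until=2160h`; with the previous 30 days it is `until=720h`.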
Assignee to also run the manual purge on anvildev ASAP.
[ec2-user@ip-172-73-0-46 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 648K 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/nvme0n1p1 20G 7.7G 13G 39% /
tmpfs 7.8G 0 7.8G 0% /tmp
/dev/nvme1n1 148G 106G 35G 76% /mnt/gitlab
tmpfs 1.6G 0 1.6G 0% /run/user/1000
[ec2-user@ip-172-73-0-46 ~]$
[ec2-user@ip-172-73-0-46 ~]$
[ec2-user@ip-172-73-0-46 ~]$ sudo docker exec -it gitlab-dind /bin/sh
/ #
/ # docker image prune --all --filter "until=2160h"
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
…
Total reclaimed space: 24.49GB
/ #
/ # exit
[ec2-user@ip-172-73-0-46 ~]$
[ec2-user@ip-172-73-0-46 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 648K 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/nvme0n1p1 20G 7.7G 13G 39% /
tmpfs 7.8G 0 7.8G 0% /tmp
/dev/nvme1n1 148G 81G 60G 58% /mnt/gitlab
tmpfs 1.6G 0 1.6G 0% /run/user/1000
Assignee to also run the manual purge on anvilprod ASAP.
[ec2-user@ip-172-74-0-28 ~]$
[ec2-user@ip-172-74-0-28 ~]$ df -h | grep gitlab
/dev/nvme1n1 148G 108G 33G 77% /mnt/gitlab
[ec2-user@ip-172-74-0-28 ~]$
[ec2-user@ip-172-74-0-28 ~]$ sudo docker exec -it gitlab-dind /bin/sh
/ #
/ # docker image ls | wc -l
6463
/ #
/ # docker image prune --all --filter "until=2160h"
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
…
Total reclaimed space: 31.98GB
/ #
/ # docker image ls | wc -l
2179
/ #
/ # exit
[ec2-user@ip-172-74-0-28 ~]$
[ec2-user@ip-172-74-0-28 ~]$
[ec2-user@ip-172-74-0-28 ~]$ df -h | grep gitlab
/dev/nvme1n1 148G 75G 66G 54% /mnt/gitlab
[ec2-user@ip-172-74-0-28 ~]$
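A note on reading the percentages: the Use% column is not simply Used divided by Size. As far as I understand GNU df, it divides Used by the sum of Used and Avail and rounds up, which leaves out blocks reserved by the filesystem. The sketch below reproduces that arithmetic from the rounded figures in the transcripts, so it is an approximation and not df's exact computation.

```python
def use_percent(used: int, avail: int) -> int:
    """
    Approximate df's Use% from the Used and Avail columns, rounding up.
    """
    total = used + avail
    if total <= 0:
        raise ValueError('Used plus Avail must be positive')
    # Ceiling division without going through floating point
    return -(-100 * used // total)
```

For example, the anvilprod figures of 108G used and 33G available give 77 before the purge, and 75G used with 66G available give 54 afterwards, matching the df output above.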
For demo, attempt to reproduce on every GitLab instance. Show evidence that timer units ran successfully.
gitlab-ctl.
Apparently, output from docker exec against a running container does not get captured by the /etc/docker/daemon.json, so we should remove the StandardOutput and StandardError options from any unit that uses docker exec.
In progress.
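The suggested removal of the StandardOutput and StandardError options can be pictured as a small text filter over a unit definition. This is a hypothetical helper for illustration only; how the units are actually defined in the Terraform template may differ.

```python
def strip_output_options(unit_text: str) -> str:
    """
    Drop StandardOutput= and StandardError= lines from a systemd unit.
    """
    dropped = ('StandardOutput=', 'StandardError=')
    kept = [
        line
        for line in unit_text.splitlines()
        # str.startswith accepts a tuple of prefixes
        if not line.strip().startswith(dropped)
    ]
    return '\n'.join(kept)
```

All other directives, including ExecStart, pass through unchanged.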
GitLab container registry
gitlab-dind