docker-archive / docker-registry

This is **DEPRECATED**! Please go to https://github.com/docker/distribution
Apache License 2.0

Garbage collection does not seem to do its job completely #1093

Closed mbert closed 7 years ago

mbert commented 7 years ago

Operating system: CentOS 7.3

Docker stack:

    # rpm -qa | grep docker
    docker-common-1.10.3-59.el7.centos.x86_64
    docker-distribution-2.5.1-1.el7.x86_64
    docker-1.10.3-59.el7.centos.x86_64
    docker-selinux-1.10.3-46.el7.centos.14.x86_64

We are running a private Docker registry into which we push images from our CI:

  1. Every night our base images (like 'base', 'build', ...) are rebuilt and pushed under the same tag (unless we decide to bump the version).
  2. We create images for some artifacts (e.g. internal services) along with their respective release builds and push them.
  3. Also there are images created and pushed for particular systems' releases which we use for integration testing.

Hence our registry holds a number of base images that are refreshed on a nightly basis, as well as some 'derived' images that were built from older generations of those base images (the most recent ones at the time of the build). Consequently, many old layers are still in use because of the 'derived' images.

Every night we run a cleanup job which first identifies older versions we don't want to keep and marks them for deletion. After that we shut down the registry, perform a garbage collection and start it up again. From the logs we can see that the garbage collector indeed removes a number of layers and frees some disk space.
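The stop, collect, restart part of such a nightly job could be sketched roughly as follows. This is a minimal sketch, not the actual job: the systemd unit name `docker-distribution` and the config path are assumptions based on the CentOS package listed above, and `gc_commands` and `run_gc` are hypothetical helper names.

```python
import subprocess

# Assumed names for the CentOS docker-distribution package; adjust to your setup.
SERVICE = "docker-distribution"
CONFIG = "/etc/docker-distribution/registry/config.yml"


def gc_commands(service=SERVICE, config=CONFIG, dry_run=False):
    """Return the commands for an offline garbage collection, in order."""
    gc = ["registry", "garbage-collect"]
    if dry_run:
        gc.append("--dry-run")  # only report what would be removed
    gc.append(config)
    return [
        ["systemctl", "stop", service],
        gc,
        ["systemctl", "start", service],
    ]


def run_gc(dry_run=False):
    """Stop the registry, garbage-collect, and always start it again."""
    stop, gc, start = gc_commands(dry_run=dry_run)
    subprocess.run(stop, check=True)
    try:
        subprocess.run(gc, check=True)
    finally:
        # Bring the registry back even if garbage collection failed.
        subprocess.run(start, check=True)
```

Calling `run_gc(dry_run=True)` first gives a log of what would be removed without touching the storage.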

With this setup we observe constant growth of disk space usage on the registry machine.

Because we were not sure whether this growth was simply to be expected, given that some old images are built on earlier base image generations, we did a little experiment:

  1. Create a full list of images and tags on the registry from the metadata
  2. Pull all these images and tags to a second machine with sufficient disk space
  3. Shut down the registry, wipe out the registry volume, start the registry
  4. Push all previously pulled images back to the registry
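Step 1 of the experiment above could be done in several ways. The sketch below uses the registry's HTTP API (`/v2/_catalog` and `/v2/<name>/tags/list`), which may not be how the list was actually built. It assumes an unauthenticated registry, and the `n=10000` page size is an arbitrary assumption meant to avoid pagination; `fetch_json` and `list_image_refs` are hypothetical names.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON body (no auth, no custom TLS setup)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def list_image_refs(base_url, fetch=fetch_json):
    """Return sorted 'name:tag' strings for every tag the registry reports."""
    refs = []
    catalog = fetch(f"{base_url}/v2/_catalog?n=10000")
    for name in catalog.get("repositories") or []:
        # "tags" can be null for a repository whose tags were all removed.
        tags = fetch(f"{base_url}/v2/{name}/tags/list").get("tags") or []
        refs.extend(f"{name}:{tag}" for tag in tags)
    return sorted(refs)
```

Each returned reference can then be fed to `docker pull` on the second machine and to `docker push` once the registry volume has been wiped.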

Expected behaviour: disk space usage on the registry should be the same.

Observed behaviour: disk space usage was only a fraction of what we had before.

By "fraction" we mean 30 GB after the operation, compared with 499 GB before.

After having applied this measure, disk space usage on the registry is growing steadily again.

mortensteenrasmussen commented 7 years ago

Hi,

This repo is for the old and deprecated v1 registry. The new repo is https://github.com/docker/distribution

We had the same issue, and I figured out what's going on. Please check issue #2190 in docker/distribution.

Specifically, these two comments should help you out:

https://github.com/docker/distribution/issues/2190#issuecomment-279323179

https://github.com/docker/distribution/issues/2190#issuecomment-279343010

If you're using HTTPS or authentication on your registry, please read some of the additional comments (I changed the gist a bit from the first comment, but it should be fairly self-explanatory).

This should help you find and delete the manifests that don't have tags associated (which happens when you push an image:tag on top of the same image:tag in the registry).
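To illustrate what "manifests without tags" means on disk, here is a minimal sketch that is not the gist from the linked comments. It assumes the filesystem storage driver, where each repository lives under `<rootdirectory>/docker/registry/v2/repositories/<name>`, and it ignores manifest lists, whose child manifests are legitimately untagged. `untagged_manifests` is a hypothetical name.

```python
from pathlib import Path


def untagged_manifests(repo_dir):
    """Return digests that have a revision link but no tag pointing at them."""
    manifests = Path(repo_dir) / "_manifests"
    # Digests a tag currently points at: tags/<tag>/current/link holds "sha256:<hex>".
    tagged = {
        link.read_text().strip()
        for link in manifests.glob("tags/*/current/link")
    }
    # Every manifest still linked in the repository: revisions/sha256/<hex>/link.
    revisions = {
        link.read_text().strip()
        for link in manifests.glob("revisions/sha256/*/link")
    }
    return sorted(revisions - tagged)
```

The digests it returns are the ones left behind by repeated pushes to the same tag. Garbage collection keeps their layers until the manifests themselves are deleted, which would match the steady growth described above.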

Run it once to clean up, and then maybe put it in a cronjob or something like that. Naturally do this at your own risk, but this has helped several other people with the same issue :)

mbert commented 7 years ago

Thank you, that looks promising indeed. Since this is the wrong repo for this bug, I'm closing it now.