go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License
45.04k stars 5.49k forks

Automatically clean up docker images in the registry without a tag pointing to them #21673

Open kolaente opened 2 years ago

kolaente commented 2 years ago

When pushing a new Docker image for an existing tag, the old image still exists and uses up storage on the server. While you can use images just by pointing to their SHA, I've yet to find someone who actively does that. For my own registry (Portus) I have a cron job that automatically removes everything that does not have a tag pointing to it. Docker even has a command for this.

Having a cleanup job like that would make it possible to keep old versions while still solving the storage space problem.
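The cleanup described here boils down to a set difference: every manifest digest stored for an image, minus the digests a tag currently points to. A minimal sketch in Python, assuming a hypothetical in-memory view of the registry (the `manifests` list and `tags` mapping are illustrative and not Gitea's data model):

```python
def untagged_digests(manifests, tags):
    """Return the digests that no tag points to, sorted for stable output.

    manifests: iterable of all manifest digests stored for one image (hypothetical input)
    tags: dict mapping tag name -> digest it currently points to (hypothetical input)
    """
    tagged = set(tags.values())
    return sorted(d for d in set(manifests) if d not in tagged)


# Example: "latest" was pushed twice, so the digest it used to point to is orphaned.
manifests = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
tags = {"latest": "sha256:bbb", "v1.0": "sha256:ccc"}
orphans = untagged_digests(manifests, tags)
print(orphans)
```

Note that this simple version treats every untagged digest as garbage, so it would also pick up the per-platform manifests of a multi-arch image, which are untagged but still in use.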

@KN4CK3R in https://github.com/go-gitea/gitea/issues/21658#issuecomment-1301794468:

No, only if it's "older than" or not included in the "keep pattern". But it should be no problem to add a special logic here because there is already the custom Version == "latest" for containers.

Gitlab has an automatic garbage collection process for this: https://docs.gitlab.com/ee/administration/packages/container_registry.html#removing-untagged-manifests-and-unreferenced-layers

I think it's best to discuss this before implementing, mostly regarding these open questions:

  1. Should this be enabled automatically?
  2. Should this be a repo/org setting or a global config one?

KN4CK3R commented 2 years ago

Just for clarification, a repo has no impact on packages:

Should this be a user/org (not repo) setting or a global config one?

I checked again how I implemented this, and currently there are no untagged images in the container registry! (Exception: if you upload a multiarch image, the images for the individual architectures are untagged.) If you tag and push an image, you can later pull that image by its tag or by its hash. If a tag gets pushed again, the old tag/version gets removed, and that deletes the hash reference too. So after that operation there is no untagged image available anymore.

https://github.com/go-gitea/gitea/blob/f17edfaf5a31ea3f4e9152424b75c2c4986acbe3/routers/api/packages/container/manifest.go#L309-L312

So at the moment the cleanup does not need to remove untagged images because there are none. The first question should be "Should Gitea keep untagged versions?"

silverwind commented 2 years ago

Use case sounds pretty similar to git gc which we already automatically run as a cron IIRC.

Should this be enabled automatically?

If it's stable, I'd say so.

Should this be a repo/org setting or a global config one?

I think global is sufficient. Ideally it should just be another cron to clean up orphaned images, like we already do for orphaned git commits via git gc.

theodiem commented 1 year ago

I came across this issue after experiencing the same effect. Building multiarch images where only the manifest is tagged left me with lots of "packages" that have only a digest (the manifest had only one copy since it was tagged).

Tagging each arch so it gets overwritten makes the "details" tab a bit impractical when you have too many different architectures and versions (for matrix builds).

Should this be a repo/org setting or a global config one?

In my case, I would be happy with the exact same global mechanism described (similar to the cron that runs git gc)

salasrod commented 1 year ago

I am also looking for a similar feature. Going out of my way to manually prune images is painful.

lunny commented 1 year ago

Didn't #21658 resolve the issue?

kolaente commented 1 year ago

@lunny I didn't test it, but I don't think so. The PR allows configuring rules for the removal of tags; I just want to remove every image layer not associated with a tag.

peiwenxu commented 1 year ago

Is this still happening?

jum commented 1 year ago

No, I have 1.20.4 running and it does not happen.

silverwind commented 1 year ago

No one has implemented this yet, but it's definitely a vital feature to conserve disk space.

Maybe it should be disabled by default to support pulling images by hash, which is a rare but valid use case.

c521wy commented 9 months ago

Has anyone tried this cleanup rule?

[image: screenshot of the cleanup rule]

kolaente commented 9 months ago

Has anyone tried this cleanup rule?

[image: screenshot of the cleanup rule]

Using that and then checking with the preview yields no results, so it does not look like it's working.

kolaente commented 9 months ago

It looks like the official docker registry implementation uses this function to find and remove all untagged layers, as described here.

@KN4CK3R As far as I understood from glancing over the code, Gitea does not just "embed" the official registry package, so it's not as easy as just copying or calling that function?

mhkarimi1383 commented 7 months ago

I'm facing the same issue with the latest version of Gitea.

ViRb3 commented 3 months ago

Has anyone tried this cleanup rule?

[image: screenshot of the cleanup rule]

This rule seems to work perfectly! It deletes all images that have no associated tag. I would just suggest using ^sha256:.+ instead, as you could otherwise match a tag that for some reason has sha256 in the middle.
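The effect of anchoring can be checked quickly. A small sketch in Python (Gitea is written in Go, so its pattern engine may differ in details, but `^` and `.+` behave the same way for this case). The unanchored `sha256` pattern and the sample version strings are assumptions for illustration, since the rule in the screenshot is not reproduced here:

```python
import re

# Assumed stand-in for an unanchored rule; matches "sha256" anywhere in the string.
UNANCHORED = re.compile(r"sha256")
# The suggested pattern: only matches versions that start with "sha256:".
ANCHORED = re.compile(r"^sha256:.+")


def matching(pattern, versions):
    """Return the versions a removal pattern would select."""
    return [v for v in versions if pattern.search(v)]


# Hypothetical package versions: three tags and one untagged digest.
versions = ["latest", "1.2.0", "build-sha256-test", "sha256:0123abcd"]

print("unanchored:", matching(UNANCHORED, versions))
print("anchored:  ", matching(ANCHORED, versions))
```

With these inputs the unanchored pattern also selects the tag `build-sha256-test`, while the anchored one selects only the digest-style version.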

KimonHoffmann commented 2 months ago

Be careful with this approach when using multi-platform images! In this case the individual platform images might be untagged, but they may still be referenced (by the multi-platform manifest, that is). I'm currently trying to deal with this problem myself and have not yet found a way that does not require deeper insight into the relationships between the images involved.

If someone has something to suggest that'd be very welcome!
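One way to take those relationships into account is a mark-and-sweep pass: start from every tagged manifest, follow the references from multi-platform manifests to their per-platform manifests, and only treat what was never reached as deletable. A minimal sketch in Python, assuming the relationships are already available as a hypothetical `manifests` mapping (digest to list of referenced digests); how to obtain that mapping from Gitea is not covered here:

```python
def deletable_digests(manifests, tags):
    """Return digests that are neither tagged nor referenced by a tagged manifest.

    manifests: dict digest -> list of child digests (empty for a single-platform
               image, the per-platform manifests for a multi-platform manifest).
               Hypothetical shape, for illustration only.
    tags: dict tag name -> digest the tag points to.
    """
    reachable = set()
    stack = list(tags.values())
    while stack:  # mark phase: walk everything reachable from a tag
        digest = stack.pop()
        if digest in reachable:
            continue
        reachable.add(digest)
        stack.extend(manifests.get(digest, []))
    # sweep phase: whatever was never reached is safe to remove
    return sorted(set(manifests) - reachable)


manifests = {
    "sha256:index-new": ["sha256:amd64-new", "sha256:arm64-new"],
    "sha256:amd64-new": [],
    "sha256:arm64-new": [],
    # Old index left behind after "latest" was pushed again; it shares arm64-new.
    "sha256:index-old": ["sha256:amd64-old", "sha256:arm64-new"],
    "sha256:amd64-old": [],
}
tags = {"latest": "sha256:index-new"}
result = deletable_digests(manifests, tags)
print(result)
```

Here the untagged platform images `sha256:amd64-new` and `sha256:arm64-new` are kept because the tagged manifest still references them, while the old index and its now-unshared platform image are reported as deletable.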

gjung56 commented 2 months ago

Yes, the cleanup rule deletes platform variant images.

Until we can find an integrated solution, I ended up with an external cron job that prunes old images in my self-hosted instance.

I queried the registry API and used the Gitea Go SDK. It's hacky, but it works.

gitea_registry_prune.go.txt
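The attached script is not reproduced here. As a rough illustration of the general shape such an external job could take, here is a sketch in Python rather than Go. It assumes Gitea's package API exposes `GET /api/v1/packages/{owner}` and `DELETE /api/v1/packages/{owner}/{type}/{name}/{version}` with token authentication; check these against your instance's API documentation before relying on them. The helper names and the digest-based selection are hypothetical, and the selection has the multi-platform caveat described above, so the sketch defaults to a dry run:

```python
import json
import re
import urllib.parse
import urllib.request

DIGEST_VERSION = re.compile(r"^sha256:.+")


def select_untagged(packages):
    """Pick container package versions whose version string is a bare digest."""
    return [
        p for p in packages
        if p.get("type") == "container" and DIGEST_VERSION.search(p.get("version", ""))
    ]


def delete_url(base_url, owner, name, version):
    """Build the (assumed) delete endpoint, percent-encoding each path segment."""
    parts = [urllib.parse.quote(s, safe="") for s in (owner, name, version)]
    return f"{base_url.rstrip('/')}/api/v1/packages/{parts[0]}/container/{parts[1]}/{parts[2]}"


def prune(base_url, owner, token, dry_run=True):
    """List the owner's container packages (first page only) and delete untagged ones."""
    headers = {"Authorization": f"token {token}"}
    list_url = (
        f"{base_url.rstrip('/')}/api/v1/packages/"
        f"{urllib.parse.quote(owner, safe='')}?type=container"
    )
    with urllib.request.urlopen(urllib.request.Request(list_url, headers=headers)) as resp:
        packages = json.load(resp)
    for p in select_untagged(packages):
        url = delete_url(base_url, owner, p["name"], p["version"])
        print(("would delete " if dry_run else "deleting ") + url)
        if not dry_run:
            request = urllib.request.Request(url, headers=headers, method="DELETE")
            urllib.request.urlopen(request).close()
```

A call such as `prune("https://gitea.example.com", "myorg", token)` (hypothetical host and owner) would only print what it would delete; pass `dry_run=False` once the list looks right.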