docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

Feature request: Pruning of layers of specific image #1065

Open kasvtv opened 2 years ago

kasvtv commented 2 years ago

When using Docker for a large number of projects on a personal machine, and frequently rebuilding large containers, one may find themselves needing to prune cached layers from previous builds to free up disk space.

The only way to do this is by pruning all dangling layers using docker builder prune. This is inconvenient, as one will lose cached layers from all the containers of all projects on the machine.

Instead, it would be much nicer to have a command where one could prune the layers of a specific image, or alternatively, have a flag for the docker rmi or docker image rm command, that after removal of the image, also prunes all of its layers that are now dangling.

This way, one could have a build script where an image is rebuilt, and after rebuilding, remove the old image and its now dangling layers. This way, disk space is optimally preserved for the image in question, without affecting other projects.
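For illustration, the build-script workflow described above might look like the sketch below. It only covers what the existing CLI offers: it removes the image that the tag used to point at, and it does not touch the BuildKit build cache, which is the part this request is about. The function name rebuild_and_clean, the tag, and the build context are hypothetical placeholders.

```shell
# Rebuild an image, then remove the image the tag previously pointed at.
# Usage: rebuild_and_clean <image:tag> <build-context>
# Note: rebuild_and_clean is a hypothetical helper, not part of the docker CLI.
rebuild_and_clean() {
  image=$1
  context=$2

  # ID of the image currently behind the tag; empty if the tag does not exist yet.
  old_id=$(docker image inspect --format '{{.Id}}' "$image" 2>/dev/null || true)

  # Rebuild and retag; stop here if the build fails.
  docker build -t "$image" "$context" || return 1

  new_id=$(docker image inspect --format '{{.Id}}' "$image")

  # Remove the previous image only if the rebuild produced a different one.
  # Layers still shared with other images are kept by the daemon.
  if [ -n "$old_id" ] && [ "$old_id" != "$new_id" ]; then
    docker image rm "$old_id"
  fi
}
```

A call such as rebuild_and_clean myproject:latest . would then replace the old image in one step. The removal fails if a container still uses the old image or if another tag still points at it, which is usually the safer behaviour.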

thaJeztah commented 2 years ago

Thanks for opening this ticket.

> The only way to do this is by pruning all dangling layers using docker builder prune. This is inconvenient, as one will lose cached layers from all the containers of all projects on the machine.

docker builder prune clears the BuildKit build-cache. While the "classic" (non-buildkit) builder used image layers as caching mechanism, BuildKit uses its own cache. The BuildKit cache is not directly associated with an "image" (e.g., it uses caching for the build-context itself, as well as steps in the build), but there have been some discussions in the past to allow "labelling" the build-cache ("this cache was used as part of <label>", where the label could be "project X", or <insert your purpose>).

I need to search for those existing issues, but in either case, improving this will require changes in BuildKit (https://github.com/moby/buildkit) to provide the building-blocks for this feature. We recently moved all build-related code out of the docker cli (this) repository, delegating this functionality to the buildx component, so I'll transfer this ticket there (which is at least more related to that repository).

> Instead, it would be much nicer to have a command where one could prune the layers of a specific image, or alternatively, have a flag for the docker rmi or docker image rm command, that after removal of the image, also prunes all of its layers that are now dangling.

This is already possible, and is what docker image prune does; docker image prune looks for images that are no longer used (and no longer associated with a tag / name (imagename:tag)), and removes those.

It's not possible to remove "dangling" images for a specific image, as there is an N:1 relation between image names / tags and images (i.e., multiple image names can refer to the same image). The same holds for "layers": multiple images can share the same layer.

As mentioned above, the "builder" side of your request is more closely related to BuildKit (and buildx), so let me transfer this ticket to the buildx issue tracker.

tonistiigi commented 2 years ago

> This is inconvenient, as one will lose cached layers from all the containers of all projects on the machine.

docker builder prune --keep-storage 2GB (or whatever size you want) is the best default when doing manual pruning. This will delete the oldest and least used cache first, leaving the most useful cache behind.
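As a rough sketch, the size-budget pruning suggested here could be wrapped as follows. The helper name prune_cache_to and its 2GB default are hypothetical; --force only skips the confirmation prompt. Flag names can differ between Docker versions, so docker builder prune --help is worth checking first.

```shell
# Prune the BuildKit build cache down to a size budget and show usage before and after.
# Usage: prune_cache_to [size], e.g. prune_cache_to 2GB
# Note: prune_cache_to is a hypothetical helper, not part of the docker CLI.
prune_cache_to() {
  budget=${1:-2GB}

  # Disk usage before pruning (images, containers, volumes, build cache).
  docker system df

  # Keep up to $budget of cache; skip the confirmation prompt.
  docker builder prune --force --keep-storage "$budget"

  # Disk usage after pruning.
  docker system df
}
```

Running prune_cache_to 5GB after a batch of builds keeps a cache budget of that size without wiping everything.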

I think the solution we could consider would be to add some kind of tagging mechanism. So when you run your project you can tag the build cache it creates with a certain value. And then you can use this tag value as a filter for removing cache associated with a specific project.

There are already prune filters by ID/Type etc., but they are not very convenient to use.
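A minimal sketch of how such a filter might be used is shown below. It assumes the filter keys listed in the docker buildx prune documentation (such as id and type) and that docker buildx du --verbose prints the ID and type of each cache record; both should be verified against the installed version. The helper names are hypothetical.

```shell
# List cache records in detail so that an ID or type can be picked as a filter value.
# Note: both helpers are hypothetical wrappers, not part of the docker CLI.
list_cache_records() {
  docker buildx du --verbose
}

# Prune only the cache records matching one filter expression.
# Usage: prune_cache_matching <key=value>
# Example values (assumed, check your version): type=exec.cachemount, id=<record id>
prune_cache_matching() {
  docker buildx prune --force --filter "$1"
}
```

Because the record IDs have to be looked up by hand first, this illustrates why the existing filters are awkward for per-project cleanup compared with the tagging idea above.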

wojtrawi commented 2 years ago

> This is already possible, and is what docker image prune does; docker image prune looks for images that are no longer used (and no longer associated with a tag / name (imagename:tag)), and removes those.

This seems not to be true. After doing that, I can see that the build cache has increased by the amount of layers that existed in the pruned images (checked using docker system df).

thaJeztah commented 2 years ago

> This seems not to be true. After doing that, I can see that the build cache has increased by the amount of layers that existed in the pruned images (checked using docker system df).

Hmm, I'm not sure how that would be related. The BuildKit build-cache does not use the image-store, and docker image prune only deletes images. The code is in the ImagesPrune() function in the docker daemon; the relevant part is here: https://github.com/moby/moby/blob/ce550fa9c2e0f7b85a77e72a9596b2f92f1d0e32/daemon/images/image_prune.go#L93-L139

wojtrawi commented 2 years ago

What I experience is that after I prune an image, the build cache increases, and if I happen to build the same image, I can see that the cache has been used to do that. However, I keep a separate image with dependencies, so I do not mind running docker builder prune -f after each CI run. I do not know how it is related, but I can see an empty build cache (storage is 0 but items is not 0) and still get cached layers from the kept image with dependencies, as if the build cache had some pointer to layers contained in an image. I'm not a DevOps engineer, just an FE developer, so I do not know whether this is desired, but that's how it is.

bdrnglm commented 1 year ago

> I think the solution we could consider would be to add some kind of tagging mechanism. So when you run your project you can tag the build cache it creates with a certain value. And then you can use this tag value as a filter for removing cache associated with a specific project.

I couldn't find any issue dealing with this. Is this planned?

> There are already prune filters by ID/Type etc., but they are not very convenient to use.

Where can we find these filters? I could only find filters related to time limits, none for ID or type.