Closed: srcshelton closed this issue 4 years ago
Are you building the images with Buildah, or Podman?
@TomSweeneyRedHat PTAL, this sounds concerning
All via podman, I don't have a separate buildah installation.
We do expect a fair number of none:none images to be generated after builds, because of layer caching; that's not a bug. However, the inability to remove them because of leftover build containers definitely is. I would expect that a podman image prune would get rid of all of them without issue.
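For reference, a minimal way to check for and clear those cached leftovers, using only commands that also appear later in this thread (assuming nothing still references the images):
podman images --filter dangling=true    # list the untagged none:none images
podman image prune -f                   # remove all dangling images without prompting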
@srcshelton Multistage builds? (Multiple FROM instructions in a single build)
I'd expect podman to cache layers created during a multistage build. Some of the layers created during a multistage build are untagged top-level layers. Those probably show up in podman image ls.
Podman (buildah) should prevent you from removing such a layer while the build is still in progress. Perhaps there is an issue with this logic (if it exists).
I won't go much further with my guessing 🙂
A fair number of multi-stage builds, yes.
Something certainly appears to be leaving locks/references behind, though - is there a way I can confirm the linkages which podman believes to exist next time I get a stuck image?
Also, is the expected behaviour that running podman image prune should sometimes remove a single parent image (shown as untagged/'none') but leave behind a single (presumably child) image (also showing as 'none'), so that it takes several prune runs to finally remove all hanging images? My assumption would be that prune would recursively remove the stack of otherwise un-referenced child images up to the parent, rather than removing the parent only and so having to be run once per child.
I've actually seen up to five invocations of prune each remove a single image, and then the next run remove many more and finally leave the image ls list clear of untagged images.
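A few commands that can help confirm what podman believes an image is linked to (a sketch; the exact fields available depend on the podman version, and <image-id> is a placeholder):
podman image tree <image-id>                               # show the image's layer/parent structure
podman image inspect --format '{{.Parent}}' <image-id>     # print the recorded parent image ID, if any
podman ps -a --filter ancestor=<image-id>                  # list containers podman knows about that use the image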
That image prune behavior sounds like a separate bug - I'd expect it would remove everything...
I suspect that it's related, though - it doesn't always behave like this, but the frequency of occurrence does seem related to 'stuck' images apparently linked to a non-existent container.
Edit: Not sure if there is anything wrong with prune.
I was able to simulate similar results when I created images using buildah.
I'm not sure if my observation is in any way related to this, considering @srcshelton isn't using buildah directly. But since podman internally uses buildah, I'd imagine it's possible that if, for whatever reason, a build fails to delete a container it created, that container might not be visible in podman.
I created a few images with buildah and intentionally didn't clean up the containers that were created during the build.
The results were similar to what @srcshelton observed. As you might expect, prune can't clean up images that still have references:
podman system prune
WARNING! This will remove:
- all stopped containers
- all stopped pods
- all dangling images
- all build cache
Are you sure you want to continue? [y/N] y
Deleted Pods
Deleted Containers
Deleted Images
ubuntu:~/Desktop$ podman images --filter dangling=true
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 04c6992f6f81 7 minutes ago 211 MB
ubuntu:~/Desktop$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 04c6992f6f81 7 minutes ago 211 MB
ubuntu:~/Desktop$ podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
The logs show the following:
WARN[0004] Failed to prune image 04c6992f6f811fb2b099d7fe6c82be816d44eacb4498a24fad9e8836ea9bb798 as it is in use: Image used by b6775790719903771182efbb53500a4117740a040cd818add2c0e325eba783ad: image is in use by a container
Again, as you might expect, with buildah, I can see that there are still containers:
ubuntu:~/Desktop$ buildah ps -a
CONTAINER ID BUILDER IMAGE ID IMAGE NAME CONTAINER NAME
22b28fa451f0 * b5b4d78bc90c docker.io/library/centos:7 centos-working-container
b358c8263309 * 04c6992f6f81 04c6992f6f811fb2b099d7fe6c82be816d44eacb4498a24fad9e8836ea9bb798-working-container
b67757907199 * 04c6992f6f81 04c6992f6f811fb2b099d7fe6c82be816d44eacb4498a24fad9e8836ea9bb798-working-container-1
Now if I remove those, podman prune works properly:
ubuntu:~/Desktop$ buildah rm -a
22b28fa451f08522c846bc1008a7e90be7c830136d1c7c83029103d12196ee05
b358c826330920217c8fb5e9d772088b17000e8bd4597b9bdf5f501d3314daf8
b6775790719903771182efbb53500a4117740a040cd818add2c0e325eba783ad
ubuntu:~/Desktop$ podman system prune
WARNING! This will remove:
- all stopped containers
- all stopped pods
- all dangling images
- all build cache
Are you sure you want to continue? [y/N] y
Deleted Pods
Deleted Containers
Deleted Images
04c6992f6f811fb2b099d7fe6c82be816d44eacb4498a24fad9e8836ea9bb798
I think this is a bit confusing from a user perspective. An image shouldn't be flagged as dangling if there's still a container referencing it. Considering that podman is able to figure out that it isn't allowed to delete the image, it should also understand that the image isn't dangling.
The fact that podman and buildah share the image list, but don't share a view of containers, can be quite confusing from a user perspective. Would it be possible to somehow mark which applications are referencing an image? This would allow you to produce a log entry that tells the user why the image can't be removed. Better yet, the image list could be enhanced to include this information.
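Until something like that exists, one rough way to see which 'dangling' images are actually held by buildah build containers is to cross-reference the two tools by hand (a sketch only; it assumes buildah is installed alongside podman and that both --format templates behave as documented):
for id in $(podman images --filter dangling=true --format '{{.ID}}'); do
  echo "image ${id}:"
  buildah containers --format '{{.ContainerID}} {{.ImageID}}' | grep "${id}" \
    || echo "  no buildah container references it"
done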
I was able to replicate this issue and confirm my suspicions. Temporary containers created during the build process are not visible to podman ps -a. If the build process gets terminated, podman will leave behind containers that are only visible to buildah.
To reproduce the issue: start a podman build of a multi-stage Dockerfile (such as the example below) and terminate it before it completes, then compare buildah ps -a with podman ps -a.
Console output from my test:
ubuntu:~/git/boo$ buildah ps -a
CONTAINER ID BUILDER IMAGE ID IMAGE NAME CONTAINER NAME
29f0f48ffa79 * ef15416724f6 docker.io/library/golang:1.7.3 golang-working-container
8ee03bd2384b * e2625ac641ae e2625ac641ae976720fafa5a105fdfedafde42fac4ef9d2f0ea4051747edf3a5-working-container
d4bcbb870ed3 * b8fc3574bb4d b8fc3574bb4d53a95cfeec5bf36dc71ac020d36ceac818301cd953235df6b7b3-working-container
a347c4b8880e * 19b92005f722 19b92005f722cef83229c01fe6f8fece67c7b47bdb15f79fdbc3073a812aa9f8-working-container
ubuntu:~/git/boo$ podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Example Dockerfile:
FROM golang:1.7.3 AS zoology
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY foo .
FROM alpine:latest AS foobar
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=zoology /go/src/github.com/alexellis/href-counter/foo .
FROM foobar AS goner
CMD ["./app"]
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /
COPY doo .
COPY --from=goner /root .
CMD ["./app"]
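A minimal command sequence for that reproduction might look like the following (a sketch only; 'repro' is an arbitrary tag, and the foo/doo files just need to exist in the build context):
podman build -t repro -f Dockerfile .    # interrupt with Ctrl-C before the final stage completes
buildah ps -a                            # the leftover working containers show up here
podman ps -a                             # ...but not here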
I think that you need to make these temporary containers created with podman build visible to podman.
In my build-scripts, I've added traps around any podman invocations which commit or remove images - these processes seem fragile, and interrupting podman during an image rm or image prune operation seems to have an increased likelihood of resulting in an anomalous state.
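A rough sketch of that kind of trap wrapper (illustrative only, not the actual build scripts; the variable name is a placeholder):
# defer interruption while podman mutates image state
trap '' INT TERM HUP
podman image rm -f "${image:?}"
# restore default signal handling once the critical section is done
trap - INT TERM HUP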
Since podman integrates buildah, I've not (to date) had buildah separately installed... can these 'invisible' buildah containers be managed in any way via podman's other commands or directly on the filesystem, or is having the separate buildah install pretty much a prerequisite (... and is that intentionally so?)
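For what it's worth, and purely as an assumption about newer releases rather than anything verified against the versions discussed here: more recent podman versions can at least list storage containers created by buildah, for example:
podman ps -a --external    # also show containers that exist only in containers/storage (e.g. buildah working containers)
Removing them from within podman may still depend on the version, so the buildah rm -a approach shown earlier in the thread remains the dependable option.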
@TomSweeneyRedHat or @ashley-cui Could you look into this? I think we have a long-term issue about managing buildah images from within Podman.
I'm honestly more concerned about the missing manifests problem that was mentioned initially - if we could get more details (error messages, a reproducer) there, it would be greatly appreciated. The lack of Buildah integration is something we've known about for a while and is on our list of things to fix, but not being able to fully delete images is very bad.
I'll post here as soon as I can reproduce... although the locking I've added does (coincidentally?) seem to have cut-down on the frequency of occurrence.
To summarise my thinking: podman should be interruptible, safely. If any action can't be safely interrupted, then signals should be ignored until the critical action is completed. Alternatively, changes should be performed by staging updates and then atomically committing them - so that if the process is killed during staging then it can subsequently be cleaned-up, and once the change is committed then further processing is again safe.
(This becomes more complicated where podman is invoking separate or third-party components... but assuming that these are fragile until proven not is not necessarily the worst of plans...)
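To make the stage-then-commit idea concrete, here is the generic pattern (not how podman actually stores its state; generate_new_state and statefile are placeholders):
# write the new state to a temporary file alongside the real one...
tmp="$(mktemp "${statefile}.XXXXXX")"
generate_new_state > "${tmp}"
# ...then publish it atomically; readers see either the old or the new file, never a partial one
mv -f "${tmp}" "${statefile}"
An interrupted run then leaves only a stray temporary file to clean up, never a half-written state.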
I spent an hour and a half trying to break the system manually in a way that would not let me remove images with just podman. I simply couldn't get the system into that kind of state. It could be that I'm using podman 2.0.0 - but I doubt that.
@srcshelton Is there anything out of the ordinary with your system? I noticed that you are storing images in a non-default location; is that a local filesystem? Also, is your run root a tmpfs filesystem? And finally, are you running podman in parallel?
I'm currently still on podman-2.0.0_rc6, awaiting the fix for ARG/ENV variables containing '=' (#6785), so it could have been fixed since. I saw the majority of errors, IIRC, whilst still on 2.0.0_rc5.
My setup is probably a little odd, yes - I'm migrating from an old 32-bit system image to a 64-bit one I'm building, with as many services containerised as possible. As such, the podman execution environment is actually a 64-bit chroot() from the 32-bit host system (all with a 64-bit kernel).
One notable anomaly about this setup, which I'm assuming is due to starting from a gaol, is that if I podman exec -it <container> /bin/sh then I actually end up back in the ultimate (32-bit) root of the system, outside of the chroot() environment started from and not within the container filesystem at all!
(... this does seem to make for a novel break-out solution though, faced with being 'root' in a chroot() gaol...)
All filesystems are local (although some elements which get mounted into containers are themselves NFS-mounted) and I'm not running multiple podman instances in parallel.
I've been unable to reproduce the missing manifest problem in the past few days, so perhaps the issue was resolved sometime between the various 2.0.0_rc releases?
I'm totally happy for this issue to be closed pending a recurrence, or left open a little longer in case I can get it to happen again...
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I'm assuming that this isn't intended behaviour, but podman generates and maintains a large number of (presumably) intermediate images:
... many of which claim to be in use by a container when attempting to delete them, even if no containers are running.
Sometimes these can be force-deleted, but often this results in podman then having stuck images for which it complains that manifests are missing - and the only solution I've found is to clear Podman's state and start again.
Steps to reproduce the issue:
Construct container images (including a mixture of 'build' and 'run/commit' stages)
podman image ls
Observe the number of none/none images
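A hypothetical minimal version of those steps, using an arbitrary tag name:
podman build -t example:latest .    # any multi-stage or commit-based build
podman image ls                     # note the <none>/<none> entries left behind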
Describe the results you received:
Many untagged temporary(?) images listed
Describe the results you expected:
Untagged images to automatically be removed (without using 'prune') - or, if pruned or deleted manually, this should be a safe operation (even if forced) and shouldn't result in stuck/unreadable images which cannot be further processed.
Additional information you deem important (e.g. issue happens only occasionally):
Numerous untagged images are generated every time images are constructed. Often these claim to be associated with a container even if none exist. Sometimes these become corrupted on (forced) deletion, and appear to require podman state erased in order to remove entirely.
So there are effectively three related issues:
1. Untagged, presumably temporary images are kept after the successful completion of build commands. This may be intentional or designed to mirror docker behaviour, but also:
2. Often, attempting to prune or manually delete these temporary images incorrectly results in an 'image is in use by container' error, even if podman ps -a shows no running containers;
3. Force-deleting images apparently associated with a non-existent (or hidden?) container results in state corruption, with podman reporting that the image manifest is missing or corrupt (with the overlay graph driver).
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Additional environment details (AWS, VirtualBox, physical, etc.):
n/a