containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io

podman-2.0.0 (and previous) generate large numbers of untagged containers #6801

Closed - srcshelton closed this issue 4 years ago

srcshelton commented 4 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I'm assuming that this isn't intended behaviour, but podman generates and maintains a large number of (presumably) intermediate images:

$ podman image ls
REPOSITORY                                  TAG              IMAGE ID      CREATED       SIZE
localhost/service.net-misc.zxtm             20.1             ace08f9af4ae  5 hours ago   786 MB
<none>                                      <none>           82ba212d2414  5 hours ago   991 MB
localhost/service.sys-apps.watchdog         5.15             2d59bf51412e  5 hours ago   46.8 MB
<none>                                      <none>           5ac5ae5a7697  5 hours ago   695 MB
localhost/service.app-admin.syslog-ng       3.22.1           b49116d1451e  5 hours ago   79.8 MB
<none>                                      <none>           ef388f8d1bc4  5 hours ago   728 MB
localhost/service.dev-vcs.subversion        1.13.0-r1        7ffefd926798  5 hours ago   90.9 MB
<none>                                      <none>           7c3affd0c86b  5 hours ago   752 MB
<none>                                      <none>           c35cd614db91  5 hours ago   201 MB
<none>                                      <none>           d619f625b512  5 hours ago   1.23 GB
localhost/service.mail-filter.spamassassin  3.4.4-r4         10b35f157c99  6 hours ago   201 MB
<none>                                      <none>           2fa66cba6e27  7 hours ago   850 MB
localhost/service.mail-filter.postgrey      1.36-r1          2d3c3e7efbac  7 hours ago   138 MB
<none>                                      <none>           0eaa2ef95c99  7 hours ago   786 MB
localhost/service.dev-db.redis              5.0.8            6b06efec875a  7 hours ago   63 MB
<none>                                      <none>           4d103f34d8b2  7 hours ago   893 MB
localhost/service.mail-mta.postfix          3.5.1            754fb4f56bc3  7 hours ago   188 MB
<none>                                      <none>           3d8c43cabbaf  7 hours ago   1.17 GB
localhost/service.dev-lang.php              7.4.6            8a1203411daf  8 hours ago   238 MB
<none>                                      <none>           b37be3924ea7  8 hours ago   1.24 GB
localhost/service.net-misc.openntpd         6.2_p3-r1        6affe69f87ea  8 hours ago   55 MB
<none>                                      <none>           9052b310079f  8 hours ago   1.06 GB
localhost/service.mail-filter.opendkim      2.10.3-r17       d7c1b285e9dd  8 hours ago   96 MB
<none>                                      <none>           3fddea7e0838  8 hours ago   1.14 GB
localhost/service.net-dns.bind              9.14.12          df9abc6113cb  9 hours ago   98.7 MB
<none>                                      <none>           f76a0db6fa1b  9 hours ago   929 MB
localhost/service.net-misc.memcached        1.6.6            3659d84f1d37  9 hours ago   124 MB
<none>                                      <none>           09f83716ced5  9 hours ago   772 MB
localhost/service.dev-db.mariadb            10.4.12          31f4b619030c  9 hours ago   374 MB
<none>                                      <none>           a7a31a086d06  9 hours ago   1.21 GB
localhost/service.app-forensics.lynis       2.7.5            b2f6ee5e2352  9 hours ago   58.7 MB
<none>                                      <none>           65aabdaa777b  9 hours ago   707 MB
localhost/service.www-servers.lighttpd      1.4.55           74b51f594881  9 hours ago   148 MB
<none>                                      <none>           e263b6cfc1dc  9 hours ago   836 MB
localhost/service.sys-apps.irqbalance       1.6.0            4c90ad2661b8  9 hours ago   65.6 MB
<none>                                      <none>           c3e78881dad4  9 hours ago   714 MB
localhost/service.net-mail.imapproxy        1.2.8_p14843-r1  6e8dba521f0c  9 hours ago   71.5 MB
<none>                                      <none>           b80d51dda54c  9 hours ago   719 MB
localhost/service.net-mail.fetchmail        6.4.1            5d5f282a85c3  9 hours ago   182 MB
<none>                                      <none>           08d63fe8b739  9 hours ago   1.01 GB
localhost/service.net-mail.dovecot          2.3.10.1         5c41c96843ea  9 hours ago   138 MB
<none>                                      <none>           70a5ee2a9036  9 hours ago   978 MB
localhost/service.net-misc.dhcp             4.4.2-r2         475ba8782d27  9 hours ago   56.1 MB
<none>                                      <none>           4cf4c60453e9  9 hours ago   704 MB
localhost/service.net-im.bitlbee            3.6-r1           6fa778a1e295  9 hours ago   76.1 MB
<none>                                      <none>           27f231913e97  9 hours ago   724 MB
<none>                                      <none>           3147a70c2fa8  9 hours ago   201 MB
<none>                                      <none>           70b05ac3fdb0  9 hours ago   886 MB
localhost/gentoo-build                      latest           46af8815061c  10 hours ago  694 MB
<none>                                      <none>           91cea3d0b8a6  10 hours ago  1.72 GB
localhost/gentoo-base                       latest           d406dd590734  10 hours ago  1.72 GB
localhost/gentoo-init                       latest           29dabb5f9192  10 hours ago  1.03 GB
localhost/gentoo-stage3                     latest           59f2559dc157  10 hours ago  1.03 GB
localhost/gentoo-env                        latest           dbebd2d54773  10 hours ago  10.8 kB
<none>                                      <none>           98fb6df728c3  10 hours ago  10.8 kB
docker.io/gentoo/stage3-amd64               latest           0191ae9831e8  10 hours ago  1.03 GB
<none>                                      <none>           a250805d8924  10 hours ago  10.6 kB
<none>                                      <none>           cab41c9f6c91  11 hours ago  10.6 kB
<none>                                      <none>           ebd14945d3e2  12 hours ago  10.6 kB
<none>                                      <none>           24b79f09fae9  13 hours ago  1.1 kB
<none>                                      <none>           5f373ce1f9ca  13 hours ago  874 B
<none>                                      <none>           0150df1b09f1  13 hours ago  10.6 kB
<none>                                      <none>           83da2122740a  14 hours ago  1.03 GB
<none>                                      <none>           02250425189a  14 hours ago  10.8 kB
<none>                                      <none>           363faed68f89  14 hours ago  10.8 kB
<none>                                      <none>           d8a6a4cba834  14 hours ago  10.6 kB
<none>                                      <none>           8e122f6dc9bd  14 hours ago  10.8 kB
<none>                                      <none>           0f87042d6bb7  14 hours ago  10.8 kB
<none>                                      <none>           0060041cebcd  14 hours ago  10.6 kB
<none>                                      <none>           240ac0fff299  15 hours ago  198 MB
<none>                                      <none>           77823fd296c4  15 hours ago  865 MB
<none>                                      <none>           e004cc4b0756  17 hours ago  786 MB
<none>                                      <none>           1ca21f2cb5fe  17 hours ago  46.8 MB
<none>                                      <none>           524357a8f3a8  17 hours ago  79.8 MB
<none>                                      <none>           1337b7286e24  17 hours ago  90.9 MB
<none>                                      <none>           2e63567c74ca  18 hours ago  138 MB
<none>                                      <none>           b1069410b783  18 hours ago  63 MB
<none>                                      <none>           31b21360c44e  18 hours ago  199 MB
<none>                                      <none>           0a9903fa9ef1  18 hours ago  238 MB
<none>                                      <none>           143522eee7bd  18 hours ago  55 MB
<none>                                      <none>           fb729b3aebac  18 hours ago  96 MB
<none>                                      <none>           842b10624fa4  18 hours ago  98.7 MB
<none>                                      <none>           4efcc259fb20  18 hours ago  124 MB
<none>                                      <none>           883ec290aa37  18 hours ago  374 MB
<none>                                      <none>           00ad225a0e4b  18 hours ago  58.7 MB
<none>                                      <none>           fb9792785f75  18 hours ago  148 MB
<none>                                      <none>           f46fca3254f2  18 hours ago  65.6 MB
<none>                                      <none>           5c4f00b4ec12  19 hours ago  71.5 MB
<none>                                      <none>           7ceb0a66a95b  19 hours ago  182 MB
<none>                                      <none>           745569b72821  19 hours ago  138 MB
<none>                                      <none>           db9f9275d87f  19 hours ago  56.1 MB
<none>                                      <none>           8b864a5843dd  19 hours ago  76.1 MB
<none>                                      <none>           0d827ad68b94  19 hours ago  1.81 GB
<none>                                      <none>           2cb6a5d7307b  34 hours ago  1.03 GB
localhost/service.mail-filter.spampd        2.53             4228ac071a98  8 days ago    201 MB
localhost/service.mail-filter.spamassassin  3.4.4-r3         196842f7a495  8 days ago    201 MB
localhost/dell-ism                          351              88fd2190ca36  8 days ago    182 MB
localhost/dell-dsu                          20.06.00         d2c551fd55f0  8 days ago    422 MB
docker.io/library/ubuntu                    18.04            8e4ce0a6ce69  9 days ago    66.6 MB
docker.io/library/centos                    8                831691599b88  9 days ago    223 MB
docker.io/koalaman/shellcheck               stable           3d5a3cb1aa47  2 months ago  6.69 MB
docker.io/pulsesecure/vtm                   20.1             1232e940dcc2  3 months ago  808 MB

... many of which claim to be in use by a container when I attempt to delete them, even though no containers are running.

Sometimes these can be force-deleted, but often this results in podman then having stuck images for which it complains that manifests are missing - and the only solution I've found is to clear Podman's state and start again.

Steps to reproduce the issue:

  1. Construct container images (including a mixture of 'build' and 'run/commit' stages) - see the sketch after these steps

  2. podman image ls

  3. Observe the number of none/none images
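
For illustration, a minimal sequence along these lines shows the pattern (a sketch only - the Dockerfile and image names below are made up, not my actual build scripts):

# Hypothetical reproduction sketch: one multi-stage build plus a run/commit
# step, then list the dangling (<none>/<none>) images left behind.
cat > Dockerfile <<'EOF'
FROM docker.io/library/alpine:latest AS builder
RUN echo built > /artifact

FROM docker.io/library/alpine:latest
COPY --from=builder /artifact /artifact
EOF

podman build -t localhost/example-build:latest .

# A 'run/commit' stage, as in step 1:
ctr="$(podman run -d localhost/example-build:latest sleep 60)"
podman commit "$ctr" localhost/example-run:latest
podman rm -f "$ctr"

# Step 3: the untagged intermediate images remain.
podman image ls --filter dangling=true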

Describe the results you received:

Many untagged temporary(?) images listed

Describe the results you expected:

Untagged images should be removed automatically (without needing 'prune') - or, if they are pruned or deleted manually, that should be a safe operation (even when forced) which doesn't result in stuck/unreadable images that cannot be processed further.

Additional information you deem important (e.g. issue happens only occasionally):

Numerous untagged images are generated every time images are constructed. Often these claim to be associated with a container even when none exist. Sometimes they become corrupted on (forced) deletion, and appear to require podman's state to be erased in order to remove them entirely.

So there are effectively three related issues:

  1. Untagged, presumably temporary images are kept after the successful completion of build commands. This may be intentional or designed to mirror docker behaviour, but also:

  2. Often, attempting to prune or manually delete these temporary images incorrectly results in an 'image is in use by a container' error, even though podman ps -a shows no containers at all;

  3. Force-deleting images apparently associated with a non-existent (or hidden?) container results in state corruption, with podman reporting that the image manifest is missing or corrupt (with the overlay graph driver).

Output of podman version:

Version:      2.0.1
API Version:  1
Go Version:   go1.14.2
Git Commit:   a11c4ead10177a66ef2810a0a92ea8ce2299da07
Built:        Sat Jun 27 16:48:06 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.17, commit: 41877362fc4685d55e0473d2e4a1cbe5e1debee0'
  cpus: 8
  distribution:
    distribution: gentoo
    version: unknown
  eventLogger: file
  hostname: dellr330
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.38-gentoo
  linkmode: dynamic
  memFree: 3294429184
  memTotal: 8132182016
  ociRuntime:
    name: runc
    package: Unknown
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 25161351168
  swapTotal: 25769787392
  uptime: 534h 59m 34.07s (Approximately 22.25 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 1
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.ignore_chown_errors: "false"
  graphRoot: /space/podman/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 647
  runRoot: /space/podman/run
  volumePath: /space/podman/storage/volumes
version:
  APIVersion: 1
  Built: 1593276486
  BuiltTime: Sat Jun 27 16:48:06 2020
  GitCommit: a11c4ead10177a66ef2810a0a92ea8ce2299da07
  GoVersion: go1.14.2
  OsArch: linux/amd64
  Version: 2.0.1

Package info (e.g. output of rpm -q podman or apt list podman):

n/a

Additional environment details (AWS, VirtualBox, physical, etc.):

n/a

mheon commented 4 years ago

Are you building the images with Buildah, or Podman?

@TomSweeneyRedHat PTAL, this sounds concerning

srcshelton commented 4 years ago

Are you building the images with Buildah, or Podman?

All via podman; I don't have a separate buildah installation.

mheon commented 4 years ago

We do expect a fair number of none:none images to be generated after builds, because of layer caching; that's not a bug.

However, the inability to remove them because of leftover build containers definitely is. I would expect that a podman image prune would get rid of all of them without issue.
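
For reference, the cleanup I'd expect to work here is just the following (a minimal sketch):

# Remove the dangling (<none>/<none>) images left over from layer caching,
# then confirm nothing dangling remains.
podman image prune -f
podman image ls --filter dangling=true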

skorhone commented 4 years ago

@srcshelton Multistage builds? (Multiple froms in a single build)

I'd expect podman to cache layers created during a multistage build. Some of the layers created during a multistage build are untagged top-level layers; those probably show up in podman image ls.

Podman (buildah) should prevent you from removing such a layer while a build is still in progress. Perhaps there is an issue with this logic (if it exists).

I won't go much further with my guessing 🙂

srcshelton commented 4 years ago

@srcshelton Multistage builds? (Multiple froms in a single build)

A fair number of multi-stage builds, yes.

Something certainly appears to be leaving locks/references behind, though - is there a way I can confirm the linkages which podman believes to exist, next time I get a stuck image?

Also, is it expected behaviour that running podman image prune sometimes removes a single parent image (shown as untagged/'none') but leaves behind a single (presumably child) image (also showing as 'none'), so that it takes several 'prune' runs to finally remove all dangling images? My assumption would be that 'prune' would recursively remove the stack of otherwise un-referenced child images up to the parent, rather than removing the parent only and so having to be run once per child.

I've actually seen up to five invocations of prune each remove a single image, with the next run then removing many more and finally leaving the image ls list clear of untagged images.
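
In the meantime, the workaround this behaviour suggests is to repeat the prune until a run deletes nothing (a sketch - prune prints one removed image ID per line, so empty output means it has finished):

# Keep pruning until a pass removes no further images, then verify.
while [ -n "$(podman image prune -f)" ]; do
        :
done
podman image ls --filter dangling=true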

mheon commented 4 years ago

That image prune behavior sounds like a separate bug - I'd expect it would remove everything...

srcshelton commented 4 years ago

That image prune behavior sounds like a separate bug - I'd expect it would remove everything...

I suspect that it's related, though - it doesn't always behave like this, but the frequency of occurrence does seem related to 'stuck' images apparently linked to a non-existent container.

skorhone commented 4 years ago

Edit: Not sure if there is anything wrong with prune.

I was able to simulate similar results when I created images using buildah.

I'm not sure if my observation is in any way related to this, considering @srcshelton isn't using buildah directly. But since podman internally uses buildah, I'd imagine it's possible that if, for any reason, a build fails to delete a container created during the build, that container might not be visible in podman.

I created a few images with buildah and intentionally didn't clean up the containers that were created for building.

The results were similar to what @srcshelton observed. As you might expect, prune can't clean up images that still have references:

podman system prune

WARNING! This will remove:
        - all stopped containers
        - all stopped pods
        - all dangling images
        - all build cache
Are you sure you want to continue? [y/N] y 
Deleted Pods
Deleted Containers
Deleted Images

ubuntu:~/Desktop$ podman images --filter dangling=true
REPOSITORY  TAG     IMAGE ID      CREATED        SIZE
<none>      <none>  04c6992f6f81  7 minutes ago  211 MB

ubuntu:~/Desktop$ podman images
REPOSITORY                             TAG           IMAGE ID      CREATED        SIZE
<none>                                 <none>        04c6992f6f81  7 minutes ago  211 MB

ubuntu:~/Desktop$ podman ps -a
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES

The logs show the following:

WARN[0004] Failed to prune image 04c6992f6f811fb2b099d7fe6c82be816d44eacb4498a24fad9e8836ea9bb798 as it is in use: Image used by b6775790719903771182efbb53500a4117740a040cd818add2c0e325eba783ad: image is in use by a container

Again, as you might expect, with buildah I can see that there are still containers:

ubuntu:~/Desktop$ buildah ps -a
CONTAINER ID  BUILDER  IMAGE ID     IMAGE NAME                       CONTAINER NAME
22b28fa451f0     *     b5b4d78bc90c docker.io/library/centos:7       centos-working-container
b358c8263309     *     04c6992f6f81                                  04c6992f6f811fb2b099d7fe6c82be816d44eacb4498a24fad9e8836ea9bb798-working-container
b67757907199     *     04c6992f6f81                                  04c6992f6f811fb2b099d7fe6c82be816d44eacb4498a24fad9e8836ea9bb798-working-container-1

Now if I remove those, podman system prune works properly:

ubuntu:~/Desktop$ buildah rm -a
22b28fa451f08522c846bc1008a7e90be7c830136d1c7c83029103d12196ee05
b358c826330920217c8fb5e9d772088b17000e8bd4597b9bdf5f501d3314daf8
b6775790719903771182efbb53500a4117740a040cd818add2c0e325eba783ad

ubuntu:~/Desktop$ podman system prune

WARNING! This will remove:
        - all stopped containers
        - all stopped pods
        - all dangling images
        - all build cache
Are you sure you want to continue? [y/N] y
Deleted Pods
Deleted Containers
Deleted Images
04c6992f6f811fb2b099d7fe6c82be816d44eacb4498a24fad9e8836ea9bb798

I think this is a bit confusing from the user's perspective. An image shouldn't be flagged as dangling if there's still a container referencing it. Considering that podman is able to figure out that it isn't allowed to delete the image, it should also understand that the image isn't dangling.

The fact that podman and buildah share the image list but don't share a view of containers can be quite confusing from the user's perspective. Would it be possible to somehow mark which application is referencing an image? That would make it possible to produce a log entry telling the user why the image can't be removed. Better yet, the images list could be enhanced to include this information.

skorhone commented 4 years ago

I was able to replicate this issue and confirm my suspicions. Temporary containers created during the build process are not visible to podman ps -a. If the build process gets terminated, podman will leave behind containers that are only visible to buildah.

To reproduce the issue:

  1. Create a long multistage build
  2. Run build with 'podman build'
  3. Abort build with ctrl+c
  4. Run 'podman ps -a'
  5. Run 'buildah ps -a'

Console output from my test:

ubuntu:~/git/boo$ buildah ps -a
CONTAINER ID  BUILDER  IMAGE ID     IMAGE NAME                       CONTAINER NAME
29f0f48ffa79     *     ef15416724f6 docker.io/library/golang:1.7.3   golang-working-container
8ee03bd2384b     *     e2625ac641ae                                  e2625ac641ae976720fafa5a105fdfedafde42fac4ef9d2f0ea4051747edf3a5-working-container
d4bcbb870ed3     *     b8fc3574bb4d                                  b8fc3574bb4d53a95cfeec5bf36dc71ac020d36ceac818301cd953235df6b7b3-working-container
a347c4b8880e     *     19b92005f722                                  19b92005f722cef83229c01fe6f8fece67c7b47bdb15f79fdbc3073a812aa9f8-working-container

ubuntu:~/git/boo$ podman ps -a
CONTAINER ID  IMAGE   COMMAND  CREATED  STATUS  PORTS   NAMES

Example Dockerfile:

FROM golang:1.7.3 AS zoology
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY foo .

FROM alpine:latest AS foobar
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=zoology /go/src/github.com/alexellis/href-counter/foo .

FROM foobar AS goner
CMD ["./app"]

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /
COPY doo .
COPY --from=goner /root .
CMD ["./app"]

I think that you need to make these temporary containers created with podman build visible to podman.

srcshelton commented 4 years ago

In my build-scripts, I've added traps around any podman invocations which commit or remove images - these processes seem fragile, and interrupting podman during an image rm or image prune operation seems to have an increased likelihood of resulting in an anomalous state.
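
Roughly this sort of wrapper (a simplified sketch rather than my actual scripts; $container, $image and $old_image are just placeholders):

# Ignore SIGINT/SIGTERM for the duration of a critical podman operation, so
# neither the shell nor the child podman process is interrupted part-way
# through a commit or removal.
critical() {
        trap '' INT TERM        # ignored dispositions are inherited by the child
        "$@"
        local rc=$?
        trap - INT TERM         # restore default signal handling
        return $rc
}

critical podman commit "$container" "$image"
critical podman image rm "$old_image"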

Since podman integrates buildah, I've not (to date) had buildah separately installed... can these 'invisible' buildah containers be managed in any way via podman's other commands or directly on the filesystem, or is having a separate buildah install pretty much a prerequisite (... and is that intentionally so?)

rhatdan commented 4 years ago

@TomSweeneyRedHat or @ashley-cui Could you look into this? I think we have a long-term issue to manage buildah images from within Podman.

mheon commented 4 years ago

I'm honestly more concerned about the missing manifests problem that was mentioned initially - if we could get more details (error messages, a reproducer) there, it would be greatly appreciated. The lack of Buildah integration is something we've known about for a while and is on our list of things to fix, but not being able to fully delete images is very bad.

srcshelton commented 4 years ago

I'll post here as soon as I can reproduce it... although the locking I've added does (coincidentally?) seem to have cut down on the frequency of occurrence.

To summarise my thinking: podman should be safely interruptible. If any action can't be safely interrupted, then signals should be ignored until the critical action is completed. Alternatively, changes should be performed by staging updates and then atomically committing them - so that if the process is killed during staging it can subsequently be cleaned up, and once the change is committed then further processing is again safe.

(This becomes more complicated where podman is invoking separate or third-party components... but assuming that these are fragile until proven otherwise is not necessarily the worst of plans...)
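
To illustrate the staging idea in shell terms (a generic sketch of the pattern only, not a claim about podman's internals; the path and helper below are made up):

# Stage the new state in a temporary file, then commit it with an atomic
# rename. An interrupt before the rename leaves only a stale temp file that
# a later cleanup pass can delete; readers never see a half-written record.
state=/var/lib/example/images.json
tmp="$(mktemp "${state}.XXXXXX")"
generate_new_state > "$tmp"        # hypothetical helper producing the new record
mv -f "$tmp" "$state"              # rename(2) is atomic on the same filesystem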

skorhone commented 4 years ago

I spent an hour and a half trying to break the system manually in a way that would not let me remove images with just podman. I simply couldn't get the system into that kind of state. It could be because I'm using podman 2.0.0 - but I doubt that.

@srcshelton Is there anything out of the ordinary about your system? I noticed that you are storing images in a non-default location - is that a local filesystem? Also, is your run root on a tmpfs filesystem? And finally, are you running multiple podman instances in parallel?

srcshelton commented 4 years ago

I'm currently still on podman-2.0.0_rc6 awaiting the fix for ARG/ENV variables containing '=' (#6785), so it could have been fixed since. I saw the majority of errors, IIRC, whilst still on 2.0.0_rc5.

My setup is probably a little odd, yes - I'm migrating from an old 32-bit system image to a 64-bit one I'm building, with as many services containerised as possible. As such, the podman execution environment is actually a 64-bit chroot() from the 32-bit host system (all with a 64-bit kernel).

One notable anomaly about this setup, which I'm assuming is due to starting from a gaol, is that if I podman exec -it <container> /bin/sh then I actually end up back in the ultimate (32-bit) root of the system - outside of the chroot() environment I started from, and not within the container filesystem at all!

(... this does seem to make for a novel break-out solution though, faced with being 'root' in a chroot() gaol...)

All filesystems are local (although some elements which get mounted into containers are themselves NFS-mounted) and I'm not running multiple podman instances in parallel.

srcshelton commented 4 years ago

I've been unable to reproduce the missing manifest problem in the past few days, so perhaps the issue was resolved sometime between the various 2.0.0_rc releases?

I'm totally happy for this issue to be closed pending a recurrence, or left open a little longer in case I can get it to happen again...