Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

podman-build requires more than 2x the size of image on TMPDIR on image commit #22342

Open rtgiskard opened 7 months ago

rtgiskard commented 7 months ago

Issue Description

I ran into trouble building a large container image. After the final step, Podman commits the image, and after waiting tens of seconds it failed with this error:

Error: committing container for step {...}: copying layers and metadata for container "be98acaa65c8247c6a98591f7fd2ffd25cf375cde25980c1da26604cc03d999a": initializing source containers-storage:7a71d2248bda-working-container: storing layer "961a14f2ade5ad0f19abe5b14bca5abb4bbda3e12cee5bc25a18512544f53db8" to file: io: read/write on closed pipe

While trying to minimize and reproduce the issue, I found that during the image commit, several kinds of temporary copies appear in /var/tmp:

# du -sh  /var/tmp/[bcl]*
9.3G    /var/tmp/buildah2350563489
9.3G    /var/tmp/container_images_storage13471954
1.1G    /var/tmp/libpod_builder3381219266
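To watch these directories while a build runs, a minimal Python sketch along the lines of the du command above could look like this. The prefixes come from the listing; `tmp_usage` and `dir_size` are hypothetical helper names. It sums apparent file sizes, so on a compressed filesystem (such as btrfs with zstd) the numbers can differ from du's on-disk figures.

```python
import os
import stat
from pathlib import Path

# Directory-name prefixes seen in /var/tmp during the commit (from the du listing).
PREFIXES = ("buildah", "container_images_storage", "libpod_builder")


def dir_size(path: Path) -> int:
    """Sum the apparent sizes of regular files under path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except FileNotFoundError:
                continue  # files can vanish while the build is still running
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def tmp_usage(tmpdir: str = "/var/tmp", prefixes=PREFIXES) -> dict:
    """Map each matching top-level directory in tmpdir to its size in bytes."""
    base = Path(tmpdir)
    if not base.is_dir():
        return {}
    return {
        entry.name: dir_size(entry)
        for entry in sorted(base.iterdir())
        if entry.is_dir() and entry.name.startswith(prefixes)
    }


if __name__ == "__main__":
    for name, size in tmp_usage().items():
        print(f"{size / 2**30:6.1f} GiB  {name}")
```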

The size of the buildah* and container_images_storage* directories appears to match the size of the changes in that single commit (about 9 GB). I think this could be optimized, as it takes too much space.

The biggest problem, though, is that it uses TMPDIR to copy images. Why not use a location under /var/lib/containers/? It is common to have a small root filesystem and mount a large partition at /var/lib/containers.

Steps to reproduce the issue

  1. Prepare a Dockerfile that creates a large image (for example, using dd), sized relative to the free space on your root filesystem (or on TMPDIR).
  2. Build the image, adjusting the size; depending on the free space in TMPDIR, you may get io: read/write on closed pipe or no space left on device.
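The steps above can be scripted roughly as follows. This is a sketch, not the reporter's actual Dockerfile: the base image, the `tmpdir-repro` tag, and the helper names are assumptions. It writes random data rather than zeros so that transparent compression (such as the btrfs zstd setup mentioned later in the thread) cannot shrink the layer and make the size unpredictable.

```python
import os
import subprocess
import sys
from pathlib import Path

# Assumption: any small base image that ships dd works; alpine is only an example.
BASE_IMAGE = "docker.io/library/alpine:latest"


def make_dockerfile(size_mib: int, base: str = BASE_IMAGE) -> str:
    """Return a Dockerfile whose last layer holds size_mib MiB of random data.

    /dev/urandom is used instead of /dev/zero so that transparent compression
    cannot shrink the file and skew the measurement.
    """
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    return (
        f"FROM {base}\n"
        f"RUN dd if=/dev/urandom of=/big.bin bs=1M count={size_mib}\n"
    )


def build(size_mib: int, context: Path, tmpdir: str, tag: str = "tmpdir-repro") -> int:
    """Write the Dockerfile into context and run podman build with TMPDIR overridden."""
    context.mkdir(parents=True, exist_ok=True)
    (context / "Dockerfile").write_text(make_dockerfile(size_mib))
    env = dict(os.environ, TMPDIR=tmpdir)
    return subprocess.run(["podman", "build", "-t", tag, str(context)], env=env).returncode


if __name__ == "__main__":
    # Usage: python repro.py SIZE_MIB CONTEXT_DIR TMPDIR
    size, ctx, tmp = int(sys.argv[1]), Path(sys.argv[2]), sys.argv[3]
    sys.exit(build(size, ctx, tmp))
```

Pointing the third argument at a deliberately small filesystem should make it easier to hit the failure without filling the root filesystem.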

Describe the results you received

An error with limited information.

Describe the results you expected

  1. Keep all files within the containers root (/var/lib/containers).
  2. The error should provide more detailed information.

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.3
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.10-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: 2dcd736e46ded79a53339462bc251694b150f870'
  cpuUtilization:
    idlePercent: 96.93
    systemPercent: 0.66
    userPercent: 2.4
  cpus: 12
  databaseBackend: sqlite
  distribution:
    distribution: arch
    version: unknown
  eventLogger: file
  freeLocks: 2046
  hostname: desk
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.8.4-arch1-1
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 4471574528
  memTotal: 24916529152
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: Unknown
    package: /usr/lib/podman/netavark is owned by netavark 1.10.3-1
    path: /usr/lib/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.14.4-1
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: /usr/bin/pasta is owned by passt 2024_04_05.954589b-1
    version: |
      pasta 2024_04_05.954589b
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.3-1
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 0
  swapTotal: 0
  uptime: 93h 50m 46.00s (Approximately 3.88 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 107374182400
  graphRootUsed: 71296245760
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 82
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /srv/virt/containers/storage/volumes
version:
  APIVersion: 5.0.1
  Built: 1712088128
  BuiltTime: Wed Apr  3 04:02:08 2024
  GitCommit: 946d055df324e4ed6c1e806b561af4740db4fea9-dirty
  GoVersion: go1.22.1
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.1

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

No response

rhatdan commented 7 months ago

If you set TMPDIR to /var/lib/containers/storage/tmp, it should follow those rules. The reason it requires 2x the image size is that it pulls the layer tarball down and then needs to untar it. Once it finishes untarring, it should remove the temporary content.
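Following that explanation, a rough pre-flight check could compare free space against about twice the layer size. The factor of 2 is taken from this comment (tarball plus unpacked copy), and the du listing earlier in the issue shows real usage can exceed it, so treat it as an assumption to tune. `required_bytes` and `has_room` are hypothetical helper names.

```python
import shutil

# Assumption based on the comment above: the layer tarball and its unpacked
# copy coexist for a while, so plan for roughly twice the layer size.
DEFAULT_FACTOR = 2.0


def required_bytes(layer_bytes: int, factor: float = DEFAULT_FACTOR) -> int:
    """Rough estimate of the temporary space needed to handle one layer."""
    if layer_bytes < 0:
        raise ValueError("layer_bytes must be non-negative")
    return int(layer_bytes * factor)


def has_room(tmpdir: str, layer_bytes: int, factor: float = DEFAULT_FACTOR) -> bool:
    """True if the filesystem holding tmpdir has at least the estimated free space."""
    return shutil.disk_usage(tmpdir).free >= required_bytes(layer_bytes, factor)
```

For example, `has_room("/var/tmp", 9 * 2**30)` asks whether /var/tmp could absorb the roughly 9 GB layer from this report.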

rhatdan commented 7 months ago

@nalind @mtrmac @vrothberg @mheon @Luap99 @baude WDYT of changing TMPDIR to point at $GRAPHROOT/tmp by default?

baude commented 7 months ago

I'm wondering what other kinds of fun that brings in. Seems like a reasonable ask, though.

rhatdan commented 7 months ago

Seems like something we could handle in containers.conf

Luap99 commented 7 months ago

Seems like something we could handle in containers.conf

That already exists as image_copy_tmp_dir = "storage" AFAICT
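A hypothetical model of how the staging directory might be chosen, to make the interaction between TMPDIR and this setting concrete. The precedence (TMPDIR first, then `image_copy_tmp_dir`, then /var/tmp) is an assumption, consistent with the two `podman info` outputs in this thread: /var/tmp by default, and the exported TMPDIR once it was set. Mapping "storage" to a tmp directory under the graph root follows the $GRAPHROOT/tmp suggestion above and is likewise assumed.

```python
import os
from typing import Mapping, Optional

DEFAULT_TMPDIR = "/var/tmp"  # imageCopyTmpDir shown in the first podman info output


def image_copy_tmp_dir(
    conf_value: Optional[str],
    graph_root: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical model of where image copies are staged.

    conf_value stands for the containers.conf setting, e.g.

        [engine]
        image_copy_tmp_dir = "storage"
    """
    if "TMPDIR" in env:  # assumed: the environment variable wins
        return env["TMPDIR"]
    if not conf_value:
        return DEFAULT_TMPDIR
    if conf_value == "storage":  # assumed to mean $GRAPHROOT/tmp
        return os.path.join(graph_root, "tmp")
    if os.path.isabs(conf_value):
        return conf_value
    raise ValueError(f"expected 'storage' or an absolute path, got {conf_value!r}")
```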

mtrmac commented 7 months ago

TMPDIR is a system-global concept; directing it to a c/storage-specific location to store files that have nothing to do with c/storage would be very surprising. (Another reason I’m unhappy with Podman’s reinterpretation of this environment variable.)


Even thinking about images only, the system defaults have a major advantage that they are cleaned up automatically.

If we move the temporary location elsewhere, we will need a cleanup mechanism of some kind; and Podman’s installation will need to include steps to automatically enable/start that service (whether it runs on boot or periodically); and system administrators’ tools would probably have to learn about this location when trying to find where (as in this instance) the unexpected 9 GB went.
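To give a sense of what "a cleanup mechanism of some kind" involves, here is a minimal age-based sweep in Python. It is a sketch under stated assumptions, not an existing Podman feature: the prefixes come from the du listing in the issue, the 10-day threshold is arbitrary, and it judges staleness by the directory's own mtime only, so a build writing exclusively into subdirectories could look stale. System-wide tools such as systemd-tmpfiles handle this more robustly for the default locations.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical: prefixes of the temporary directories observed in this issue.
PREFIXES = ("buildah", "container_images_storage", "libpod_builder")


def stale_entries(tmpdir: str, max_age_s: float, now: Optional[float] = None) -> List[Path]:
    """Matching top-level directories whose mtime is older than max_age_s."""
    now = time.time() if now is None else now
    base = Path(tmpdir)
    if not base.is_dir():
        return []
    return sorted(
        p for p in base.iterdir()
        if p.is_dir() and not p.is_symlink()
        and p.name.startswith(PREFIXES)
        and now - p.stat().st_mtime > max_age_s
    )


def cleanup(tmpdir: str, max_age_s: float = 10 * 24 * 3600, dry_run: bool = True) -> List[Path]:
    """Remove stale entries; with dry_run=True, only report what would be removed."""
    victims = stale_entries(tmpdir, max_age_s)
    if not dry_run:
        for p in victims:
            shutil.rmtree(p, ignore_errors=True)
    return victims
```

Defaulting to a dry run keeps an accidental invocation harmless; it would still need to be scheduled (for example, by a timer), which is exactly the extra installation step described above.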

Not insurmountable, but I’m tempted to say that users who built space-constrained systems (or systems intentionally designed with no slack) are unavoidably taking on the responsibility to optimize the use of space far beyond what is reasonable for a typical system.

Just think about the ability of the system to apply package updates: that certainly requires some unknown number of free gigabytes in /var. A typical system is just going to have “enough” slack for all these purposes, without anyone having to allocate X GB for DNF, Y GB for Podman, Z GB for a database migration…


The read/write on closed pipe error is, to a first approximation, an error-handling bug somewhere in buildah/image.go. I don’t see anything above that makes it certain it was caused by running out of disk space (although I can vaguely see a path where that could plausibly happen and result in this message).
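To illustrate one way an out-of-space failure could surface only as `io: read/write on closed pipe` (this is not buildah's actual code): in Go's `io.Pipe`, if the reading side is closed with `PipeReader.Close`, later writes fail with `io.ErrClosedPipe`, whose message is exactly that text; if it is closed with `PipeReader.CloseWithError(err)`, later writes fail with `err` instead. The single-threaded Python model below mimics those two behaviours; all names in it are hypothetical.

```python
import errno


class ClosedPipeError(Exception):
    """Stand-in for Go's io.ErrClosedPipe."""


ERR_CLOSED_PIPE = ClosedPipeError("io: read/write on closed pipe")


class PipeModel:
    """Single-threaded model of a Go io.Pipe, seen from the writer's side."""

    def __init__(self):
        self._err = None
        self.written = []

    def close(self):
        # Like PipeReader.Close: later writes report only the generic error.
        self.close_with_error(ERR_CLOSED_PIPE)

    def close_with_error(self, err):
        # Like PipeReader.CloseWithError: the first recorded error is kept.
        if self._err is None:
            self._err = err

    def write(self, data):
        if self._err is not None:
            raise self._err
        self.written.append(data)


def writer_error(reader_keeps_cause: bool) -> str:
    """Simulate a reader that hits ENOSPC mid-copy; return what the writer sees."""
    pipe = PipeModel()
    pipe.write(b"layer chunk 1")
    cause = OSError(errno.ENOSPC, "no space left on device")
    if reader_keeps_cause:
        pipe.close_with_error(cause)
    else:
        pipe.close()
    try:
        pipe.write(b"layer chunk 2")
    except Exception as exc:
        return str(exc)
    return ""
```

`writer_error(False)` returns `io: read/write on closed pipe`, hiding the cause, while `writer_error(True)` returns a message containing `no space left on device`.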

Actually fixing this bug would very much benefit from actual steps to reproduce; which CLI options are used matters.

mtrmac commented 7 months ago

without anyone having to allocate X GB for DNF, Y GB for Podman, Z GB for a database migration…

The point being that X + Y + Z + … >> “enough”; but if each of those locations were a separate tightly-allocated partition, each of those partitions would need its own slack, which is, globally speaking, not an effective use of space.

rtgiskard commented 7 months ago

The read/write on closed pipe error is, to a first approximation, an error-handling bug somewhere in buildah/image.go. I don’t see anything above that makes it certain it was caused by running out of disk space (although I can vaguely see a path where that could plausibly happen and result in this message).

It happens when I try to reproduce the issue by adjusting the size of the dd output in the Dockerfile. This may be related to the underlying btrfs with zstd compression (which makes the size less predictable). With the right size, I always get read/write on closed pipe.

When the image commit is large enough, however, the error becomes no space left on device, which is very clear; that is how I found it was related to TMPDIR during the image commit. Because there is no extra space consumption outside /var/lib/containers/ before the COMMIT step (and once the build fails, the temporary files generally get cleared too), the cause is not easy to find.

Once enough space is provided, both errors go away.

rtgiskard commented 7 months ago

without anyone having to allocate X GB for DNF, Y GB for Podman, Z GB for a database migration…

The point being that X + Y + Z + … >> “enough”; but if each of those locations were a separate tightly-allocated partition, each of those partitions would need its own slack, which is, globally speaking, not an effective use of space.

Thanks for the details. I understand the decision, and it is indeed reasonable; image_copy_tmp_dir is the workaround.

Emphasizing the storage requirements of the build process somewhere in the documentation might be useful :)

luckylinux commented 7 months ago

I can also confirm this happening on my Rock 5B SBC (aarch64).

I have all Podman data on a separate ZFS pool and datasets (NVMe drive).

However, while building images, it was actually writing to /var/tmp, causing the SD card (:disappointed:) to fill up almost completely (99% full).

In my case, the message was also read/write on closed pipe.

I also discovered (a semi-unrelated issue) that I had the kernel sources for a lot of different kernels in /usr/src, also on that SD card :fearful:. I'm now moving them to a dedicated dataset on my zdata pool (mv /usr/src/* /zdata/SRC/, then chattr +i /usr/src and zfs set mountpoint=/usr/src zdata/SRC).

Changing TMPDIR might be the final (although maybe not proper?) solution here. In my podman user's /home/podman/.bash_profile (i.e. ~/.bash_profile), I set:

export TMPDIR="/home/podman/containers/tmp"
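A small sanity check, sketched in Python, that a TMPDIR exported this way exists, is writable, and sits on the same filesystem as the graph root. The helper names are hypothetical, and the /var/tmp fallback mirrors the default seen in the first `podman info` output rather than any documented rule. On ZFS each dataset is its own mount with its own device number, so "different filesystem" there can still mean the same pool.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both existing paths report the same device number (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_tmpdir(graph_root: str, env=os.environ) -> str:
    """Return a short verdict on where TMPDIR-based image copies would land."""
    tmpdir = env.get("TMPDIR", "/var/tmp")
    if not os.path.isdir(tmpdir):
        return f"{tmpdir}: missing (create it before building)"
    if not os.access(tmpdir, os.W_OK):
        return f"{tmpdir}: not writable"
    if same_filesystem(tmpdir, graph_root):
        return f"{tmpdir}: same filesystem as {graph_root}"
    return f"{tmpdir}: different filesystem from {graph_root}"
```

For the setup above, `check_tmpdir("/home/podman/storage")` run from the podman user's login shell would report on /home/podman/containers/tmp.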

podman info then seems to pick that up correctly (note imageCopyTmpDir):

host:
  arch: arm64
  buildahVersion: 1.33.5
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2.1.10+ds1-1_arm64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: unknown'
  cpuUtilization:
    idlePercent: 93.16
    systemPercent: 3.08
    userPercent: 3.76
  cpus: 8
  databaseBackend: boltdb
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: journald
  freeLocks: 2005
  hostname: Rock5B-01
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.6.19-1-arm64
  linkmode: dynamic
  logDriver: journald
  memFree: 368828416
  memTotal: 16477798400
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns_1.4.0-5_arm64
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.4.0
    package: netavark_1.4.0-3_arm64
    path: /usr/lib/podman/netavark
    version: netavark 1.4.0
  ociRuntime:
    name: crun
    package: crun_1.14.4-1_arm64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1002/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt_0.0~git20230309.7c7625d-1_arm64
    version: |
      pasta unknown version
      Copyright Red Hat
      GNU Affero GPL version 3 or later <https://www.gnu.org/licenses/agpl-3.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1002/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-1_arm64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 0
  swapTotal: 0
  uptime: 318h 47m 58.00s (Approximately 13.25 days)
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: docker.MYDOMAIN.TLD/docker.io
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: docker.MYDOMAIN.TLD/docker.io
      PullFromMirror: ""
    - Insecure: false
      Location: docker.MYDOMAIN.TLD/docker.io/library
      PullFromMirror: ""
    Prefix: docker.io
    PullFromMirror: ""
  docker.MYDOMAIN.TLD:
    Blocked: false
    Insecure: false
    Location: docker.MYDOMAIN.TLD
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: docker.MYDOMAIN.TLD/docker.io
      PullFromMirror: ""
    - Insecure: false
      Location: docker.MYDOMAIN.TLD/docker.io/library
      PullFromMirror: ""
    - Insecure: false
      Location: docker.MYDOMAIN.TLD/ghcr.io
      PullFromMirror: ""
    - Insecure: false
      Location: docker.MYDOMAIN.TLD/ghcr.io/library
      PullFromMirror: ""
    Prefix: docker.MYDOMAIN.TLD
    PullFromMirror: ""
  ghcr.io:
    Blocked: false
    Insecure: false
    Location: docker.MYDOMAIN.TLD/ghcr.io
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: docker.MYDOMAIN.TLD/ghcr.io
      PullFromMirror: ""
    - Insecure: false
      Location: docker.MYDOMAIN.TLD/ghcr.io/library
      PullFromMirror: ""
    Prefix: ghcr.io
    PullFromMirror: ""
  search:
  - docker.MYDOMAIN.TLD
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/podman/.config/containers/storage.conf
  containerStore:
    number: 12
    paused: 0
    running: 11
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs_1.13-1_arm64
      Version: |-
        fusermount3 version: 3.14.0
        fuse-overlayfs: version 1.13-dev
        FUSE library version 3.14.0
        using FUSE kernel interface version 7.31
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /home/podman/storage
  graphRootAllocated: 1899514036224
  graphRootUsed: 10135011328
  graphStatus:
    Backing Filesystem: zfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /home/podman/containers/tmp
  imageStore:
    number: 154
  runRoot: /run/user/1002/containers
  transientStore: false
  volumePath: /home/podman/storage/volumes
version:
  APIVersion: 4.9.3
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.21.6
  Os: linux
  OsArch: linux/arm64
  Version: 4.9.3

github-actions[bot] commented 6 months ago

A friendly reminder that this issue had no activity for 30 days.