akostadinov opened this issue 3 years ago
please verify this with buildah and file an issue there. we simply consume their library.
Any chance you can provide the Dockerfile in question (or a reproducer that triggers the same effect)?
@TomSweeneyRedHat PTAL
@baude When this happens, it is easy to transfer the issue to buildah.
I can try to reproduce on Monday. I already updated my VM to have more space so I can get some stuff built, but I guess it should be easy to make less space available by creating garbage files.
Containerfile:
$ cat Dockerfile.space
FROM quay.io/openshift/origin-jenkins-agent-base:4.6
RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000
RUN echo just a layer > /root/layer
Result:
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4,9G 0 4,9G 0% /dev
tmpfs 4,9G 84K 4,9G 1% /dev/shm
tmpfs 4,9G 1,2M 4,9G 1% /run
--> /dev/vda3 58G 48G 7,6G 87% / <--
tmpfs 4,9G 92K 4,9G 1% /tmp
/dev/vda1 477M 189M 259M 43% /boot
tmpfs 618M 36K 617M 1% /run/user/42
tmpfs 618M 12K 617M 1% /run/user/1000
$ podman build -f Dockerfile.space -t space-reproducer --squash .
STEP 1: FROM quay.io/openshift/origin-jenkins-agent-base:4.6
STEP 2: RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB, 2.9 GiB) copied, 5.28909 s, 595 MB/s
STEP 3: RUN echo just a layer > /root/layer
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
STEP 4: COMMIT space-reproducer
Getting image source signatures
Copying blob 226bfaae015f skipped: already exists
Copying blob 70056249a0e2 skipped: already exists
Copying blob 20852d862cb5 skipped: already exists
Copying blob 2d0bb3fbd674 skipped: already exists
Copying blob 59e0af54ebc8 skipped: already exists
Copying blob 57f4895f9cac skipped: already exists
Copying blob 682feb23c931 skipped: already exists
Copying blob 62d02243ecc6 [====================>---------------] 1.7GiB / 2.9GiB
Error: error committing container for step {Env:[GODEBUG=x509ignoreCN=0 OPENSHIFT_CI=true PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin container=oci foo=bar HOME=/home/jenkins LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 OPENSHIFT_BUILD_NAME=jenkins-agent-base OPENSHIFT_BUILD_NAMESPACE=ci-op-nzkfdz86] Command:run Args:[echo just a layer > /root/layer] Flags:[] Attrs:map[] Message:RUN echo just a layer > /root/layer Original:RUN echo just a layer > /root/layer}: error copying layers and metadata for container "116d9db64014eaa8f52a7b82023b255078a378e00615dfa5b6c7abf7aab06f97": Error writing blob: error storing blob to file "/var/tmp/storage838120693/1": write /var/tmp/storage838120693/1: no space left on device
$ df -h
Filesystem Size Used Avail Use% Mounted on
...
/dev/vda3 58G 48G 7,6G 87% /
...
$ podman build -f Dockerfile.space -t space-reproducer .
STEP 1: FROM quay.io/openshift/origin-jenkins-agent-base:4.6
STEP 2: RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB, 2.9 GiB) copied, 4.95324 s, 635 MB/s
Error: error committing container for step {Env:[GODEBUG=x509ignoreCN=0 OPENSHIFT_CI=true PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin container=oci foo=bar HOME=/home/jenkins LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 OPENSHIFT_BUILD_NAME=jenkins-agent-base OPENSHIFT_BUILD_NAMESPACE=ci-op-nzkfdz86] Command:run Args:[dd if=/dev/zero of=/root/space bs=1MiB count=3000] Flags:[] Attrs:map[] Message:RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000 Original:RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000}: error copying layers and metadata for container "11832feefde88de35a020ed7a0cc6f9532478df5b0f3e0324551fdedeaa53ebe": Error writing blob: error storing blob to file "/var/tmp/storage645036852/1": write /var/tmp/storage645036852/1: no space left on device
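Both failures above happen while staging blobs under /var/tmp (the "/var/tmp/storageNNN/1" paths in the errors). As a possible workaround until the tools get more space-efficient, the staging directory can be pointed at a filesystem with more room: podman documents the TMPDIR environment variable for its temporary files. A sketch (the directory name is just an example):

```shell
# Point temporary blob staging at a roomier filesystem.
# TMPDIR is honored by podman for temporary files (default /var/tmp);
# "$HOME/big-tmp" is an example location, not a required path.
mkdir -p "$HOME/big-tmp"
export TMPDIR="$HOME/big-tmp"
# podman build -f Dockerfile.space -t space-reproducer --squash .
```

This does not reduce the total space used, it only moves the pressure off the root filesystem.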
@nalind @mtrmac PTAL
@nalind How do we make buildah more space efficient?
@akostadinov Have you got buildah set up to use fuse-overlayfs? If not, it will fall back to the VFS driver, which will do exactly this. On my system with only ubuntu:latest pulled, overlay uses the expected size:
$ sudo du -hs /var/lib/containers/
79M /var/lib/containers/
Whereas the VFS storage area uses almost five times that because it's simulating overlays using copying:
$ sudo du -hs .local/share/containers/
391M .local/share/containers/
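For rootless setups, the VFS fallback described above can be avoided by selecting the overlay driver with fuse-overlayfs in the per-user storage configuration. A minimal sketch (key names should be checked against containers-storage.conf(5); the mount_program path is the usual Fedora location):

```toml
# ~/.config/containers/storage.conf (rootless) -- minimal sketch,
# see containers-storage.conf(5) for the authoritative reference
[storage]
driver = "overlay"

[storage.options.overlay]
mount_program = "/usr/bin/fuse-overlayfs"
```

Note that switching drivers effectively starts from an empty store; previously pulled images under the VFS driver will not be visible.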
It's frustrating too, because the layers on top of the main tarball in that image add almost nothing in terms of content size, and the Dockerfile already pays the readability cost of having most of its actions happen in a single RUN:
https://github.com/tianon/docker-brew-ubuntu-core/blob/dist-amd64/focal/Dockerfile
It feels like, as a matter of policy, low-level base OS images on registries should be flattened to a single layer, but I guess it didn't really matter until the relatively recent rise of interest in rootless containers.
@mikepurvis, how do I set up buildah that way? This is Fedora 32 and I already have the fuse-overlayfs package installed.
If you do buildah info or podman info, what does it say about the overlay driver?
$ buildah info
...
"store": {
"ContainerStore": {
"number": 0
},
"GraphDriverName": "overlay",
"GraphOptions": [
"overlay.mount_program=/usr/bin/fuse-overlayfs"
],
"GraphRoot": "/home/avalon/.local/share/containers/storage",
"GraphStatus": {
"Backing Filesystem": "extfs",
"Native Overlay Diff": "false",
"Supports d_type": "true",
"Using metacopy": "false"
},
"ImageStore": {
"number": 3
},
"RunRoot": "/run/user/1000/containers"
}
...
@vrothberg Will your pull improvements help this? At least removing the blobs as they are pulled and merged.
Unfortunately not, no. The files are removed once the types.ImageDestination is closed. However, if we add a reference counter (a given blob may be referenced more than once in an image), we can implement an early delete.
@rhatdan, I opened https://github.com/containers/image/issues/1187 and outlined what I think needs to be done to tackle the disk hunger.
A friendly reminder that this issue had no activity for 30 days.
This is still a priority, but I don't believe anyone has worked on it yet.
A friendly reminder that this issue had no activity for 30 days.
We have separate issues about the way we copy files 4 times, where Docker only copies them once.
A friendly reminder that this issue had no activity for 30 days.
@flouthoc PTAL
I'll take a look. Thanks
@flouthoc Could you reproduce this and document where/what actually uses the disk space, please, so that we have a documented full picture of where we are before starting to fix any specific suspected cause? We have justified suspicions about various code paths, but there might well be something else we haven't thought about.
@mtrmac Sure, I'll do that.
@flouthoc Any movement on this?
I was not able to look at this since the last comment from @mtrmac; I will start experimenting on this today.
A friendly reminder that this issue had no activity for 30 days.
@flouthoc reminder ping.
A friendly reminder that this issue had no activity for 30 days.
This is still pending. I'll try visiting this again.
Can Btrfs CoW capabilities be utilized to mitigate this issue (when copying within the same filesystem)? I was doing podman commit on a 90 GiB container. There is cp --reflink as an example of shallow copies with CoW, but I don't think Golang has any interface to access the same feature (it looks like cp uses an ioctl).
We are using reflink in containers/storage, but in this case, I think it might be the blobs that are using up all of the space and the blobs won't pre-exist on disk.
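The reflink behavior mentioned above can be seen with plain cp. A sketch (assumes GNU coreutils on Linux; --reflink=always only succeeds on CoW-capable filesystems such as Btrfs or XFS, while --reflink=auto falls back to a normal byte copy, so this runs anywhere):

```shell
# Create a 10 MiB test blob, then clone it.
# On Btrfs/XFS the clone shares extents (instant, no extra space used);
# --reflink=auto degrades to an ordinary copy on other filesystems.
dd if=/dev/zero of=/tmp/reflink-demo-blob bs=1M count=10 2>/dev/null
cp --reflink=auto /tmp/reflink-demo-blob /tmp/reflink-demo-clone
cmp -s /tmp/reflink-demo-blob /tmp/reflink-demo-clone && echo identical
```

As a side note on the Go question above: the underlying FICLONE ioctl is reachable from Go via golang.org/x/sys/unix, so a CoW copy is possible there too; but as noted, it only helps when the source data already exists on the same filesystem, which is not the case for freshly downloaded blobs.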
A friendly reminder that this issue had no activity for 30 days.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I'm building an image which ends up with a 1.9GB layer. I have 7.4GB free space. The image fails to be committed. I think Podman should be a little more space efficient.
Steps to reproduce the issue:
df -h
$ podman build -f Dockerfile -t docker-registry.example.com/aosqe/cucushift:goc44 --squash .
df -h (this second df is to prove that it is only the new layer that caused the out-of-space issue)
Describe the results you received:
wrt layer size, I saw the layer size while "Copying blob acd1c28a11c7 done" was shown. But after the operation is done, that is lost and I can't paste it here.
Describe the results you expected:
The image is built and committed without exhausting the whole space.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
No