akostadinov opened this issue 3 years ago
please verify this with buildah and file an issue there. we simply consume their library.
Any chance you can provide the Dockerfile in question (or a reproducer that triggers the same effect)?
@TomSweeneyRedHat PTAL
@baude When this happens, it is easy to transfer the issue to buildah.
I can try to reproduce on Monday. I already updated my VM to have more space so I can get some stuff built, but I guess it should be easy to make less space available by creating garbage files.
Containerfile:
$ cat Dockerfile.space
FROM quay.io/openshift/origin-jenkins-agent-base:4.6
RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000
RUN echo just a layer > /root/layer
Result:
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4,9G 0 4,9G 0% /dev
tmpfs 4,9G 84K 4,9G 1% /dev/shm
tmpfs 4,9G 1,2M 4,9G 1% /run
--> /dev/vda3 58G 48G 7,6G 87% / <--
tmpfs 4,9G 92K 4,9G 1% /tmp
/dev/vda1 477M 189M 259M 43% /boot
tmpfs 618M 36K 617M 1% /run/user/42
tmpfs 618M 12K 617M 1% /run/user/1000
$ podman build -f Dockerfile.space -t space-reproducer --squash .
STEP 1: FROM quay.io/openshift/origin-jenkins-agent-base:4.6
STEP 2: RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB, 2.9 GiB) copied, 5.28909 s, 595 MB/s
STEP 3: RUN echo just a layer > /root/layer
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
STEP 4: COMMIT space-reproducer
Getting image source signatures
Copying blob 226bfaae015f skipped: already exists
Copying blob 70056249a0e2 skipped: already exists
Copying blob 20852d862cb5 skipped: already exists
Copying blob 2d0bb3fbd674 skipped: already exists
Copying blob 59e0af54ebc8 skipped: already exists
Copying blob 57f4895f9cac skipped: already exists
Copying blob 682feb23c931 skipped: already exists
Copying blob 62d02243ecc6 [====================>---------------] 1.7GiB / 2.9GiB
Error: error committing container for step {Env:[GODEBUG=x509ignoreCN=0 OPENSHIFT_CI=true PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin container=oci foo=bar HOME=/home/jenkins LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 OPENSHIFT_BUILD_NAME=jenkins-agent-base OPENSHIFT_BUILD_NAMESPACE=ci-op-nzkfdz86] Command:run Args:[echo just a layer > /root/layer] Flags:[] Attrs:map[] Message:RUN echo just a layer > /root/layer Original:RUN echo just a layer > /root/layer}: error copying layers and metadata for container "116d9db64014eaa8f52a7b82023b255078a378e00615dfa5b6c7abf7aab06f97": Error writing blob: error storing blob to file "/var/tmp/storage838120693/1": write /var/tmp/storage838120693/1: no space left on device
$ df -h
Filesystem Size Used Avail Use% Mounted on
...
/dev/vda3 58G 48G 7,6G 87% /
...
$ podman build -f Dockerfile.space -t space-reproducer .
STEP 1: FROM quay.io/openshift/origin-jenkins-agent-base:4.6
STEP 2: RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB, 2.9 GiB) copied, 4.95324 s, 635 MB/s
Error: error committing container for step {Env:[GODEBUG=x509ignoreCN=0 OPENSHIFT_CI=true PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin container=oci foo=bar HOME=/home/jenkins LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 OPENSHIFT_BUILD_NAME=jenkins-agent-base OPENSHIFT_BUILD_NAMESPACE=ci-op-nzkfdz86] Command:run Args:[dd if=/dev/zero of=/root/space bs=1MiB count=3000] Flags:[] Attrs:map[] Message:RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000 Original:RUN dd if=/dev/zero of=/root/space bs=1MiB count=3000}: error copying layers and metadata for container "11832feefde88de35a020ed7a0cc6f9532478df5b0f3e0324551fdedeaa53ebe": Error writing blob: error storing blob to file "/var/tmp/storage645036852/1": write /var/tmp/storage645036852/1: no space left on device
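Both failures above happen while staging blobs under /var/tmp (the "/var/tmp/storageNNN/1" paths in the errors). As a possible workaround until the tools get more space-efficient, the staging directory can be pointed at a filesystem with more room: podman documents the TMPDIR environment variable for its temporary files. A sketch (the directory name is just an example):

```shell
# Point temporary blob staging at a roomier filesystem.
# TMPDIR is honored by podman for temporary files (default /var/tmp);
# "$HOME/big-tmp" is an example location, not a required path.
mkdir -p "$HOME/big-tmp"
export TMPDIR="$HOME/big-tmp"
# podman build -f Dockerfile.space -t space-reproducer --squash .
```

This does not reduce the total space used, it only moves the pressure off the root filesystem.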
@nalind @mtrmac PTAL
@nalind How do we make buildah more space efficient?
@akostadinov Have you got buildah set up to use fuse-overlayfs? If not, it will fall back to the VFS driver, which will do exactly this. On my system with only ubuntu:latest pulled, overlay uses the expected size:
$ sudo du -hs /var/lib/containers/
79M /var/lib/containers/
Whereas the VFS storage area uses almost five times that because it's simulating overlays using copying:
$ sudo du -hs .local/share/containers/
391M .local/share/containers/
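For rootless setups, the VFS fallback described above can be avoided by selecting the overlay driver with fuse-overlayfs in the per-user storage configuration. A minimal sketch (key names should be checked against containers-storage.conf(5); the mount_program path is the usual Fedora location):

```toml
# ~/.config/containers/storage.conf (rootless) -- minimal sketch,
# see containers-storage.conf(5) for the authoritative reference
[storage]
driver = "overlay"

[storage.options.overlay]
mount_program = "/usr/bin/fuse-overlayfs"
```

Note that switching drivers effectively starts from an empty store; previously pulled images under the VFS driver will not be visible.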
It's frustrating too, because the layers on top of the main tarball in that image add almost nothing in terms of content size, and the Dockerfile already pays the readability cost of having most of its actions happen in a single RUN:
https://github.com/tianon/docker-brew-ubuntu-core/blob/dist-amd64/focal/Dockerfile
It feels like, as a matter of policy, low-level base OS images on registries should be flattened to a single layer, but I guess it didn't really matter until the relatively recent rise of interest in rootless containers.
@mikepurvis, how do I set up buildah that way? This is Fedora 32 and I already have the fuse-overlayfs package installed.
If you do buildah info or podman info, what does it say about the overlay driver?
$ buildah info
...
"store": {
"ContainerStore": {
"number": 0
},
"GraphDriverName": "overlay",
"GraphOptions": [
"overlay.mount_program=/usr/bin/fuse-overlayfs"
],
"GraphRoot": "/home/avalon/.local/share/containers/storage",
"GraphStatus": {
"Backing Filesystem": "extfs",
"Native Overlay Diff": "false",
"Supports d_type": "true",
"Using metacopy": "false"
},
"ImageStore": {
"number": 3
},
"RunRoot": "/run/user/1000/containers"
}
...
@vrothberg Will your pull improvements help this? At least removing the blobs as they are pulled and merged.
Unfortunately not, no. The files are removed once the types.ImageDestination is closed. However, if we add a reference counter (a given blob may be referenced more than once in an image), we can implement an early delete.
@rhatdan, I opened https://github.com/containers/image/issues/1187 and outlined what I think needs to be done to tackle the disk hunger.
A friendly reminder that this issue had no activity for 30 days.
This is still a priority, but I don't believe anyone has worked on it yet.
A friendly reminder that this issue had no activity for 30 days.
We have separate issues about the way we copy files 4 times, where Docker only copies them once.
A friendly reminder that this issue had no activity for 30 days.
@flouthoc PTAL
I'll take a look. Thanks
@flouthoc Could you reproduce this and document where/what actually uses the disk space, please, so that we have a documented full picture of where we are before starting to fix any specific suspected cause? We have justified suspicions about various code paths, but there might well be something else we haven't thought about.
@mtrmac Sure, I'll do that.
@flouthoc Any movement on this?
I was not able to look at this since the last comment from @mtrmac; I will start experimenting on this today.
A friendly reminder that this issue had no activity for 30 days.
@flouthoc reminder ping.
A friendly reminder that this issue had no activity for 30 days.
This is still pending. I'll try visiting this again.
Can Btrfs CoW capabilities be utilized to mitigate this issue (when copying within the same filesystem)? I was doing podman commit on a 90 GiB container. There is cp --reflink as an example of shallow copies with CoW, but I don't think Golang has any interface to access the same feature (it looks like cp uses an ioctl).
We are using reflink in containers/storage, but in this case, I think it might be the blobs that are using up all of the space and the blobs won't pre-exist on disk.
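The reflink behavior mentioned above can be seen with plain cp. A sketch (assumes GNU coreutils on Linux; --reflink=always only succeeds on CoW-capable filesystems such as Btrfs or XFS, while --reflink=auto falls back to a normal byte copy, so this runs anywhere):

```shell
# Create a 10 MiB test blob, then clone it.
# On Btrfs/XFS the clone shares extents (instant, no extra space used);
# --reflink=auto degrades to an ordinary copy on other filesystems.
dd if=/dev/zero of=/tmp/reflink-demo-blob bs=1M count=10 2>/dev/null
cp --reflink=auto /tmp/reflink-demo-blob /tmp/reflink-demo-clone
cmp -s /tmp/reflink-demo-blob /tmp/reflink-demo-clone && echo identical
```

As a side note on the Go question above: the underlying FICLONE ioctl is reachable from Go via golang.org/x/sys/unix, so a CoW copy is possible there too; but as noted, it only helps when the source data already exists on the same filesystem, which is not the case for freshly downloaded blobs.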
A friendly reminder that this issue had no activity for 30 days.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I'm building an image which ends up with a 1.9GB layer. I have 7.4GB free space. The image fails to be committed. I think Podman should be a little more space efficient.
Steps to reproduce the issue:
df -h
$ podman build -f Dockerfile -t docker-registry.example.com/aosqe/cucushift:goc44 --squash .
df -h (this second df is to prove that it is only the new layer that caused the out-of-space issue)
Describe the results you received:
wrt layer size, I saw the layer size while "Copying blob acd1c28a11c7 done" was shown. But after the operation is done, that is lost and I can't paste it here.
Describe the results you expected:
The image is built and committed without exhausting the whole space.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
No