rtgiskard opened 7 months ago
If you set `TMPDIR` to `/var/lib/containers/storage/tmp`, it should follow those rules. The reason it requires 2x the image size is that it pulls the layer tarball down and then needs to untar it. Once it finishes untarring, it should remove the tmp content.
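The flow described above can be illustrated with plain `tar`: during extraction, the downloaded tarball (in `TMPDIR`) and its unpacked contents (in storage) exist on disk at the same time, so peak usage is roughly the sum of both. A minimal sketch using throwaway temp directories rather than real podman paths:

```shell
# Illustration only (not podman code): a layer "pull" needs space for the
# downloaded tarball in TMPDIR *and* the extracted files in storage at once.
tmpdir=$(mktemp -d)      # stands in for TMPDIR
storage=$(mktemp -d)     # stands in for the graph root

# Fake a "layer": some content, packed into a tarball under TMPDIR.
mkdir -p "$storage/build"
head -c 1048576 /dev/urandom > "$storage/build/blob"   # 1 MiB of data
tar -C "$storage/build" -cf "$tmpdir/layer.tar" blob

# "Commit": untar into storage while the tarball still sits in TMPDIR.
mkdir -p "$storage/layer"
tar -C "$storage/layer" -xf "$tmpdir/layer.tar"

# At this moment both copies exist -- roughly 2x the layer size.
ls -l "$tmpdir/layer.tar" "$storage/layer/blob"

# Only after extraction succeeds is the temporary tarball removed.
rm -f "$tmpdir/layer.tar"
```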
@nalind @mtrmac @vrothberg @mheon @Luap99 @baude WDYT of changing TMPDIR to point at $GRAPHROOT/tmp by default?
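Until/unless such a default exists, a user can get the same effect per-shell. A hedged sketch (in practice the graph root would come from `podman info`; a throwaway directory stands in here so the snippet runs unprivileged):

```shell
# Sketch: make large temporary files land under the storage graph root.
# In practice GRAPHROOT would come from podman, e.g.:
#   GRAPHROOT=$(podman info --format '{{ .Store.GraphRoot }}')
# A throwaway directory stands in here so the example runs unprivileged.
GRAPHROOT="${GRAPHROOT:-$(mktemp -d)}"

export TMPDIR="$GRAPHROOT/tmp"
mkdir -p "$TMPDIR"

# Any podman/buildah command run from this shell now scratches inside
# the (presumably large) partition backing the graph root, e.g.:
#   podman build -t myimage .
```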
I'm wondering what other kinds of fun that brings in. Seems like a reasonable ask, though.
Seems like something we could handle in containers.conf
That already exists as `image_copy_tmp_dir = "storage"`
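For reference, that knob lives in the `[engine]` section of containers.conf; `"storage"` is a special value that places image copy temporaries under the container storage itself, while any other value is taken as a literal path:

```toml
# ~/.config/containers/containers.conf (or /etc/containers/containers.conf)
[engine]
# Special value "storage": use a tmp dir inside the container storage.
image_copy_tmp_dir = "storage"
# Alternatively, an explicit path, e.g.:
# image_copy_tmp_dir = "/var/lib/containers/storage/tmp"
```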
AFAICT `TMPDIR` is a system-global concept; directing that to a c/storage-specific location to store files which have nothing to do with c/storage would be very surprising. (Another reason I’m unhappy with Podman’s reinterpretation of this environment variable.)
Even thinking about images only, the system defaults have a major advantage: they are cleaned up automatically.
If we move the temporary location elsewhere, we will need a cleanup mechanism of some kind; and Podman’s installation will need to include steps to automatically enable/start that service (whether it runs on boot or periodically); and system administrators’ tools would probably have to learn about this location when trying to find where (as in this instance) the unexpected 9 GB went.
Not insurmountable, but I’m tempted to say that users who build space-constrained systems (or systems intentionally designed with no slack) are unavoidably taking on the responsibility to optimize the use of space far beyond what is reasonable for a typical system.
Just think about the ability of the system to apply package updates: that certainly requires some unknown number of free gigabytes in `/var`. A typical system is just going to have “enough” slack for all these purposes, without anyone having to allocate X GB for DNF, Y GB for Podman, Z GB for a database migration…
The `read/write on closed pipe` error is, to a first approximation, an error-handling bug somewhere in `buildah/image.go`. I don’t see anything above making it certain that it is caused by running out of disk space (although I can vaguely see a path where that probably could happen and result in this message). Actually fixing this bug would very much benefit from concrete steps to reproduce; which CLI options are used matters.
> without anyone having to allocate X GB for DNF, Y GB for Podman, Z GB for a database migration…

The point being that X + Y + Z + … >> “enough”; but if each of those locations were a separate, tightly-allocated partition, each of those partitions would need a slack which is, globally speaking, not an effective use of space.
> The `read/write on closed pipe` error is, to a first approximation, an error-handling bug somewhere in `buildah/image.go`. I don’t see anything above making it certain that it is caused by running out of disk space (although I can vaguely see a path where that probably could happen and result in this message).
It happens when I try to reproduce the issue by adjusting the size of the `dd` outputs in the Dockerfile; it may be related to the underlying btrfs with zstd compression (which makes the size not so predictable). With a suitable size, I always get `read/write on closed pipe`.

Meanwhile, when the image commit is large enough, the error is `no space left on device`, which is very clear, and that is how I found it is related to `TMPDIR` during the image commit operation. Since before COMMIT there is no extra space consumption outside `/var/lib/containers/` (and once the build fails, it generally gets cleaned up too), the cause is not easy to find. Once enough space is provided, the errors are all gone.
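For what it's worth, a reproducer along the lines described would look roughly like this; the base image and the `dd` size are guesses that would need tuning against the free space in `TMPDIR`, and the build itself is left commented out:

```shell
# Hypothetical reproducer: a single large layer whose commit has to pass
# through TMPDIR. Sizes are placeholders and must be tuned to the host.
workdir=$(mktemp -d)
cat > "$workdir/Dockerfile" <<'EOF'
FROM alpine
# Pick a count just above the free space in TMPDIR.
RUN dd if=/dev/urandom of=/bigfile bs=1M count=9000
EOF

# Then (not run here):
#   TMPDIR=/small/partition podman build -t tmpdir-repro "$workdir"
# Expected failure modes per this thread: "read/write on closed pipe"
# or "no space left on device", depending on how far the commit gets.
```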
> without anyone having to allocate X GB for DNF, Y GB for Podman, Z GB for a database migration…

The point being that X + Y + Z + … >> “enough”; but if each of those locations were a separate, tightly-allocated partition, each of those partitions would need a slack which is, globally speaking, not an effective use of space.
Thanks for the details; I understand the decision, and it is indeed reasonable. `image_copy_tmp_dir` is the workaround. Proper emphasis somewhere on the storage requirements of the build process might be useful :)
I can also confirm this happening on my Rock 5B SBC (aarch64). I have all Podman stuff on a separate ZFS pool and datasets (NVMe drive). However, while building images, it was actually writing to `/var/tmp`, causing the SD card (:disappointed:) to fill up almost completely (99% full). The message was `read/write on closed pipe` in my case as well.
I also discovered (a semi-unrelated issue) that I had the kernel sources for A LOT of different kernels in `/usr/src`, also on that SD card :fearful:. I am moving those now to a dedicated dataset on my `zdata` pool (`mv /usr/src/* /zdata/SRC/`, then `chattr +i /usr/src` and `zfs set mountpoint=/usr/src zdata/SRC`).
Changing `TMPDIR` might be the final (although maybe NOT proper?) solution here.
In my `podman` user's `/home/podman/.bash_profile` (i.e. `~/.bash_profile`) I set:

export TMPDIR="/home/podman/containers/tmp"

Then `podman info` seems to pick that up correctly, since:
host:
arch: arm64
buildahVersion: 1.33.5
cgroupControllers:
- cpu
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon_2.1.10+ds1-1_arm64
path: /usr/bin/conmon
version: 'conmon version 2.1.10, commit: unknown'
cpuUtilization:
idlePercent: 93.16
systemPercent: 3.08
userPercent: 3.76
cpus: 8
databaseBackend: boltdb
distribution:
codename: bookworm
distribution: debian
version: "12"
eventLogger: journald
freeLocks: 2005
hostname: Rock5B-01
idMappings:
gidmap:
- container_id: 0
host_id: 1002
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1002
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 6.6.19-1-arm64
linkmode: dynamic
logDriver: journald
memFree: 368828416
memTotal: 16477798400
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: aardvark-dns_1.4.0-5_arm64
path: /usr/lib/podman/aardvark-dns
version: aardvark-dns 1.4.0
package: netavark_1.4.0-3_arm64
path: /usr/lib/podman/netavark
version: netavark 1.4.0
ociRuntime:
name: crun
package: crun_1.14.4-1_arm64
path: /usr/bin/crun
version: |-
crun version 1.14.4
commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
rundir: /run/user/1002/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +WASM:wasmedge +YAJL
os: linux
pasta:
executable: /usr/bin/pasta
package: passt_0.0~git20230309.7c7625d-1_arm64
version: |
pasta unknown version
Copyright Red Hat
GNU Affero GPL version 3 or later <https://www.gnu.org/licenses/agpl-3.0.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
remoteSocket:
exists: true
path: /run/user/1002/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns_1.2.0-1_arm64
version: |-
slirp4netns version 1.2.0
commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.4
swapFree: 0
swapTotal: 0
uptime: 318h 47m 58.00s (Approximately 13.25 days)
variant: v8
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
docker.io:
Blocked: false
Insecure: false
Location: docker.MYDOMAIN.TLD/docker.io
MirrorByDigestOnly: false
Mirrors:
- Insecure: false
Location: docker.MYDOMAIN.TLD/docker.io
PullFromMirror: ""
- Insecure: false
Location: docker.MYDOMAIN.TLD/docker.io/library
PullFromMirror: ""
Prefix: docker.io
PullFromMirror: ""
docker.MYDOMAIN.TLD:
Blocked: false
Insecure: false
Location: docker.MYDOMAIN.TLD
MirrorByDigestOnly: false
Mirrors:
- Insecure: false
Location: docker.MYDOMAIN.TLD/docker.io
PullFromMirror: ""
- Insecure: false
Location: docker.MYDOMAIN.TLD/docker.io/library
PullFromMirror: ""
- Insecure: false
Location: docker.MYDOMAIN.TLD/ghcr.io
PullFromMirror: ""
- Insecure: false
Location: docker.MYDOMAIN.TLD/ghcr.io/library
PullFromMirror: ""
Prefix: docker.MYDOMAIN.TLD
PullFromMirror: ""
ghcr.io:
Blocked: false
Insecure: false
Location: docker.MYDOMAIN.TLD/ghcr.io
MirrorByDigestOnly: false
Mirrors:
- Insecure: false
Location: docker.MYDOMAIN.TLD/ghcr.io
PullFromMirror: ""
- Insecure: false
Location: docker.MYDOMAIN.TLD/ghcr.io/library
PullFromMirror: ""
Prefix: ghcr.io
PullFromMirror: ""
search:
- docker.MYDOMAIN.TLD
- registry.fedoraproject.org
- registry.access.redhat.com
- docker.io
- quay.io
store:
configFile: /home/podman/.config/containers/storage.conf
containerStore:
number: 12
paused: 0
running: 11
stopped: 1
graphDriverName: overlay
graphOptions:
overlay.mount_program:
Executable: /usr/bin/fuse-overlayfs
Package: fuse-overlayfs_1.13-1_arm64
Version: |-
fusermount3 version: 3.14.0
fuse-overlayfs: version 1.13-dev
FUSE library version 3.14.0
using FUSE kernel interface version 7.31
overlay.mountopt: nodev,metacopy=on
graphRoot: /home/podman/storage
graphRootAllocated: 1899514036224
graphRootUsed: 10135011328
graphStatus:
Backing Filesystem: zfs
Native Overlay Diff: "false"
Supports d_type: "true"
Supports shifting: "true"
Supports volatile: "true"
Using metacopy: "false"
imageCopyTmpDir: /home/podman/containers/tmp
imageStore:
number: 154
runRoot: /run/user/1002/containers
transientStore: false
volumePath: /home/podman/storage/volumes
version:
APIVersion: 4.9.3
Built: 0
BuiltTime: Thu Jan 1 00:00:00 1970
GitCommit: ""
GoVersion: go1.21.6
Os: linux
OsArch: linux/arm64
Version: 4.9.3
A friendly reminder that this issue had no activity for 30 days.
Issue Description
I ran into trouble building a large container: after the final step it commits the image, and after tens of seconds it fails with this:
While trying to minimize and reproduce the issue, I found that on image commit there are copies of different types in `/var/tmp`, like this:
The size of the `buildah` or `container_images` dir is likely to be the changed size (which is about 9 GB) of the single commit, which I think could be optimized, as it takes too much space.
The biggest problem, though, is that it uses `TMPDIR` to copy images. Why not `/var/lib/containers/`? It is common to have a small root and to mount a large partition over `/var/lib/containers`.

Steps to reproduce the issue
`io: read/write on closed pipe` or `no space left on device`, depending on the free space of `TMPDIR`

Describe the results you received
Error with limited info
Describe the results you expected
podman info output
Podman in a container
No
Privileged Or Rootless
None
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
No response