containers / buildah

A tool that facilitates building OCI images.
https://buildah.io
Apache License 2.0

Question about make empty layer #2588

Closed GongT closed 3 years ago

GongT commented 3 years ago

Description

I'm using buildah to create some images, but the images are always about twice as large as I expect.

For example, a Python project with TensorFlow should be about 1.6 GB: podman run xxx du -xhs / shows that it contains only ≈1.6 GB of files, but the image size is 4 GB.
Every time I update and push it, it really uploads 4 GB of data over the network.

Steps to reproduce the issue:

A small test script:

#!/usr/bin/env bash

set -Eeuo pipefail

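# Create an example base image containing a single file of random data
CONTAINER=$(buildah from scratch)
MNT=$(buildah mount "$CONTAINER")
# Note: on some kernels /dev/random returns a short read, so the file may end up smaller than 1 MiB
dd if=/dev/random of="$MNT/TEST_FILE" bs=1M count=1
buildah umount "$CONTAINER" > /dev/null
LAST_IMG=$(buildah commit --rm "$CONTAINER" my-test:base 2> /dev/null)
# Or use an existing image instead
# LAST_IMG="alpine"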

for i in $(seq 1 10); do
    echo "loop: $i, size: $(podman inspect --type=image --format '{{.Size}}' "$LAST_IMG")"

    CONTAINER=$(buildah from "$LAST_IMG")
    LAST_IMG=$(buildah commit --rm "$CONTAINER" "my-test:$i" 2> /dev/null)
done
echo "final: $(podman inspect --type=image --format '{{.Size}}' "$LAST_IMG")"

It creates a test base image containing a single 1 MB file,
then runs buildah from and buildah commit on it 10 times.

Describe the results you received:

loop: 1, size: 1051124
loop: 2, size: 2101684
loop: 3, size: 3152244
loop: 4, size: 3153710
loop: 5, size: 3155176
loop: 6, size: 3156644
loop: 7, size: 3158110
loop: 8, size: 3159576
loop: 9, size: 3161042
loop: 10, size: 3162508
final: 3163974
-----------------------------------------
REPOSITORY                TAG     IMAGE ID      CREATED         SIZE
localhost/my-test         10      a6aa71dffc25  1 second ago    3.16 MB
localhost/my-test         9       59b2acbd4d94  3 seconds ago   3.16 MB
localhost/my-test         8       359a38efb45b  5 seconds ago   3.16 MB
localhost/my-test         7       96199b17a63a  6 seconds ago   3.16 MB
localhost/my-test         6       f5cbdac78577  8 seconds ago   3.16 MB
localhost/my-test         5       ac5946399041  9 seconds ago   3.16 MB
localhost/my-test         4       5658695d23fc  10 seconds ago  3.16 MB
localhost/my-test         3       d3bdab904ce1  12 seconds ago  3.15 MB
localhost/my-test         2       0ba5cab3b936  13 seconds ago  3.15 MB
localhost/my-test         1       6c3679b7ac73  14 seconds ago  2.1 MB
localhost/my-test         base    e9a4949bd878  15 seconds ago  1.05 MB
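
The growth per round is easier to see as the difference between consecutive sizes. The following is a minimal sketch that uses only the numbers reported above; size_deltas is a hypothetical helper name, not part of buildah or podman.

```shell
#!/usr/bin/env bash

# Sizes in bytes, copied from the "loop:" and "final:" lines above.
sizes=(1051124 2101684 3152244 3153710 3155176 3156644
       3158110 3159576 3161042 3162508 3163974)

# size_deltas (hypothetical helper): print the growth between each pair
# of consecutive sizes, one number per line.
size_deltas() {
    local prev=$1 cur
    shift
    for cur in "$@"; do
        echo $(( cur - prev ))
        prev=$cur
    done
}

size_deltas "${sizes[@]}"
```

With these numbers, the first two rounds each add 1050560 bytes, roughly one more copy of the test file, and every later round adds only about 1.5 KB.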

Describe the results you expected:

podman images should list 11 images (10 test + 1 base), each ≈1 MB

Output of rpm -q buildah or apt list buildah:

buildah-1.15.1-1.fc32.x86_64

Output of buildah version:

Version:         1.15.1
Go Version:      go1.14.6
Image Spec:      1.0.1-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        0.4.0
libcni Version:  
image Version:   5.5.1
Git Commit:      
Built:           Thu Jan  1 08:00:00 1970
OS/Arch:         linux/amd64

Output of cat /etc/release:

Fedora release 32 (Thirty Two)

Output of uname -a:

Linux developmentenvironment 5.7.15-100.fc31.x86_64 #1 SMP Tue Aug 11 17:18:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

# This file is is the configuration file for all tools
# that use the containers/storage library.
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver
driver = "overlay"

# Temporary storage location
runroot = "/var/run/containers/storage"

# Primary Read/Write location of container storage
graphroot = "/var/lib/containers/storage"

# Storage path for rootless users
#
# rootless_storage_path = "$HOME/.local/share/containers/storage"

[storage.options]
# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to the UIDs/GIDs as they should appear outside of the container,
# and the length of the range of UIDs/GIDs.  Additional mapped sets can be
# listed and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
#
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536

# Remap-User/Group is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file.  Mappings are set up starting
# with an in-container ID of 0 and then a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped in-container ID,
# until all of the entries have been used for maps.
#
# remap-user = "containers"
# remap-group = "containers"

# Root-auto-userns-user is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid and /etc/subgid file.  These ranges will be partioned
# to containers configured to create automatically a user namespace.  Containers
# configured to automatically create a user namespace can still overlap with containers
# having an explicit mapping set.
# This setting is ignored when running as rootless.
# root-auto-userns-user = "storage"
#
# Auto-userns-min-size is the minimum size for a user namespace created automatically.
# auto-userns-min-size=1024
#
# Auto-userns-max-size is the minimum size for a user namespace created automatically.
# auto-userns-max-size=65536

[storage.options.overlay]
# ignore_chown_errors can be set to allow a non privileged user running with
# a single UID within a user namespace to run containers. The user can pull
# and use any image even those with multiple uids.  Note multiple UIDs will be
# squashed down to the default uid in the container.  These images will have no
# separation between the users in the container. Only supported for the overlay
# and vfs drivers.
#ignore_chown_errors = false

# Path to an helper program to use for mounting the file system instead of mounting it
# directly.
#mount_program = "/usr/bin/fuse-overlayfs"

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev,metacopy=on"

# Size is used to set a maximum size of the container image.
# size = ""

[storage.options.thinpool]
# Storage Options for thinpool

# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"

# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"

# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"

# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"

# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""

# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"

# fs specifies the filesystem type to use for the base device.
# fs="xfs"

# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"

# min_free_space specifies the min free space percent in a thin pool require for
# new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"

# mkfsarg specifies extra mkfs arguments to be used when creating the base
# device.
# mkfsarg = ""

# Size is used to set a maximum size of the container image.
# size = ""

# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"

# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"

# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"

GongT commented 3 years ago

Did I miss something about "empty layers" in the docs?

TomSweeneyRedHat commented 3 years ago

@GongT sorry for the late reply, I'm just catching up after being away for a few days. After a quick scan it looks like you're doing podman build and then buildah bud and are finding differences in size between the two resulting images? The difference is probably due to the way the --layers option is defaulted between the projects. For Podman, --layers is set to true by default; for Buildah, it is set to false by default. Does that explain the difference in your mind?

GongT commented 3 years ago

I did not use buildah bud or podman build, so there isn't a --layers option.
I'm doing buildah bud --layers and "manually create base filesystem by buildah commit".

I have tried the BUILDAH_LAYERS=true environment variable, no effect.

I found that there is an "empty_layer" entry in the history when I set --layers. How can I do the same thing when I use buildah config + buildah commit?

{
    "created": "2020-09-09T04:06:23.064359109Z",
    "created_by": "/bin/sh -c #(nop) ENV test=2",
    "empty_layer": true // <<--
}
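
In the OCI image config, a history entry marked with empty_layer has no layer blob of its own, so counting those entries shows how many build steps added no filesystem layer. The following is a minimal sketch, assuming the history JSON has already been dumped to text; count_empty_layers is a hypothetical helper name, and it uses a plain text match rather than a JSON parser.

```shell
#!/usr/bin/env bash

# count_empty_layers (hypothetical helper): read image history JSON on
# stdin and count the entries flagged with "empty_layer": true.
# This is a text match, so it assumes the flag is written as a JSON boolean.
count_empty_layers() {
    grep -o -E '"empty_layer"[[:space:]]*:[[:space:]]*true' | wc -l | tr -d ' '
}

# Example input: two history entries, only the second is an empty layer.
count_empty_layers <<'EOF'
[
  {"created_by": "/bin/sh -c touch /a"},
  {"created_by": "/bin/sh -c #(nop) ENV test=2", "empty_layer": true}
]
EOF
```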

TomSweeneyRedHat commented 3 years ago

@GongT sorry, I obviously skimmed this issue way too quickly in the first pass. I'm not sure why you're seeing the results that you are. I would expect them to have a portion of the first image size added to each successive image. In fact, that's what I see using Buildah 1.15.1 and Podman 2.0.

# ./run_tom.sh
0+1 records in
0+1 records out
116 bytes copied, 0.000715178 s, 162 kB/s
loop: 1, size: 3054
loop: 2, size: 5544
loop: 3, size: 8034
loop: 4, size: 9500
loop: 5, size: 10966
loop: 6, size: 12432
loop: 7, size: 13897
loop: 8, size: 15364
loop: 9, size: 16830
loop: 10, size: 18296
final: 19762

# podman version
Version:      2.0.4
API Version:  1
Go Version:   go1.13.14
Built:        Wed Dec 31 19:00:00 1969
OS/Arch:      linux/amd64

# buildah version
Version:         1.15.1
Go Version:      go1.13.14
Image Spec:      1.0.1-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        0.4.0
libcni Version:  
image Version:   5.5.1
Git Commit:      
Built:           Wed Dec 31 19:00:00 1969
OS/Arch:         linux/amd64

#  cat /etc/*release
Fedora release 31 (Thirty One)

Regardless, what I think you might be looking for is the --squash option on the commit command. If I change your commit line in the script to:

    LAST_IMG=$(buildah commit --squash --rm "$CONTAINER" "my-test:$i" 2> /dev/null)

I see:

# ./run_tom.sh
0+1 records in
0+1 records out
116 bytes copied, 0.000663092 s, 175 kB/s
loop: 1, size: 3054
loop: 2, size: 3050
loop: 3, size: 3054
loop: 4, size: 3054
loop: 5, size: 3054
loop: 6, size: 3054
loop: 7, size: 3054
loop: 8, size: 3054
loop: 9, size: 3052
loop: 10, size: 3054
final: 3054

Does that fix things for you?

GongT commented 3 years ago

I tried --squash, but it did not fit my requirement.

I want my image to have 2 layers:

So I can save a lot of bandwidth.

podman build --squash is exactly what I want, but it is not supported by buildah commit (nor by buildah bud).
buildah commit's --squash option works like podman build --squash-all.


The reason I wanted to use buildah in the first place is this tutorial:

dnf install --installroot $scratchmnt --releasever 30 bash coreutils --setopt install_weak_deps=false -y

This feature is really great. And this is why I must use buildah commit.

GongT commented 3 years ago

OMG. I tried to switch to podman build, but I found this issue again. 😭

I built a test image with:

FROM fedora
RUN touch /a

And I get:

localhost/test                       latest   f28d523f70dc  32 seconds ago  378 MB
registry.fedoraproject.org/fedora    latest   00ff39a8bf19  2 months ago    189 MB

Dumping it with podman save --format=oci-dir --output=. test gives:

total 361M
-rw-r--r--. 1 root root 181M Sep 10 13:14 82f4e266783e1953cc68ccced7c00bf23afe6f18684869115eee3bbe37827330
-rw-r--r--. 1 root root 181M Sep 10 13:14 a344333fd60e49df1a8bc922844c68562422199e225ad450172f4781b12e34f6
-rw-r--r--. 1 root root 1.2K Sep 10 13:14 f28d523f70dc0e619e3a16c11a81987a198c9f56ceb6193f746279412c3b78c9
-rw-r--r--. 1 root root  501 Sep 10 13:14 manifest.json
-rw-r--r--. 1 root root   33 Sep 10 13:13 version
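
One way to check whether the second large blob simply re-packs the base image's files is to compare the file lists of the two blobs. The following is a minimal sketch, assuming both blobs are tar archives that the local tar can read (typical for layer blobs, but not verified here); common_entries is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# common_entries (hypothetical helper): print the paths that appear in
# both tar archives given as arguments, one path per line.
common_entries() {
    local list_a list_b
    list_a=$(mktemp)
    list_b=$(mktemp)
    tar -tf "$1" | sort > "$list_a"
    tar -tf "$2" | sort > "$list_b"
    # comm -12 keeps only the lines present in both sorted lists.
    comm -12 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}

# Usage, with the two 181M blobs from the listing above:
#   common_entries 82f4e266783e1953cc68ccced7c00bf23afe6f18684869115eee3bbe37827330 \
#                  a344333fd60e49df1a8bc922844c68562422199e225ad450172f4781b12e34f6 | wc -l
```

If nearly every path from the base layer also shows up in the second layer, the RUN touch /a step has committed a full copy of the filesystem instead of a single new file.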

GongT commented 3 years ago

Finally, after I switched the storage driver from "overlay" to "btrfs" (plus mkfs.btrfs and mount), this issue disappeared...