containers / buildah

A tool that facilitates building OCI images.
https://buildah.io
Apache License 2.0

Buildah images not so small? #532

Closed tdudgeon closed 6 years ago

tdudgeon commented 6 years ago

Description

One of the key points of buildah is that it allows you to build small images without lots of extra fluff like yum and python. What I'm finding is that the images buildah creates are bigger than the traditional docker images, even though they don't contain this extra fluff. What is happening here?

Steps to reproduce the issue:

This is all done on a new CentOS 7 cloud image with docker and buildah installed from RPMs.

First let's define our target.

$ docker pull centos:7
$ docker images | grep centos
docker.io/centos          7                   2d194b392dd1        2 weeks ago         195 MB

The Docker image is 195 MB in size.

Now let's create a minimal image with only the coreutils and bash packages added (the Docker image has both of these present). Here is the script I used:

#!/bin/bash

set -x

# build a minimal image
newcontainer=$(buildah from scratch)
scratchmnt=$(buildah mount $newcontainer)

# install the packages
yum install --installroot $scratchmnt bash coreutils --releasever 7 --setopt install_weak_deps=false -y
yum clean all -y --installroot $scratchmnt --releasever 7

sudo buildah config --cmd /bin/bash $newcontainer

# set some config info
buildah config --label name=centos-base $newcontainer

# commit the image
buildah unmount $newcontainer
buildah commit $newcontainer centos-base

Run this script:

$ sudo ./buildah-base.sh

Now let's look at the image that is built:

$ sudo buildah images
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
8379315d3e3e         docker.io/library/centos-base:latest                     Mar 25, 2018 17:08     212.1 MB

Hey! The image is 212 MB in size, bigger than the Docker image. And looking into it confirms it does not have yum or python installed. Why is it bigger, not smaller?

pixdrift commented 6 years ago

I don't think this is a direct comparison. If you track down the centos 7 base image, it looks like it's built using a base filesystem tarball rather than using yum to install the base system.

The centos 7 docker image on docker hub links to the dockerfile, the base filesystem tarball is in the repo: https://github.com/CentOS/sig-cloud-instance-images/blob/02904503939756f540cfaa3fbafbf280e8a11bef/docker/Dockerfile

There are likely other unnecessary files stripped out of the centos images.

rhatdan commented 6 years ago

@nalind @mtrmac Isn't this also an issue of compressed versus uncompressed?

tdudgeon commented 6 years ago

@pixdrift Yes, of course it's not an exact comparison as they were built differently. But it's not what I was expecting. The image built with buildah contains just the minimal scratch image (clocking in at a measly 1.77 KB for me) plus the bash and coreutils packages (which takes it up to 212 MB). The centos Docker Hub image contains those same bash and coreutils packages plus python and yum, and maybe other things too. And despite these extra things it comes in at a smaller size.

mtrmac commented 6 years ago

https://github.com/CentOS/sig-cloud-instance-build/blob/master/docker/centos-7.ks also does a few more cleanups.

rhatdan commented 6 years ago

@tdudgeon Could you check where the extra size is coming from?
On the buildah container, run

du -sm /*

to show whether anything is using an unexpected amount of space.

rhatdan commented 6 years ago

Could this be the CLanguage bindings?

mtrmac commented 6 years ago

It might be helpful to start with determining whether the difference in size is due to the container contents, or due to the tooling.

Is the ratio between du -sc / and the size reported by docker images/buildah images roughly equal (i.e. the content is the difference), or significantly different (i.e. the tooling is the difference?)
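
One way to run that check from the host is sketched below. The helper names content_size_mb and size_ratio are hypothetical, and the reported size has to be copied by hand from the buildah images or docker images output; this is only an illustration of the comparison, not an established procedure.

```shell
# Sketch: compare on-disk content size with the size a tool reports.
# content_size_mb and size_ratio are hypothetical helper names.

# Print the size of a directory tree in whole megabytes.
content_size_mb() {
    du -sm "$1" | awk '{ print $1 }'
}

# Print reported/content with two decimals, or "n/a" if content is not positive.
size_ratio() {
    awk -v content="$1" -v reported="$2" 'BEGIN {
        if (content <= 0) { print "n/a"; exit }
        printf "%.2f\n", reported / content
    }'
}
```

For example, size_ratio "$(content_size_mb "$scratchmnt")" 212.1 while the container is still mounted. If the ratio is about the same for both images, the content is what differs; if it is very different, the tooling is the more likely cause.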

pixdrift commented 6 years ago

Not a direct comparison, but another data point. Used the provided script to build from Oracle Linux 7 repo.

# buildah images
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
95cc7dba2f97         docker.io/library/buildah-ol7:latest                     Mar 26, 2018 22:28     177.4 MB

Then pushed to the docker-daemon using buildah push

# docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
buildah                       ol7                 95cc7dba2f97        4 minutes ago       182 MB

So there is a minor discrepancy in reported size, but in this case buildah is less than docker. Need to build in docker too for comparison.

pixdrift commented 6 years ago

I built the centos:7 container from the upstream Dockerfile using buildah bud:

# buildah images
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
102faebad41b         <none>                                                   Mar 26, 2018 23:15     194.5 MB

Pushed to docker:

# docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
centos                        7                   ea2fac082cce        3 minutes ago       195 MB

Looks like the tooling difference. The kickstart used to build the centos 7 filesystem image posted by @mtrmac shows the initial package selection is quite different.

pixdrift commented 6 years ago

@tdudgeon

Pulling apart the ks file posted above, and looking in the image, it looks like the documentation (~4MB) and locale-archive (~99MB) are what is causing the size issues. If you force your locale in the yum installer and specify nodocs, you will get a significantly smaller base image:

# docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
buildah                       stripped            4ddfc8034046        6 minutes ago       56.4 MB

Updated buildah script:

#!/bin/bash

set -x

# build a minimal image
newcontainer=$(buildah from scratch)
scratchmnt=$(buildah mount $newcontainer)

# install the packages
yum install --installroot $scratchmnt bash coreutils --releasever 7 --setopt=install_weak_deps=false --setopt=tsflags=nodocs --setopt=override_install_langs=en_US.utf8 -y
yum clean all -y --installroot $scratchmnt --releasever 7

sudo buildah config --cmd /bin/bash $newcontainer

# set some config info
buildah config --label name=centos-base $newcontainer

# commit the image
buildah unmount $newcontainer
buildah commit $newcontainer centos-base

Interested to know if this works ok from your CentOS source, and what kind of result you get with regard to size.
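
To see how much the locale and documentation trees contribute before and after adding those options, something like the following could be run against the mounted container. report_bloat is a hypothetical helper, and the paths are assumed to be the usual EL7 locations; adjust them if your package set differs.

```shell
# Sketch: show how much space locales and documentation take in an install root.
# report_bloat is a hypothetical helper; the paths are assumed EL7 defaults.
report_bloat() {
    local root=$1 path
    for path in usr/lib/locale usr/share/locale usr/share/doc usr/share/man; do
        # Skip paths that the installed packages did not create.
        [ -e "$root/$path" ] || continue
        # du -sk prints "<kilobytes><TAB><path>".
        du -sk "$root/$path"
    done
}
```

Call it as report_bloat "$scratchmnt" after the yum install step and before buildah unmount.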

tdudgeon commented 6 years ago

@pixdrift Hey that makes a big difference. In my case it builds an image that is 91.56 MB in size. Much smaller than the original 212.1 MB (though still a fair bit bigger than your one of 56.4 MB).

pixdrift commented 6 years ago

@tdudgeon I suspect the difference in size is due to package dependency creep in the included packages. The build I was using may be from a 7.2/7.3 repo (was a random dev instance I had). Will have a closer look tomorrow and point it at 7.4. In total, I believe there were 20 rpms installed.

rhatdan commented 6 years ago

@tdudgeon Is the Buildah image now smaller than the docker build?

rhatdan commented 6 years ago

@ipbabble Might be worth a blog on how to handle languages and make smaller images.

tdudgeon commented 6 years ago

@rhatdan Yes, the centos:7 image on Docker Hub is 195 MB whilst my latest equivalent with buildah is 91.56 MB. So just under half the size.

rhatdan commented 6 years ago

WooHoo. BTW Size is one important factor, but another factor customers look at is the number of packages/files inside of a container. They are looking to limit attack surface, with the theory that the fewer files/executables in the image the harder it is to exploit a container. So you might want to get a count of RPMs installed.

tdudgeon commented 6 years ago

The Docker Hub centos:7 image has 143 packages. The one built by buildah has 64 packages. But I had to yum install rpm so that I could count them, so that should really be 63.
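
It should also be possible to count the packages without installing rpm into the image, by pointing the host's rpm at the mounted container with its --root option. A minimal sketch, assuming the container is still mounted at $scratchmnt and that the host rpm can read the database yum created there (count_packages is a hypothetical helper name):

```shell
# Sketch: count installed packages in an install root using the host's rpm.
# count_packages is a hypothetical helper; ROOT must be an absolute path.
count_packages() {
    # rpm -qa prints one package per line; awk counts the lines.
    rpm --root "$1" -qa | awk 'END { print NR }'
}
```

For example, count_packages "$scratchmnt" before buildah unmount.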

mohammedzee1000 commented 6 years ago

You guys might also want to take a look at the atomic image https://github.com/CentOS/atomic-container.

It is built using microdnf and is already as small as 78 MB.

pixdrift commented 6 years ago

@mohammedzee1000, Thanks for the suggestion, this image uses essentially the same package selection as I have outlined above with the os release, microdnf and systemd added.. then some manual cleanup of the resulting filesystem.

%packages --excludedocs --nobase --nocore --instLangs=en
bash
centos-release
microdnf
systemd

I am interested to know how my OL7 image ended up so much smaller (package count and size) than CentOS, I can only assume dependency changes.

@tdudgeon, can you post an RPM list from your container and I will put together a comparison? The yum installation log from buildah should be enough and won't require modifications to the image contents.

giuseppe commented 6 years ago

you can save some space removing the locales you don't need. This should be quite safe to do:

rm $scratchmnt/usr/lib/locale/locale-archive*
find $scratchmnt/usr/share/locale/ \! -name '*en*' -exec rm -rf \{\} \;

pixdrift commented 6 years ago

@giuseppe, this is redundant if you use the yum parameter I have provided above (--setopt=override_install_langs=en_US.utf8) because alternate locales aren't installed.

The locale-archive when specifying the language in yum is 1.1M instead of the default 100M; this is the primary change that saved the space for @tdudgeon.

pixdrift commented 6 years ago

Resulting OL7 package list from my posted buildah script above (image size: 56.4MB)

(1/40): basesystem-10.0-7.0.1.el7.noarch.rpm
(2/40): bash-4.2.46-29.el7_4.x86_64.rpm
(3/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(4/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(5/40): coreutils-8.22-18.0.1.el7.x86_64.rpm
(6/40): filesystem-3.2-21.el7.x86_64.rpm
(7/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(8/40): glibc-2.17-196.el7.x86_64.rpm
(9/40): glibc-common-2.17-196.el7.x86_64.rpm
(10/40): gmp-6.0.0-15.el7.x86_64.rpm
(11/40): grep-2.20-3.el7.x86_64.rpm
(12/40): info-5.1-4.el7.x86_64.rpm
(13/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(14/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(15/40): libacl-2.2.51-12.el7.x86_64.rpm
(16/40): libattr-2.4.46-12.el7.x86_64.rpm
(17/40): libcap-2.22-9.el7.x86_64.rpm
(18/40): libcom_err-1.42.9-10.0.1.el7.x86_64.rpm
(19/40): libffi-3.0.13-18.el7.x86_64.rpm
(20/40): libgcc-4.8.5-16.el7.x86_64.rpm
(21/40): libselinux-2.5-11.el7.x86_64.rpm
(22/40): libsepol-2.5-6.el7.x86_64.rpm
(23/40): libstdc++-4.8.5-16.el7.x86_64.rpm
(24/40): libtasn1-4.10-1.el7.x86_64.rpm
(25/40): libverto-0.2.5-4.el7.x86_64.rpm
(26/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(27/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(28/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(29/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(30/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(31/40): openssl-libs-1.0.2k-8.0.1.el7.x86_64.rpm
(32/40): p11-kit-0.23.5-3.el7.x86_64.rpm
(33/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): popt-1.13-16.el7.x86_64.rpm
(36/40): redhat-release-server-7.4-18.0.1.el7.x86_64.rpm
(37/40): sed-4.2.2-5.el7.x86_64.rpm
(38/40): setup-2.8.71-7.el7.noarch.rpm
(39/40): tzdata-2017b-1.el7.noarch.rpm
(40/40): zlib-1.2.7-17.el7.x86_64.rpm

The same when using RHEL 7.4 (image size: 57.08 MB):

(1/40): basesystem-10.0-7.el7.noarch.rpm
(2/40): bash-4.2.46-29.el7_4.x86_64.rpm
(3/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(4/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(5/40): coreutils-8.22-18.el7.x86_64.rpm
(6/40): filesystem-3.2-21.el7.x86_64.rpm
(7/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(8/40): glibc-2.17-196.el7_4.2.x86_64.rpm
(9/40): glibc-common-2.17-196.el7_4.2.x86_64.rpm
(10/40): gmp-6.0.0-15.el7.x86_64.rpm
(11/40): grep-2.20-3.el7.x86_64.rpm
(12/40): info-5.1-4.el7.x86_64.rpm
(13/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(14/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(15/40): libacl-2.2.51-12.el7.x86_64.rpm
(16/40): libattr-2.4.46-12.el7.x86_64.rpm
(17/40): libcap-2.22-9.el7.x86_64.rpm
(18/40): libcom_err-1.42.9-10.el7.x86_64.rpm
(19/40): libffi-3.0.13-18.el7.x86_64.rpm
(20/40): libgcc-4.8.5-16.el7_4.2.x86_64.rpm
(21/40): libselinux-2.5-11.el7.x86_64.rpm
(22/40): libsepol-2.5-6.el7.x86_64.rpm
(23/40): libstdc++-4.8.5-16.el7_4.2.x86_64.rpm
(24/40): libtasn1-4.10-1.el7.x86_64.rpm
(25/40): libverto-0.2.5-4.el7.x86_64.rpm
(26/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(27/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(28/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(29/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(30/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(31/40): openssl-libs-1.0.2k-8.el7.x86_64.rpm
(32/40): p11-kit-0.23.5-3.el7.x86_64.rpm
(33/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): popt-1.13-16.el7.x86_64.rpm
(36/40): redhat-release-server-7.4-18.el7.x86_64.rpm
(37/40): sed-4.2.2-5.el7.x86_64.rpm
(38/40): setup-2.8.71-7.el7.noarch.rpm
(39/40): tzdata-2018d-1.el7.noarch.rpm
(40/40): zlib-1.2.7-17.el7.x86_64.rpm

Interested to know why CentOS 7 is larger using the same process.

ipbabble commented 6 years ago

Ack. On PTO but will take a look as soon as possible.

Been looking at this for RadAnalytics issues too.

William


pixdrift commented 6 years ago

Something strange is definitely happening. I just ran the same script as above on a fresh CentOS 7.4 build.. and I also got a 91.57MB result, which is 40MB bigger than either RHEL 7 or OL 7. The package list is the same number (40).. so something is odd. Going to go through and compare images now. Here is the package list from the CentOS 7.4 image.

(1/40): basesystem-10.0-7.el7.centos.noarch.rpm
(2/40): centos-release-7-4.1708.el7.centos.x86_64.rpm
(3/40): bash-4.2.46-29.el7_4.x86_64.rpm
(4/40): filesystem-3.2-21.el7.x86_64.rpm
(5/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(6/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(7/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(8/40): grep-2.20-3.el7.x86_64.rpm
(9/40): info-5.1-4.el7.x86_64.rpm
(10/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(11/40): gmp-6.0.0-15.el7.x86_64.rpm
(12/40): libacl-2.2.51-12.el7.x86_64.rpm
(13/40): libattr-2.4.46-12.el7.x86_64.rpm
(14/40): libcap-2.22-9.el7.x86_64.rpm
(15/40): libcom_err-1.42.9-10.el7.x86_64.rpm
(16/40): libffi-3.0.13-18.el7.x86_64.rpm
(17/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(18/40): libselinux-2.5-11.el7.x86_64.rpm
(19/40): coreutils-8.22-18.el7.x86_64.rpm
(20/40): libsepol-2.5-6.el7.x86_64.rpm
(21/40): libgcc-4.8.5-16.el7_4.2.x86_64.rpm
(22/40): libverto-0.2.5-4.el7.x86_64.rpm
(23/40): libtasn1-4.10-1.el7.x86_64.rpm
(24/40): libstdc++-4.8.5-16.el7_4.2.x86_64.rpm
(25/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(26/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(27/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(28/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(29/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(30/40): p11-kit-0.23.5-3.el7.x86_64.rpm 
(31/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(32/40): popt-1.13-16.el7.x86_64.rpm
(33/40): sed-4.2.2-5.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): setup-2.8.71-7.el7.noarch.rpm
(36/40): zlib-1.2.7-17.el7.x86_64.rpm
(37/40): tzdata-2018d-1.el7.noarch.rpm
(38/40): glibc-2.17-196.el7_4.2.x86_64.rpm
(39/40): openssl-libs-1.0.2k-8.el7.x86_64.rpm
(40/40): glibc-common-2.17-196.el7_4.2.x86_64.rpm
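
The yum download order differs between these lists, so they are easier to compare once normalised and sorted. A rough sketch, assuming each list has been saved to a file in the "(n/N): name.rpm" form shown above; pkg_names and diff_pkg_lists are hypothetical helper names.

```shell
# Sketch: compare two package lists copied from yum output.
# pkg_names and diff_pkg_lists are hypothetical helper names.

# Strip the "(n/N): " prefix, trailing whitespace and the ".rpm" suffix, then sort.
pkg_names() {
    sed -e 's/^([0-9]*\/[0-9]*): *//' \
        -e 's/[[:space:]]*$//' \
        -e 's/\.rpm$//' "$@" | LC_ALL=C sort
}

# Print entries that appear in only one list.
# Entries unique to the second list are indented with a tab by comm.
diff_pkg_lists() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    pkg_names "$1" > "$a"
    pkg_names "$2" > "$b"
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}
```

Because the full name-version-release string is compared, a package that exists in both lists with different versions shows up on both sides.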

pixdrift commented 6 years ago

The problem in CentOS was yum cache data not being cleaned up correctly. In my case it was the epel repo. This could be solved by adding '--disablerepo=epel' to the yum command, but people may want to install packages from there as part of the image creation.

I have an updated script here which uses rm to clean up the yum cache, and it brings the CentOS 7 image down below 57MB. https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.minimal.sh
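
The linked script is not reproduced here. As a minimal sketch of the idea only, assuming the cache sits in yum's default location (var/cache/yum) under the install root, the cleanup could look like this; clean_yum_cache is a hypothetical helper name.

```shell
# Sketch: remove the yum cache from an install root with rm instead of "yum clean all".
# clean_yum_cache is a hypothetical helper; var/cache/yum is assumed to be the cache location.
clean_yum_cache() {
    local root=$1
    # Refuse an empty root so the path can never resolve to the host's /var/cache/yum.
    [ -n "$root" ] || { echo "usage: clean_yum_cache ROOT" >&2; return 1; }
    rm -rf "$root/var/cache/yum"
}
```

It would be called as clean_yum_cache "$scratchmnt" after the yum install step and before buildah unmount.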

TomSweeneyRedHat commented 6 years ago

@pixdrift @tdudgeon FYI, I just posted a little blog on this issue at http://www.projectatomic.io/blog/2018/04/open-source-what-a-concept/. Thanks a bunch for inspiring it and for your contributions here!

pixdrift commented 6 years ago

@TomSweeneyRedHat, thanks for posting the article. It should be noted that I identified two further things in this thread that are worth mentioning in the blog post.

  1. The ~40MB that is keeping the image at 92MB is the yum cache for epel. If the method of cache cleanup is changed to rm (rather than yum clean in the posted script), the image size reliably comes back at around 57MB for OL7, RHEL7 and CentOS 7.
  2. The option --setopt=install_weak_deps=false doesn't reduce the size of this image in my testing, so I have removed it to avoid unnecessary complexity.

Updated script is here: https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.minimal.sh

rhatdan commented 6 years ago

@pixdrift I think --setopt=install_weak_deps might have some effect on a Fedora system. I don't believe RHEL or CentOS support weak dependencies.

pixdrift commented 6 years ago

Thanks @rhatdan, that helps explain it. I would expect a Fedora script to do the same (create a minimal base image) would be using dnf at this point in time, so the option may no longer be relevant there either.. unless the setopt option remained the same. I don't spend much time out of EL, but will take a look for interest's sake.

tdudgeon commented 6 years ago

Just for completeness, I tried rebuilding the base CentOS 7 image with the extra rm -rf $scratchmnt/var/cache/yum command suggested by @pixdrift to clean up the cache, and the image size drops from 91.6 MB to 57.15 MB. Not bad seeing as we started at 212 MB!

Thanks all!

gbraad commented 6 years ago

Maybe an idea to use https://github.com/GoogleCloudPlatform/container-diff as it recently got support for RPM, but it can help with comparing containers even just on file level.

TomSweeneyRedHat commented 6 years ago

@pixdrift yep, noted the additional input from you. I didn't want to add it to the blog post as I've found a blog length of about 4 pages in a word processor is about as long as you want. So I tried to show the initial breadcrumbs in the blog and then gave a couple of pointers and a tease to this issue here so they could dive even deeper. I do very much appreciate your contributions here though, it's been some really great work.

rhatdan commented 6 years ago

I might take a stab at a blog on this from a security point of view.

pixdrift commented 6 years ago

Would there be value in capturing some of these buildah scripts for base OS container builds in contrib?

rhatdan commented 6 years ago

Sure maybe an examples directory.

TomSweeneyRedHat commented 6 years ago

@pixdrift I was thinking about that, didn't know if it made sense there, examples and/or tutorials. But I definitely wanted to save at least the final result somewhere after the dust settled.

pixdrift commented 6 years ago

The following may also be interesting to people following this thread: https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.ansible.minimal.sh

This example uses pip from the host (python2-pip) to install Python packages into the container, so the container doesn't need pip and its dependencies installed or any compilers installed to build source packages from pypi.

In this example the pip module is ansible (which could have just as easily been installed as an rpm), but it's really to demonstrate building containers to run Python code as it is a use case I see repeated in EL7 environments. In this case I am developing an apb-base style image using buildah to run Ansible playbooks.

The resulting image in this case which includes python + Ansible 2.5.0 and all required dependencies is around 150MB. Leaving pip outside the container looks to save around 10MB (depending on method).. and more if compilers etc. are required.

Still determining if there is any impact to the pip installation on the host, but looks good so far.
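
The linked script is not reproduced here, so the following is only one possible way to do it, not necessarily the method that script uses. It assumes the host's pip and the container's Python are the same version and layout, and relies on pip's --root option to install relative to the mounted container; pip_install_into_root is a hypothetical helper name.

```shell
# Sketch: install Python packages into a mounted container using the host's pip.
# pip_install_into_root is a hypothetical helper, not taken from the linked script.
pip_install_into_root() {
    local root=$1
    [ -n "$root" ] || { echo "usage: pip_install_into_root ROOT PACKAGE..." >&2; return 1; }
    shift
    # --root installs relative to ROOT; --no-cache-dir avoids writing a pip cache.
    pip install --root "$root" --no-cache-dir "$@"
}
```

For example, pip_install_into_root "$scratchmnt" ansible, run after the container's own python package has been installed with yum.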

rhatdan commented 6 years ago

@pixdrift Want to write a blog describing this?

rhatdan commented 6 years ago

Blogs have been written explaining this.

tdudgeon commented 6 years ago

Just for the record I finally got round to writing this up as a blog post: https://www.informaticsmatters.com/blog/2018/05/31/smaller-containers-part-3.html @pixdrift @rhatdan @TomSweeneyRedHat and others - thanks for your help!

TomSweeneyRedHat commented 6 years ago

Excellent news @tdudgeon , thanks for sharing!

pixdrift commented 4 years ago

I should probably (finally) mention I did do a write up (months ago) that included the same process for EL8, with comparisons to the Red Hat UBI images. In case someone is stumbling across this thread and looking for additional content on the subject, I posted it here, with the README.md describing the outcomes:

https://gitlab.com/pixdrift/buildah-scripts