tdudgeon closed this issue 6 years ago.
I don't think this is a direct comparison. If you track down the CentOS 7 base image, it looks like it's built from a base filesystem tarball rather than by using yum to install the base system.
The CentOS 7 image on Docker Hub links to its Dockerfile, and the base filesystem tarball is in the same repo: https://github.com/CentOS/sig-cloud-instance-images/blob/02904503939756f540cfaa3fbafbf280e8a11bef/docker/Dockerfile
There are likely other unnecessary files stripped out of the CentOS images as well.
@nalind @mtrmac Isn't this also an issue of compressed versus uncompressed?
@pixdrift Yes, of course it's not an exact comparison, as they were built differently. But it's not what I was expecting. The image built with buildah contains just the minimal scratch image (clocking in at a measly 1.77 KB for me) plus the bash and coreutils packages (which take it up to 212 MB). The CentOS Docker Hub image contains those same bash and coreutils packages plus python and yum, and maybe other things too. And despite these extras, it comes in at a smaller size.
The kickstart at https://github.com/CentOS/sig-cloud-instance-build/blob/master/docker/centos-7.ks also does a few more cleanups.
@tdudgeon Could you check to see where the extra size is coming from?
In the buildah container, run
du -sm /*
to see whether any unexpected space is being used.
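A sketch of the same check, pointed at the image's root filesystem from the host and sorted biggest first. The helper name largest_dirs is hypothetical, it assumes GNU du (as on CentOS 7), and it uses --apparent-size on the assumption that image layers, being tar archives of file contents, track apparent size more closely than allocated disk blocks.

```shell
#!/bin/bash
# largest_dirs ROOT [N]  (hypothetical helper)
# Print the N (default 10) largest top-level entries under ROOT in MiB,
# biggest first. ROOT is typically the path printed by `buildah mount`.
# Assumes GNU du for --apparent-size.
largest_dirs() {
    local root=$1 count=${2:-10}
    # -s: one total per argument; -m: sizes in MiB, rounded up
    du -sm --apparent-size "$root"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example (as root on the build host):
#   ctr=$(buildah from centos-base)
#   mnt=$(buildah mount "$ctr")
#   largest_dirs "$mnt" 5
#   buildah unmount "$ctr"
```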
Could this be the CLanguage bindings?
It might be helpful to start by determining whether the difference in size is due to the container contents or due to the tooling.
Is the ratio between du -sc / and the size reported by docker images / buildah images roughly the same for both images (i.e. the content is the difference), or significantly different (i.e. the tooling is the difference)?
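The ratio check above can be sketched as two small helpers. This is a sketch under assumptions: the function names are hypothetical, and to_bytes assumes decimal units (1 kB = 1000 bytes, 1 MB = 1000 kB), which I believe is how the docker CLI prints sizes; check that your buildah version uses the same convention before comparing the two tools.

```shell
#!/bin/bash
# to_bytes SIZE  (hypothetical helper)
# Convert a size string such as "177.4 MB" or "182MB" to bytes.
# Assumption: decimal units, as printed by `docker images`.
to_bytes() {
    awk -v s="$1" 'BEGIN {
        n = s + 0                   # leading numeric part, e.g. 177.4
        unit = s
        gsub(/[0-9. ]/, "", unit)   # what is left is the unit, e.g. "MB"
        mult = 1
        if (unit == "kB" || unit == "KB") mult = 1e3
        else if (unit == "MB") mult = 1e6
        else if (unit == "GB") mult = 1e9
        printf "%.0f\n", n * mult
    }'
}

# ratio A B  (hypothetical helper): print A / B to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Example for one image (GNU du; -b reports apparent size in bytes):
#   content=$(du -sb "$mnt" | cut -f1)
#   ratio "$content" "$(to_bytes '212 MB')"
# Repeat for the other image: similar ratios point at the content,
# very different ratios point at the tooling.
```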
Not a direct comparison, but another data point: I used the provided script to build from the Oracle Linux 7 repo.
# buildah images
IMAGE ID IMAGE NAME CREATED AT SIZE
95cc7dba2f97 docker.io/library/buildah-ol7:latest Mar 26, 2018 22:28 177.4 MB
Then I pushed it to docker-daemon using buildah push:
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
buildah ol7 95cc7dba2f97 4 minutes ago 182 MB
So there is a minor discrepancy in reported size, but in this case buildah reports a smaller size than docker. I still need to build in docker too for comparison.
I built the centos:7 container from the upstream Dockerfile using buildah bud:
# buildah images
IMAGE ID IMAGE NAME CREATED AT SIZE
102faebad41b <none> Mar 26, 2018 23:15 194.5 MB
Pushed to docker:
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
centos 7 ea2fac082cce 3 minutes ago 195 MB
So the tooling difference looks minimal. The kickstart used to build the CentOS 7 filesystem image, posted by @mtrmac, shows that the initial package selection is quite different.
@tdudgeon Pulling apart the ks file posted above and looking in the image, it looks like the documentation (~4MB) and the locale-archive (~99MB) are what cause the size issues. If you force your locale in the yum installer and specify nodocs, you get a significantly smaller base image:
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
buildah stripped 4ddfc8034046 6 minutes ago 56.4 MB
Updated buildah script:
#!/bin/bash
# run as root: buildah mount and yum --installroot need it
set -x
# build a minimal image starting from an empty (scratch) container
newcontainer=$(buildah from scratch)
scratchmnt=$(buildah mount "$newcontainer")
# install the packages; skip docs and every locale except en_US.utf8
yum install -y --installroot "$scratchmnt" --releasever 7 \
    --setopt=install_weak_deps=false \
    --setopt=tsflags=nodocs \
    --setopt=override_install_langs=en_US.utf8 \
    bash coreutils
yum clean all -y --installroot "$scratchmnt" --releasever 7
# set some config info
buildah config --cmd /bin/bash "$newcontainer"
buildah config --label name=centos-base "$newcontainer"
# commit the image
buildah unmount "$newcontainer"
buildah commit "$newcontainer" centos-base
Interested to know if this works ok from your CentOS source, and what kind of result you get with regard to size.
@pixdrift Hey that makes a big difference. In my case it builds an image that is 91.56 MB in size. Much smaller than the original 212.1 MB (though still a fair bit bigger than your one of 56.4 MB).
@tdudgeon I suspect the difference in size is due to package dependency creep in the included packages. The build I was using may be from a 7.2/7.3 repo (was a random dev instance I had). Will have a closer look tomorrow and point it at 7.4. In total, I believe there were 20 rpms installed.
@tdudgeon Is Buildah now smaller than the docker build?
@ipbabble Might be worth a blog on how to handle languages and make smaller images.
@rhatdan Yes, the centos:7 image on Docker Hub is 195 MB whilst my latest equivalent with buildah is 91.56 MB. So just under half the size.
WooHoo. BTW, size is one important factor, but another factor customers look at is the number of packages/files inside a container. They are looking to limit the attack surface, on the theory that the fewer files/executables in the image, the harder it is to exploit a container. So you might want to get a count of the RPMs installed.
The Docker Hub centos:7 image has 143 packages.
The one built by buildah has 64 packages.
But I had to yum install rpm so that I could count them, so that should really be 63.
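Installing rpm into the image just to count packages changes both the count and the image. A sketch of an alternative: query the mounted root from the host with rpm --root, so nothing is added to the image. This assumes the host's rpm can read the image's RPM database (reasonable when both are EL7, not guaranteed across releases); the helper names are hypothetical.

```shell
#!/bin/bash
# count_rpms ROOT  (hypothetical helper)
# Number of packages in the RPM database under ROOT, queried with the
# host's rpm binary so rpm itself does not need to be in the image.
count_rpms() {
    rpm --root "$1" -qa | wc -l
}

# largest_rpms ROOT [N]  (hypothetical helper)
# The N (default 10) biggest packages under ROOT, printed as
# "installed-size-in-bytes name", biggest first.
largest_rpms() {
    rpm --root "$1" -qa --qf '%{SIZE} %{NAME}\n' | sort -rn | head -n "${2:-10}"
}

# Example (as root on the build host):
#   ctr=$(buildah from centos-base)
#   mnt=$(buildah mount "$ctr")
#   count_rpms "$mnt"
#   largest_rpms "$mnt" 5
#   buildah unmount "$ctr"
```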
You guys might also want to take a look at the atomic image https://github.com/CentOS/atomic-container. It is built using microdnf and is already as small as 78 MB.
@mohammedzee1000, thanks for the suggestion. This image uses essentially the same package selection as I outlined above, with the OS release package, microdnf and systemd added, followed by some manual cleanup of the resulting filesystem:
%packages --excludedocs --nobase --nocore --instLangs=en
bash
centos-release
microdnf
systemd
I am interested to know how my OL7 image ended up so much smaller (in package count and size) than CentOS; I can only assume dependency changes.
@tdudgeon, can you post an RPM list from your container and I will put together a comparison? The yum installation log from buildah should be enough and won't require modifications to the image contents.
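A sketch of how such a comparison could be done from the yum logs alone: strip the (n/N) counter and the version, release and architecture from each line, then list the names that appear in only one log. The function names are hypothetical; sorting matters because the order of the (n/N) lines can differ between runs even when the package set is the same. The parsing assumes standard NAME-VERSION-RELEASE.ARCH.rpm file names.

```shell
#!/bin/bash
# pkg_names  (hypothetical helper)
# Read yum download lines such as
#   (2/40): bash-4.2.46-29.el7_4.x86_64.rpm
# on stdin and print the bare package names (here "bash"), sorted.
pkg_names() {
    sed -E -e 's|^\([0-9]+/[0-9]+\): ||' \
           -e 's/\.rpm([[:space:]].*)?$//' \
           -e 's/-[^-]+-[^-]+$//' |
        sort -u
}

# compare_pkgs LOG_A LOG_B  (hypothetical helper)
# Show packages present in only one of the logs: column 1 is only in
# LOG_A, tab-indented column 2 is only in LOG_B.
compare_pkgs() {
    comm -3 <(pkg_names < "$1") <(pkg_names < "$2")
}
```

Used on the OL7 and CentOS lists in this thread, only the release packages (redhat-release-server vs. centos-release) would be expected to show up.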
You can save some space by removing the locales you don't need. This should be quite safe to do:
rm -f "$scratchmnt"/usr/lib/locale/locale-archive*
find "$scratchmnt/usr/share/locale/" -mindepth 1 -maxdepth 1 -type d \! -name '*en*' -exec rm -rf {} +
@giuseppe, this is redundant if you use the yum parameter I provided above (--setopt=override_install_langs=en_US.utf8), because alternate locales aren't installed.
When the language is specified in yum, the locale-archive is 1.1M instead of the default 100M; this is the primary change that saved the space for @tdudgeon.
Resulting OL7 package list from my posted buildah script above (image size: 56.4MB)
(1/40): basesystem-10.0-7.0.1.el7.noarch.rpm
(2/40): bash-4.2.46-29.el7_4.x86_64.rpm
(3/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(4/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(5/40): coreutils-8.22-18.0.1.el7.x86_64.rpm
(6/40): filesystem-3.2-21.el7.x86_64.rpm
(7/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(8/40): glibc-2.17-196.el7.x86_64.rpm
(9/40): glibc-common-2.17-196.el7.x86_64.rpm
(10/40): gmp-6.0.0-15.el7.x86_64.rpm
(11/40): grep-2.20-3.el7.x86_64.rpm
(12/40): info-5.1-4.el7.x86_64.rpm
(13/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(14/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(15/40): libacl-2.2.51-12.el7.x86_64.rpm
(16/40): libattr-2.4.46-12.el7.x86_64.rpm
(17/40): libcap-2.22-9.el7.x86_64.rpm
(18/40): libcom_err-1.42.9-10.0.1.el7.x86_64.rpm
(19/40): libffi-3.0.13-18.el7.x86_64.rpm
(20/40): libgcc-4.8.5-16.el7.x86_64.rpm
(21/40): libselinux-2.5-11.el7.x86_64.rpm
(22/40): libsepol-2.5-6.el7.x86_64.rpm
(23/40): libstdc++-4.8.5-16.el7.x86_64.rpm
(24/40): libtasn1-4.10-1.el7.x86_64.rpm
(25/40): libverto-0.2.5-4.el7.x86_64.rpm
(26/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(27/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(28/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(29/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(30/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(31/40): openssl-libs-1.0.2k-8.0.1.el7.x86_64.rpm
(32/40): p11-kit-0.23.5-3.el7.x86_64.rpm
(33/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): popt-1.13-16.el7.x86_64.rpm
(36/40): redhat-release-server-7.4-18.0.1.el7.x86_64.rpm
(37/40): sed-4.2.2-5.el7.x86_64.rpm
(38/40): setup-2.8.71-7.el7.noarch.rpm
(39/40): tzdata-2017b-1.el7.noarch.rpm
(40/40): zlib-1.2.7-17.el7.x86_64.rpm
The same when using RHEL 7.4 (image size: 57.08 MB):
(1/40): basesystem-10.0-7.el7.noarch.rpm
(2/40): bash-4.2.46-29.el7_4.x86_64.rpm
(3/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(4/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(5/40): coreutils-8.22-18.el7.x86_64.rpm
(6/40): filesystem-3.2-21.el7.x86_64.rpm
(7/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(8/40): glibc-2.17-196.el7_4.2.x86_64.rpm
(9/40): glibc-common-2.17-196.el7_4.2.x86_64.rpm
(10/40): gmp-6.0.0-15.el7.x86_64.rpm
(11/40): grep-2.20-3.el7.x86_64.rpm
(12/40): info-5.1-4.el7.x86_64.rpm
(13/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(14/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(15/40): libacl-2.2.51-12.el7.x86_64.rpm
(16/40): libattr-2.4.46-12.el7.x86_64.rpm
(17/40): libcap-2.22-9.el7.x86_64.rpm
(18/40): libcom_err-1.42.9-10.el7.x86_64.rpm
(19/40): libffi-3.0.13-18.el7.x86_64.rpm
(20/40): libgcc-4.8.5-16.el7_4.2.x86_64.rpm
(21/40): libselinux-2.5-11.el7.x86_64.rpm
(22/40): libsepol-2.5-6.el7.x86_64.rpm
(23/40): libstdc++-4.8.5-16.el7_4.2.x86_64.rpm
(24/40): libtasn1-4.10-1.el7.x86_64.rpm
(25/40): libverto-0.2.5-4.el7.x86_64.rpm
(26/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(27/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(28/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(29/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(30/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(31/40): openssl-libs-1.0.2k-8.el7.x86_64.rpm
(32/40): p11-kit-0.23.5-3.el7.x86_64.rpm
(33/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): popt-1.13-16.el7.x86_64.rpm
(36/40): redhat-release-server-7.4-18.el7.x86_64.rpm
(37/40): sed-4.2.2-5.el7.x86_64.rpm
(38/40): setup-2.8.71-7.el7.noarch.rpm
(39/40): tzdata-2018d-1.el7.noarch.rpm
(40/40): zlib-1.2.7-17.el7.x86_64.rpm
Interested to know why CentOS 7 is larger using the same process.
Ack. On PTO but will take a look as soon as possible.
Been looking at this for RadAnalytics issues too.
William
Something strange is definitely happening. I just ran the same script as above on a fresh CentOS 7.4 build, and I also got a 91.57MB result, which is roughly 35MB bigger than either RHEL 7 or OL 7. The package list has the same number of packages (40), so something is odd. I'm going to go through and compare the images now. Here is the package list from the CentOS 7.4 image.
(1/40): basesystem-10.0-7.el7.centos.noarch.rpm
(2/40): centos-release-7-4.1708.el7.centos.x86_64.rpm
(3/40): bash-4.2.46-29.el7_4.x86_64.rpm
(4/40): filesystem-3.2-21.el7.x86_64.rpm
(5/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(6/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(7/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(8/40): grep-2.20-3.el7.x86_64.rpm
(9/40): info-5.1-4.el7.x86_64.rpm
(10/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(11/40): gmp-6.0.0-15.el7.x86_64.rpm
(12/40): libacl-2.2.51-12.el7.x86_64.rpm
(13/40): libattr-2.4.46-12.el7.x86_64.rpm
(14/40): libcap-2.22-9.el7.x86_64.rpm
(15/40): libcom_err-1.42.9-10.el7.x86_64.rpm
(16/40): libffi-3.0.13-18.el7.x86_64.rpm
(17/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(18/40): libselinux-2.5-11.el7.x86_64.rpm
(19/40): coreutils-8.22-18.el7.x86_64.rpm
(20/40): libsepol-2.5-6.el7.x86_64.rpm
(21/40): libgcc-4.8.5-16.el7_4.2.x86_64.rpm
(22/40): libverto-0.2.5-4.el7.x86_64.rpm
(23/40): libtasn1-4.10-1.el7.x86_64.rpm
(24/40): libstdc++-4.8.5-16.el7_4.2.x86_64.rpm
(25/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(26/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(27/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(28/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(29/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(30/40): p11-kit-0.23.5-3.el7.x86_64.rpm
(31/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(32/40): popt-1.13-16.el7.x86_64.rpm
(33/40): sed-4.2.2-5.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): setup-2.8.71-7.el7.noarch.rpm
(36/40): zlib-1.2.7-17.el7.x86_64.rpm
(37/40): tzdata-2018d-1.el7.noarch.rpm
(38/40): glibc-2.17-196.el7_4.2.x86_64.rpm
(39/40): openssl-libs-1.0.2k-8.el7.x86_64.rpm
(40/40): glibc-common-2.17-196.el7_4.2.x86_64.rpm
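For the image comparison mentioned above, a sketch that compares two mounted roots directory by directory (rather than package by package) to show where the extra megabytes sit. The function names are hypothetical, and it assumes GNU du and bash; ROOT_A and ROOT_B would be the paths printed by buildah mount for the two images.

```shell
#!/bin/bash
# dir_sizes ROOT [DEPTH]  (hypothetical helper)
# Print "KiB<TAB>path" for every directory under ROOT down to DEPTH levels
# (default 3). Paths are relative (e.g. ./var/cache/yum) so that two
# different roots can be lined up. Assumes GNU du.
dir_sizes() {
    local root=$1 depth=${2:-3}
    (cd "$root" && du -k --apparent-size --max-depth="$depth" .)
}

# compare_roots ROOT_A ROOT_B [DEPTH]  (hypothetical helper)
# Directories whose size differs, as
# "delta_KiB<TAB>path<TAB>KiB_in_A<TAB>KiB_in_B", largest growth (A -> B) first.
# A directory missing from one root counts as 0 KiB there.
compare_roots() {
    local depth=${3:-3}
    awk -F'\t' '
        NR == FNR { a[$2] = $1; seen[$2] = 1; next }   # lines from ROOT_A
        { b[$2] = $1; seen[$2] = 1 }                   # lines from ROOT_B
        END {
            for (p in seen)
                if (a[p] + 0 != b[p] + 0)
                    printf "%d\t%s\t%d\t%d\n", b[p] - a[p], p, a[p], b[p]
        }' <(dir_sizes "$1" "$depth") <(dir_sizes "$2" "$depth") |
        sort -t $'\t' -k1,1nr
}

# Example (as root), with ctr_a/ctr_b from `buildah from <image>`:
#   compare_roots "$(buildah mount "$ctr_a")" "$(buildah mount "$ctr_b")" | head
```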
The problem on CentOS was yum cache data not being cleaned up correctly; in my case it was from the EPEL repo. This could be solved by passing '--disablerepo=epel' to the yum command, but people may want to install packages from EPEL as part of the image creation.
I have an updated script here which uses rm to clean up the yum cache, and it brings the CentOS 7 image down below 57MB. https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.minimal.sh
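The rm-based cleanup described above (the same rm -rf $scratchmnt/var/cache/yum that is tested later in this thread) can be sketched as a function that reports how much the cache held before deleting it. The function name is hypothetical, and this is not the content of the linked script.

```shell
#!/bin/bash
# clean_yum_cache ROOT  (hypothetical helper)
# Report and remove yum's cache under ROOT. In this thread,
# `yum clean all --installroot` left repo data (EPEL) behind, so the
# whole directory is removed outright.
clean_yum_cache() {
    local cache="$1/var/cache/yum"
    if [ -d "$cache" ]; then
        # Apparent size in MiB (GNU du), just for the build log
        echo "removing $(du -sm --apparent-size "$cache" | cut -f1) MiB from $cache"
        rm -rf -- "$cache"
    fi
}

# Example, right after `yum clean all` in the build script:
#   clean_yum_cache "$scratchmnt"
```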
@pixdrift @tdudgeon FYI, I just posted a little blog on this issue at http://www.projectatomic.io/blog/2018/04/open-source-what-a-concept/. Thanks a bunch for inspiring it and for your contributions here!
@TomSweeneyRedHat, thanks for posting the article. It should be noted that I identified two further things in this thread that are worth mentioning in the blog post.
--setopt=install_weak_deps=false doesn't reduce the size of this image in my testing, so I have removed it to avoid unnecessary complexity. The updated script is here: https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.minimal.sh
@pixdrift I think --setopt=install_weak_deps might have some effect on a Fedora system. I don't believe RHEL or CentOS support weak dependencies.
Thanks @rhatdan, that helps explain it. I would expect a Fedora script that does the same thing (creates a minimal base image) to be using dnf at this point in time, so the option may no longer be relevant there either, unless the setopt option stayed the same. I don't spend much time outside EL, but will take a look for interest's sake.
Just for completeness, I tried rebuilding the base centos7 image with the extra rm -rf $scratchmnt/var/cache/yum command suggested by @pixdrift to clean up the cache, and the image size drops from 91.6 MB to 57.15 MB. Not bad, seeing as we started at 212 MB!
Thanks all!
It might be an idea to use https://github.com/GoogleCloudPlatform/container-diff: it recently got support for RPM, and it can also help with comparing containers at the file level.
@pixdrift yep, noted the additional input from you. I didn't want to add it to the blog post as I've found a blog length of about 4 pages in word-processing software is about as long as you want. So I tried to show the initial breadcrumbs in the blog and then gave a couple of pointers and a tease to this issue here so they could dive even deeper. I do very much appreciate your contributions here though, it's been some really great work.
I might take a stab at a blog on this from a security point of view.
Would there be value in capturing some of these buildah scripts for base OS container builds in contrib?
Sure, maybe an examples directory.
@pixdrift I was thinking about that, didn't know if it made sense there, examples and/or tutorials. But I definitely wanted to save at least the final result somewhere after the dust settled.
The following may also be interesting to people following this thread: https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.ansible.minimal.sh
This example uses pip from the host (python2-pip) to install Python packages into the container, so the container doesn't need pip and its dependencies installed, or any compilers installed to build source packages from PyPI.
In this example the pip module is ansible (which could have just as easily been installed as an rpm), but it's really to demonstrate building containers to run Python code as it is a use case I see repeated in EL7 environments. In this case I am developing an apb-base style image using buildah to run Ansible playbooks.
The resulting image in this case, which includes python + Ansible 2.5.0 and all required dependencies, is around 150MB. Leaving pip outside the container looks to save around 10MB (depending on method), and more if compilers etc. are required.
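A sketch of one way to do what is described above, assuming pip's --root option (this may not be what the linked script does): run the host's pip with --root pointing at the mounted container so files land under the container's /usr. It assumes the container already has a matching Python (python 2.7 on EL7), for example installed with yum --installroot, since pip only copies files. The function name pip_into_root is hypothetical.

```shell
#!/bin/bash
# pip_into_root ROOT PACKAGE...  (hypothetical helper)
# Install Python packages into a mounted container root using the *host's*
# pip, so neither pip nor any compilers have to live in the image.
#   --root              install relative to ROOT
#                       (e.g. ROOT/usr/lib/python2.7/site-packages)
#   --ignore-installed  also install dependencies the host already has;
#                       otherwise pip skips them and the image lacks them
pip_into_root() {
    local root=$1
    shift
    pip install --root "$root" --ignore-installed "$@"
}

# Example (as root; the container needs a matching python already):
#   pip_into_root "$scratchmnt" ansible==2.5.0
```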
I'm still determining whether there is any impact on the pip installation on the host, but it looks good so far.
@pixdrift Want to write a blog describing this?
Blogs have been written explaining this.
Just for the record I finally got round to writing this up as a blog post: https://www.informaticsmatters.com/blog/2018/05/31/smaller-containers-part-3.html @pixdrift @rhatdan @TomSweeneyRedHat and others - thanks for your help!
Excellent news @tdudgeon , thanks for sharing!
I should probably (finally) mention I did do a write up (months ago) that included the same process for EL8, with comparisons to the Red Hat UBI images. In case someone is stumbling across this thread and looking for additional content on the subject, I posted it here, with the README.md describing the outcomes:
Description
One of the key points of buildah is that it allows you to build small images without lots of extra fluff like yum and python. What I'm finding is that the images buildah creates are bigger than the traditional docker images, even though they don't contain this extra fluff. What is happening here?
Steps to reproduce the issue:
This is all done on a new CentOS 7 cloud image with docker and buildah installed from RPMs.
First let's define our target.
The Docker image is 195MB in size.
Now let's create a minimal image with only the coreutils and bash packages added (the Docker image has both of these present). Here is the script I used:
Run this script:
Now let's look at the image that is built:
Hey! The image is 212MB in size, bigger than the Docker image. And looking into it confirms it does not have yum or python installed. Why is it bigger, not smaller?