gentoo / gentoo-docker-images

[MIRROR] Common effort to get an official and automated gentoo base docker container
https://gitweb.gentoo.org/proj/docker-images.git
GNU General Public License v2.0
322 stars, 89 forks

Reduce size of images by shipping a slim image without "unneeded" files #143

Open eli-schwartz opened 2 months ago

eli-schwartz commented 2 months ago
$ docker run --rm -it gentoo/stage3 du -sh /
1.2G    /

It should be possible to slim this down a bit.

36M     /usr/share/doc
3.2M    /usr/share/gtk-doc
3.7M    /usr/share/info
65M     /usr/share/man
71M     /usr/share/locale
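To check how much these directories add up to, the du -sh output can be totalled with a small filter. This is a minimal sketch: sum_du_sizes is a hypothetical helper name, and it assumes GNU-style du -h suffixes (K, M, G, T in powers of 1024).

```shell
# Hypothetical helper: sum the human-readable sizes printed by `du -sh`
# (one "SIZE<TAB>PATH" line per entry on stdin) and print the total in MiB.
# Assumes du -h suffixes K/M/G/T are powers of 1024; a bare number is bytes.
sum_du_sizes() {
    awk '
        {
            size = $1
            unit = substr(size, length(size), 1)  # trailing unit letter, or a digit
            num  = size + 0                       # numeric prefix, e.g. "3.2M" -> 3.2
            if      (unit == "K") mib = num / 1024
            else if (unit == "M") mib = num
            else if (unit == "G") mib = num * 1024
            else if (unit == "T") mib = num * 1024 * 1024
            else                  mib = num / (1024 * 1024)
            total += mib
        }
        END { printf "%.1fM total\n", total }
    '
}
```

The five sizes listed above sum to 178.9M. Against a live image, something like docker run --rm gentoo/stage3 du -sh /usr/share/doc /usr/share/gtk-doc /usr/share/info /usr/share/man /usr/share/locale | sum_du_sizes should work; -t is left out on purpose so the piped output carries no carriage returns.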

We could use INSTALL_MASK to exclude these paths when building the initial image (let's call it ":slim" for now), then restore them in a second layer to produce a separate image for the exceptional case where someone actually wants this documentation in a Docker container.
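INSTALL_MASK is a real Portage make.conf variable. The sketch below shows one way a ":slim" build could set it, assuming it runs inside the build environment before packages are merged. The helper name enable_slim_install_mask and the SLIM_MASK_PATHS variable are hypothetical, and the path list simply mirrors the directories measured above. Note that INSTALL_MASK only affects files installed by later merges; it does not delete files already on disk.

```shell
# Hypothetical list of paths to mask for a ":slim" image (from the du output above).
SLIM_MASK_PATHS="/usr/share/doc /usr/share/gtk-doc /usr/share/info /usr/share/man /usr/share/locale"

# Hypothetical helper: append an INSTALL_MASK line to a make.conf file
# (defaults to /etc/portage/make.conf). Does nothing if INSTALL_MASK is
# already set there, so running it twice is harmless.
enable_slim_install_mask() {
    conf="${1:-/etc/portage/make.conf}"
    if grep -q '^INSTALL_MASK=' "$conf" 2>/dev/null; then
        return 0
    fi
    # Start on a fresh line if the existing file lacks a trailing newline.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    printf 'INSTALL_MASK="%s"\n' "$SLIM_MASK_PATHS" >> "$conf"
}
```

Restoring the files in a second layer would then mean removing that line and re-merging the affected packages.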

eli-schwartz commented 2 months ago

Another thought:

$ emerge-webrsync && emerge -c --with-bdeps=n && rm -rf /var/db/repos/gentoo
$ du -sh /
924M    /

This gets rid of some things people are likely to want, such as autoconf/automake/libtool, pkgconf, and even perl!

It also gets rid of some things people are quite unlikely to want, such as cython and tcl, and some things people may or may not want, such as setuptools/flit-core/hatchling (and their dependency trees). Whether pruning all of this is worthwhile is debatable, but every one of these packages can be fetched again from the binhost automatically with little fuss.
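To see exactly what a depclean like the one above removed, one option is to snapshot the installed-package database before and after and diff the two lists. This is a sketch under assumptions: removed_packages is a hypothetical helper, and the snapshots are assumed to be one CATEGORY/PF entry per line, as listed from /var/db/pkg.

```shell
# Hypothetical helper: print packages present in the "before" snapshot
# but missing from the "after" snapshot. Each snapshot file holds one
# CATEGORY/PF entry per line, e.g. from:
#   (cd /var/db/pkg && ls -d */*) > before.txt
removed_packages() {
    before_sorted=$(mktemp) && after_sorted=$(mktemp) || return 1
    # comm requires sorted input; sort both snapshots with the same collation.
    sort "$1" > "$before_sorted"
    sort "$2" > "$after_sorted"
    # -23 suppresses lines unique to "after" and lines common to both.
    comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

Any entry you turn out to need could then be reinstalled from the binhost with emerge --getbinpkg --oneshot after re-syncing the repository (which the sequence above deletes). The entries are versioned, so either strip the version or prefix the entry with "=".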

eli-schwartz commented 2 months ago

/cc @thesamesam (ticket based on an IRC discussion)

ArsenArsen commented 2 months ago

I also have a (cursed) use case for the hefty images, so if they're going away, I'd like to see the full images kept somehow.

ajakk commented 2 months ago

This seems like a dupe of https://github.com/gentoo/gentoo-docker-images/issues/107, and I don't see any particularly new arguments here, but I guess it indicates that more people than I thought care about this <200 MB.

Are people using Gentoo images as generic base images like Debian or Alpine where this size could potentially make a lot of difference? I'm struggling to see the ROI rationale.

eli-schwartz commented 2 months ago

A major new argument is that people who need removed/slimmed packages can actually fetch those from a binhost these days.

> Are people using Gentoo images as generic base images like Debian or Alpine where this size could potentially make a lot of difference? I'm struggling to see the ROI rationale.

I'm not the world's greatest Docker lover to begin with. That said, I do use it in GitHub Actions workflows to run distro integration tests, including on Gentoo, and I occasionally run those CI containers by hand for testing too. Larger containers annoy me because they contain things I will never use, and every weekly update means another 1.2 GB of possibly slow bandwidth used up. And AFAIK you cannot slim down an existing Docker image, only make it bigger, because it is stored as a stack of layers, each one a diff against the layer below it; deleting files in a new layer only hides them.

Note also that downloading a giant stage3 tarball to install Gentoo on new hardware is not a big deal: I install it once and then use it for years and years, so I don't really care about the one-time size cost. With Docker, though, you don't create a container, run updates on it every few days or weeks, and then retag the result. You re-initialize from the most recent fresh base image. So reducing how much you have to re-download regularly has some value.

This means that if you want to slim down the image, you have to do it from the start. Even if you always plan to keep a "fat" image as the default, it has to be implemented as a layer on top of the slim one.