gentoo / gentoo-docker-images

[MIRROR] Common effort to get an official and automated gentoo base docker container
https://gitweb.gentoo.org/proj/docker-images.git
GNU General Public License v2.0

Large image size #107

Open reavessm opened 3 years ago

reavessm commented 3 years ago

I don't know if this is the right place because this isn't really an issue, but more of a question.

Why do the Gentoo containers have such a large image size? Currently, latest on amd64 is 287.76 MB, while the Fedora image is 58.39 MB and Ubuntu is down to 22.95 MB. I'm seeing that /usr/libexec/gcc is taking 111 MB, and I understand that it wouldn't be Gentoo without GCC, but is there any other place to trim some fat?
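For anyone wanting to reproduce this kind of size breakdown, here is a minimal sketch. The function name `largest_entries` and its defaults are made up for illustration; it only relies on standard `du`, `sort` and `head`.

```shell
# List the N largest entries directly under ROOT, sizes in KiB, largest first.
# Usage: largest_entries ROOT [N]   (N defaults to 10)
largest_entries() {
    root=${1:?usage: largest_entries ROOT [N]}
    count=${2:-10}
    # du -sk prints one summarised total per argument in 1024-byte units;
    # sort -rn orders those totals numerically, biggest first.
    du -sk "$root"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Inside a container this could be called as `largest_entries /usr 5`, or the pipeline can be passed directly to `podman run --rm gentoo/stage3 sh -c '...'`.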

ultrabug commented 3 years ago

As you can see, we stick with the official tarballs, which are meant to offer an environment from which you can build and install your Gentoo Linux. I guess that's why.

KSmanis commented 3 years ago

AFAIK other distros slim down their Docker images by removing cruft such as unnecessary packages, man pages, etc. We could definitely apply something similar here, but it would probably require guidance from a Gentoo Developer, i.e., someone with a good understanding of the stage3 tarball structure and what is required for a working container.

nnzv commented 9 months ago

You can delete unnecessary packages, like cmake, that the software doesn't need to run. Force unmerge with:

emerge -W --rage-clean <foo>

Deleting directories like /var/db is okay as users aren't expected to enter there.

rm -rf /var/db # also /var/cache/distfiles /var/tmp/portage /usr/share/{doc,man} /var/cache/binpkgs 

For more details, check https://wiki.gentoo.org/wiki/Knowledge_Base:Freeing_disk_space.

nnzv commented 9 months ago

Also see https://wiki.gentoo.org/wiki/User:Arzano/Towards_a_slim_Gentoo_container_image

ajakk commented 9 months ago

> You can delete unnecessary packages, like cmake, that the software doesn't need to run. Force unmerge with:

We shouldn't remove build dependencies, because it's just going to make it harder to install things downstream.

> Deleting directories like /var/db is okay as users aren't expected to enter there.

No, it isn't, because that breaks your Gentoo installation by wiping out the Portage state stored in /var/db/pkg. I think it might make sense to remove the documentation and manpages, since together they're approaching a third of the total image size. As for the other directories, they're empty:

$ podman run -it gentoo/stage3 find /var/cache/distfiles /var/tmp/portage /var/cache/binpkgs
/var/cache/distfiles
find: '/var/tmp/portage': No such file or directory
/var/cache/binpkgs
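
To make the "remove the documentation and manpages" idea concrete, here is a minimal sketch. The function name `strip_docs` is hypothetical, and it assumes GNU or BSD `find` (for `-mindepth` and `-delete`). It only touches /usr/share/doc and /usr/share/man and leaves /var/db/pkg alone.

```shell
# Empty the documentation trees under ROOT while keeping the directories.
# Usage: strip_docs ROOT   (ROOT is required so that / is never hit by accident)
strip_docs() {
    root=${1:?usage: strip_docs ROOT}
    for dir in usr/share/doc usr/share/man; do
        target=${root%/}/$dir
        # Skip trees that do not exist in this root.
        [ -d "$target" ] || continue
        # -mindepth 1 spares the directory itself; -delete removes depth-first.
        find "$target" -mindepth 1 -delete
    done
}
```

Note that in a Dockerfile, deleting files in a later `RUN` step does not shrink layers that already contain them, so this would only pay off when the result is copied into a fresh stage or the image is otherwise flattened.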

nnzv commented 9 months ago

> We shouldn't remove build dependencies, because it's just going to make it harder to install things downstream.

In a regular Gentoo system, we shouldn't. In containers, it doesn't make much sense to keep build dependencies (excluding runtime dependencies like Ruby) if you've already compiled the software. Obviously, this is to achieve a tiny Docker image.

RUN emerge foo && emerge -W --rage-clean foo-dependency

> No, it isn't, because that breaks your Gentoo installation by wiping out the Portage state stored in /var/db/pkg

Yeah, my bad, the fat thing is /var/db/repos/gentoo. In the rare case where you want to open a shell session in the container and need to restore the ebuild repository, you can simply do:

emaint sync -r gentoo

At least, that's what works for me. Regarding the "empty" directories, it depends on what you emerge into the system; a fresh container doesn't have many things to delete.

berney commented 3 months ago

The gentoo/portage and gentoo/stage3 images are effectively just Docker versions of the tarballs you can download from the normal Gentoo distribution channels: a way to distribute the same thing via Docker registries. They serve as the same building blocks for making a Gentoo system in a Docker container as the tarballs do for a virtual machine or physical host.

For the end image that you want to run, say a web server like nginx, ideally it would be a single nginx binary: distroless, no gcc, no emerge, no bash, etc. For that you can either use a multistage build similar to the one on Arzano's page, linked in https://github.com/gentoo/gentoo-docker-images/issues/107#issuecomment-1821217314.

Or use Kubler:

Kubler is a build tool that uses Gentoo to build packages, and creates a docker image with just the packages - the final image does not have portage, emerge, the rest of the file system - it just has the packages and whatever you explicitly created.
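
For the multistage route mentioned above, a rough sketch might look like the following. This is not taken from Arzano's page and I have not verified it against a real build. The helper name `write_dockerfile`, the stage names, and the `/out` directory are all made up for illustration, and naming sys-libs/glibc explicitly rests on the assumption that @system packages are not pulled in as ordinary dependencies when installing into an empty root.

```shell
# Write a hypothetical multistage Dockerfile into DIR.
# Usage: write_dockerfile DIR
write_dockerfile() {
    dir=${1:?usage: write_dockerfile DIR}
    mkdir -p "$dir"
    # Quoted heredoc delimiter: the Dockerfile text is written verbatim.
    cat > "$dir/Dockerfile" <<'EOF'
# Stage 1: the ebuild repository, shipped as its own image.
FROM gentoo/portage:latest AS portage

# Stage 2: the full stage3 with toolchain and emerge; all building happens here.
FROM gentoo/stage3:latest AS builder
COPY --from=portage /var/db/repos/gentoo /var/db/repos/gentoo
# Install into a separate root so only the installed packages land in /out.
# Assumption: glibc must be named explicitly because it is part of @system.
RUN emerge --root=/out sys-libs/glibc www-servers/nginx

# Stage 3: start from an empty image and copy over only the installed files.
FROM scratch
COPY --from=builder /out /
EOF
}
```

It could be used as `write_dockerfile ./nginx-slim && docker build -t nginx-slim ./nginx-slim` (the image name is arbitrary). The final image may well need further packages or configuration before nginx actually starts; the point is only that gcc, Portage and the ebuild repository stay behind in the builder stage.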

If you want an official slimmed Gentoo Docker image that still has emerge etc. but doesn't have the manpages etc., that should be a separate image like gentoo/gentoo (or something like that). I think the gentoo/portage and gentoo/stage3 images should map to the upstream tarballs. If the tarballs have the manpages, the Docker images should too. So if the manpages should go, the tarballs should drop them first, and then the Docker images won't have them either.