ElektraInitiative / libelektra

Elektra serves as a universal and secure framework to access configuration settings in a global, hierarchical key database.
https://www.libelektra.org
BSD 3-Clause "New" or "Revised" License

reduce docker images #4637

Closed · markus2330 closed this 2 months ago

markus2330 commented 1 year ago

As 0x6178656c wrote in https://github.com/ElektraInitiative/libelektra/pull/4620#issuecomment-1295395453:

markus2330 commented 1 year ago

We are regularly running into "no space left" problems because of too many Docker images, so I tagged it as urgent and removed the "probably to be removed".

@mpranj any other suggestions other than the two Docker images above?

kodebach commented 1 year ago

One thing I noticed about our images is that they are very big. Maybe we can look into making them smaller, that should help with the disk space problems.

mpranj commented 1 year ago

We are regularly running into "no space left" problems because of too many Docker images, so I tagged it as urgent and removed the "probably to be removed".

I think this will not save any space.

AFAIK removing unused images will do nothing to our disk space usage, as the images are not built by the pipeline. The images are built only when needed. We are actively using many images, so that is a problem.

One thing I noticed about our images is that they are very big.

Would be great if we can do something about this.

Maybe we can add the docker build option `--squash` to avoid storing multiple layers of the filesystem. There are always pros and cons, but it's worth a shot.

kodebach commented 1 year ago

Maybe we can add the docker build option `--squash` to avoid storing multiple layers of the filesystem.

Wouldn't that mean different images can't share a layer and all images would have to be built entirely from scratch, if there is the tiniest difference?

The images are built only when needed.

So do we actually build new images for every Jenkins run? Is there any kind of auto-cleanup?


Also, since I don't have access to the CI servers: Are we sure that the docker images are the problem? Could there be something else that is eating disk space too, e.g. log files with long retention periods, or artifacts of old builds?

mpranj commented 1 year ago

Wouldn't that mean different images can't share a layer and all images would have to be built entirely from scratch, if there is the tiniest difference?

Yes, but I'll test this now to see if there is any difference. Also, I know that is how it should work on a single machine, but I have a feeling we're not reusing layers anyway.

Also, since I don't have access to the CI servers: Are we sure that the docker images are the problem?

Yes, pretty sure it is at least the biggest problem. Most other things are cleaned up.

So do we actually build new images for every Jenkins run? Is there any kind of auto-cleanup?

Not for every run, only when they are needed. So images are reused once they are built. They are rebuilt monthly so that the packages are updated periodically.

mpranj commented 1 year ago

Wouldn't that mean different images can't share a layer and all images would have to be built entirely from scratch, if there is the tiniest difference?

Unfortunately you're right. I've tested the `--squash` option, and for the build-elektra-fedora-36 image the difference is only 2.16GB vs. 2.03GB.

kodebach commented 1 year ago

Okay, how exactly is our Fedora 36 image over 2GB in size, when the base `fedora:36` image is <60MB (see Docker Hub)? There has to be something in there that we don't need...

Another thing we could do: Remove Java from all images except one, maybe even remove it completely from Jenkins and only test on Cirrus. The JVM should be the same everywhere.

markus2330 commented 1 year ago

AFAIK removing unused images will do nothing to our disk space usage, as the images are not built by the pipeline. The images are built only when needed.

Yes, this is why I extended the scope of this issue: the idea was to suggest which used Docker images (probably the least important ones) to remove or how to make them smaller.

Another thing we could do: Remove Java from all images except one, maybe even remove it completely from Jenkins and only test on Cirrus. The JVM should be the same everywhere.

Actually, Java especially is very prone to problems in CMake detection and the like, so it is good to have these tests across several distributions.

Btw. the issue seems to be less urgent than I thought. Disk space usage is now 346G used with 1.5T available, i.e. about 20% used, so the problem is simply that running `docker prune -af` once a month was not enough.

Nevertheless, further suggestions on what to reduce are welcome. At some point we will need to do the cleanup.

markus2330 commented 1 year ago

Also, since I don't have access to the CI servers: Are we sure that the docker images are the problem? Could there be something else that is eating disk space too, e.g. log files with long retention periods, or artifacts of old builds?

After running `docker prune -af` on a7, the disk space usage goes from 100% to less than 20%.

kodebach commented 1 year ago

the idea was to suggest which used Docker images (probably the least important ones) to remove or how to make them smaller.

I see we have 4 different Debian Bullseye images? Why? I understand the minimal image, which is there to test without installing dependencies, but the rest are probably wasting space. The same goes for Debian Buster.

Also, if `docker image prune -af` (or even `docker system prune`) cleaned up > 1TB of space, I would really be interested in what exactly was removed, e.g. the output of `docker image ls` before and afterwards would be interesting.

Additionally, we can probably run `docker image prune` (without `-a`) much more often. It should not remove anything we need.

mpranj commented 1 year ago

Btw. the issue seems to be less urgent than I thought. Disk space usage is now 346G used with 1.5T available, i.e. about 20% used, so the problem is simply that running `docker prune -af` once a month was not enough.

Also, if `docker image prune -af` (or even `docker system prune`) cleaned up > 1TB of space

Seriously doubt this happened. Usually it cleans up about 100-200GB. Maybe we should `prune -af` weekly? (`prune -f` is run daily, `prune -af` is run monthly.) Note that deleting all images also means that the current ones need to be fetched from our Docker registry, which has a rather slow connection.

What machine are you talking about?

On a7 we store the:

What might be a problem: the build agents keep the current images which they need (so far, everything is OK). When a Dockerfile is changed, a new version of this image is built and the build agents retrieve it. Now we have two versions of this image per build agent. The issue worsens when multiple PRs change images multiple times.

markus2330 commented 1 year ago

I see we have 4 different Debian Bullseye images? Why?

To also test CMake exclusion of modules. Probably we should make these images build upon each other to use less space?

Maybe we should prune -af weekly?

Yes, sounds like the easiest solution for now. Is there some way to only clean up the images that weren't used for a week?

What machine are you talking about?

In https://github.com/ElektraInitiative/libelektra/issues/4637#issuecomment-1313475416 I was talking about a7, from the recent incident https://github.com/ElektraInitiative/libelektra/issues/160#issuecomment-1312652971.

kodebach commented 1 year ago

To also test CMake exclusion of modules. Probably we should make these images build upon each other to use less space?

Building the images on top of each other would definitely help.

There are probably a few other things we can do, like reducing the number of `RUN` instructions to reduce layers, or checking that we're not installing e.g. GUIs or other unnecessary packages.

Is there some way to only clean up the images that weren't used for a week?

Yes, the `--filter` argument can be used with a timestamp; see e.g. this page.

4ydan commented 1 year ago

Fedora 32 Docker image analysis

So I did a small investigation of the `scripts/docker/fedora/32/Dockerfile` image. I analyzed its layers, and most of the size comes from the installed packages. The whole image is 2.61GB, and around 2.4GB of that is packages.

Top 10 packages by size.

| Package | Size (MB) |
| --- | --- |
| golang-bin-1.14.15-3.fc32.x86_64 | 255.98 |
| java-11-openjdk-headless-11.0.11.0.9-0.fc32.x86_64 | 170.76 |
| java-1.8.0-openjdk-headless-1.8.0.292.b10-0.fc32.x86_64 | 117.47 |
| clang-libs-10.0.1-3.fc32.x86_64 | 92.07 |
| gcc-10.3.1-1.fc32.x86_64 | 81.71 |
| llvm-libs-10.0.1-4.fc32.x86_64 | 78.23 |
| glibc-debuginfo-2.31-6.fc32.x86_64 | 76.42 |
| mesa-dri-drivers-20.2.3-1.fc32.x86_64 | 65.74 |
| glibc-debuginfo-common-2.31-6.fc32.x86_64 | 57.20 |
| python27-2.7.18-8.fc32.x86_64 | 54.59 |

Improvements

Adding weak_deps=False option

```sh
dnf install --setopt=install_weak_deps=False
```

`--setopt=install_weak_deps=False`: this flag disables the installation of weak dependencies, which helps reduce the number of unnecessary packages installed. It is equivalent to `--no-install-recommends` in apt-get.

Result

Adding this dnf option reduced the image size by ~15%.

Maybe it would be interesting to use a container registry like ghcr.io to reduce duplication and build some base images that other Dockerfiles could build upon.

markus2330 commented 1 year ago

Thank you for the investigation. Yes, please add this option.

github-actions[bot] commented 2 months ago

I am marking this as stale since it has not had any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping by writing a message here or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart:

github-actions[bot] commented 2 months ago

I closed this now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue. Thank you for your contributions :sparkling_heart: