ThePorgs / Exegol-images

Docker images of the Exegol project
https://exegol.readthedocs.io/
GNU General Public License v3.0
73 stars 56 forks source link

Reduce size of images #339

Open lap1nou opened 2 months ago

lap1nou commented 2 months ago

Description

Greetings,

This PR aims to reduce the size of the image (by around 775 MB) by removing files that aren't used and that take place in the layers of the final image.

Image size before: ~25,34 GB, after: ~24,374 GB, I used dive (https://github.com/wagoodman/dive) to know which file was wasting space.

I also have other optimisations that could reduce the size of the image, I can create another PR if this one is merged.

Regards.

Related issues

N/A

Point of attention

I only ran dive on the web image, I will probably use it on the full image if you find this PR interesting in order to find more optimizations.

QU35T-code commented 2 months ago

Very good job @lap1nou !

gbe commented 2 months ago

Nice catch ! Question: instead of deleting individual files, wouldn't it be better to simply wipe the content of the folder /tmp in the post install phase ? That way we'd know for sure there isn't any files left. I can't check at the moment.

ShutdownRepo commented 2 months ago

Nice catch ! Question: instead of deleting individual files, wouldn't it be better to simply wipe the content of the folder /tmp in the post install phase ? That way we'd know for sure there isn't any files left. I can't check at the moment.

already the case

image

@lap1nou any idea why the files you removed on your PR were still here?

lap1nou commented 2 months ago

Greetings,

@ShutdownRepo yes I think it's a Docker specificity, while you are indeed removing them, this happens in another layer (because it's another RUN command in the Dockerfile, each command create another layer), but the file are still present in one of the layer so this take space in the final Docker image.

References:

I'm not a Docker expert so I might be wrong, altough the size of the image do indeed shrink with these changes.

Regards.

Dramelac commented 2 months ago

any idea why the files you removed on your PR were still here?

Indeed, Docker store every file of the layer at the end of each RUN commands, so removing them in the last layer is not enough.

Thanks @lap1nou for the PR, don't hesitates to submit more if you have time to dive into it ;)

gbe commented 2 months ago

Instead of merging this PR , I think a refactoring of the function post_install should be considered as this function already takes care of wiping /tmp.

It means that the function post_install should :

It could also be a new dedicated function like clean_docker_layer.

lap1nou commented 2 months ago

I agree @gbe,

However I have to say that removing pip folder after they are written is less efficient than telling pip to not write them in the first place, but a clean function would indeed be more clear / readable / maintenable.

lap1nou commented 2 months ago

Hey,

Here is some more space we could save:

I need your feedback to know which of these optimisations I can apply to this PR, I already tested them a bit offline and it seems to work fine.

My full image seems to be at 41 GB now but this may not be precise.

gbe commented 2 months ago

I had already suggested in a previous PR to remove the .git folders but @ShutdownRepo wants to keep them so users can pull the latest versions if needed.

ShutdownRepo commented 2 months ago

Instead of merging this PR , I think a refactoring of the function post_install should be considered as this function already takes care of wiping /tmp.

It means that the function post_install should :

* not be called at the latest layer, but at the end of each layer to avoid the oversize

* include the work done in [Reduce image size - remove pip cache dir #341](https://github.com/ThePorgs/Exegol-images/pull/341)

It could also be a new dedicated function like clean_docker_layer.

agreed let's do that

Hey,

Here is some more space we could save:

* [ ]  Use `GOFLAGS='-ldflags=-s'` before any `go build / install`, this should save around 4 MB per binary (it would save more combined with the `-w` LDFLAGS but I can't use it as the `go` command is handled by `catch_and_retry()`), this should represent something like 150 MB in total

* [ ]  Remove `bundler` cache with `--no-cache` (or remove the cache folder from a dedicated function as suggested)

* [ ]  Run `npm prune --production`, this should remove all devDependencies (I think this should save around 200 MB)

I need your feedback to know which of these optimisations I can apply to this PR, I already tested them a bit offline and it seems to work fine.

My full image seems to be at 41 GB now but this may not be precise.

Let's do 2 (bundler) and 3 (npm prune)

I had already suggested in a previous PR to remove the .git folders but @ShutdownRepo wants to keep them so users can pull the latest versions if needed.

Yes, imo we need to keep .git. But I'd be curious to have a benchmark of how much space we could gain. Because if we were to save a lot of space, maybe I would be wiser to reconsider.

Anyways, thank you @gbe and @lap1nou for sharing your insight and investing your time in this thread!! Reducing the size of the images is important for the project and its users.

lap1nou commented 2 months ago

Good I'm going to apply all of this, the bundler thing could be applied in the clean layer function but not the npm one.

To answer your question @ShutdownRepo, .git folder takes 1.52 GB on the full image, here is the script used (remove the last result as we don't count the mounted resources .git folder):

#!/bin/bash

# Find all .git folders and calculate their sizes
total_size=0
while IFS= read -r -d '' git_folder; do
    git_size=$(du -b -s "$git_folder" | awk '{print $1}')
    total_size=$(echo $total_size + $git_size | bc)
done < <(find / -type d -name ".git" -print0 2>/dev/null)

# Display the total size in human-readable format
echo "Total size of all .git folders: $total_size"
gbe commented 2 months ago

Maybe the .git folders could be left exclusively in the nightly image for users willing to have cutting edge versions?