dmwm / CRABServer


Improve PyPI images building process #8522

Closed: novicecpp closed this issue 3 weeks ago

novicecpp commented 3 months ago

Currently, the crabtaskworker image is...too big (3.88 GB uncompressed), mainly because the Dockerfile is not optimized. Also, a newer image cannot reuse the layers of an older image even when the contents are the same, because every layer gets rebuilt each time; this wastes a lot of space (both on the dev machine and in the registry) and a lot of time.

We can improve this by

belforte commented 3 months ago

Is this a good topic for the summer student? Investigate, try, evaluate...?

novicecpp commented 2 months ago

First, we can improve the size by simply fixing the [single line]() that causes file duplication.

To confirm this, I used [dive]() to inspect the image, dumping its output to JSON with this command:

dive -j crab-taskworker-v3-240618-stable.json registry.cern.ch/cmscrab/crabtaskworker:v3.240618-stable

Here is the JSON output: crab-taskworker-v3-240618-stable.json (Note that I cannot open dive in interactive mode; it freezes.)

If I interpret the output correctly, the .image.fileReference field lists the files that have more than one copy across different layers, together with the number of copies.

For example, from the JSON output:

{
  "image": {
    "sizeBytes": 3880994829,
    "inefficientBytes": 1740276490,
    "efficiencyScore": 0.7727956479413789,
    "fileReference": [
      {
        "count": 2,
        "sizeBytes": 141771252,
        "file": "/data/repos/WMCore/.git/objects/pack/pack-04239b6b0afb94c64c197d9272850b397d18256e.pack"
      },
      {
        "count": 2,
        "sizeBytes": 74617776,
        "file": "/data/miniconda/lib/libopenblasp-r0.3.21.so"
      },
      ....

The file /data/miniconda/lib/libopenblasp-r0.3.21.so is duplicated, with the copies stored in different layers.
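To get a quick list of the duplicated files, the JSON can be filtered with jq; this is just a sketch, assuming the .image.fileReference layout matches the excerpt above:

```sh
# List duplicated files from the dive JSON report, largest first
jq -r '.image.fileReference
       | sort_by(-.sizeBytes)
       | .[]
       | "\(.count)x  \(.sizeBytes) bytes  \(.file)"' \
   crab-taskworker-v3-240618-stable.json | head -n 20
```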

When I flatten the image (crane flatten), the final image shrinks from 3.8 GB to 2.92 GB (please ignore the naming here):

REPOSITORY                                        TAG                                                                    IMAGE ID       CREATED         SIZE
localhost:44332/test1                             testflatten                                                            9d223d8eba5a   3 hours ago     2.92GB
localhost:44332/test1                             test                                                                   a506bf737f7d   3 hours ago     3.8GB
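For reference, the flattening itself is a one-liner with crane; a sketch using the placeholder names from the listing above:

```sh
# Squash all layers into a single layer and push the result under a new tag
crane flatten localhost:44332/test1:test -t localhost:44332/test1:testflatten
```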

That means that, if we avoid duplicating files when building the image, we can reduce the final image size by up to 900 MB.

Now, let's take a look at the [single line]() that causes the file duplication.

RUN chown -R 1000:1000 ${WDIR}

With this line, we change the owner of all files under the /data/ path, where we install all of the scripts/binaries/libraries needed by our services. What Docker does is copy every file under this path into a new layer and then change its permissions.

Why? (I do not have a reference here, but I think this is what happens.) It is because of how container images work: each layer is immutable, so to modify a file you have to copy it into a new layer. The two major advantages of immutable layers are integrity and caching: each layer has a hash and can be shared with other images, saving disk space, transfer, and time.

To fix this, we need to make sure that the files created inside the container already have the proper ownership at the moment they are created. For example:
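A minimal sketch of what this looks like in a Dockerfile (the base image, paths, and install steps are placeholders, not our actual Dockerfile):

```Dockerfile
FROM python:3.11-slim

ARG WDIR=/data
# Create the working directory already owned by the service user,
# instead of a recursive chown at the end of the build
RUN mkdir -p ${WDIR} && chown 1000:1000 ${WDIR}

# COPY with --chown so the files land in their layer with the right owner;
# a later "chown -R" would copy every file again into a new layer
COPY --chown=1000:1000 requirements.txt ${WDIR}/

# Switch to the service user so files created by later RUN steps
# are owned by 1000:1000 from the start
USER 1000:1000
WORKDIR ${WDIR}
ENV HOME=${WDIR}
RUN pip install --user -r requirements.txt
```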

novicecpp commented 3 weeks ago

To summarize the solution from the comment above:

Janna removed the troublesome line and set the proper owner directly in the COPY and RUN statements. It is quite tedious to fix this everywhere in a long Dockerfile.

But after the fix, it surprised me that we actually gained the 900 MB back!

REPOSITORY                                TAG                      IMAGE ID       CREATED       SIZE
registry.cern.ch/cmscrab/crabtaskworker   pypi-test12-1724940928   01e0817c31a5   4 days ago    2.97GB  
registry.cern.ch/cmscrab/crabtaskworker   v3.240809-stable         8332f57c2394   3 weeks ago   3.8GB

v3.240809-stable is the image version from before Janna's PR was merged.

dive still reports some duplicated files. However, it is a small fraction, so there is no need to bother.

novicecpp commented 3 weeks ago

Janna's PR also changes the image builder from Kaniko to BuildKit, plus a simpler way of caching images across CI build jobs using an external storage backend (we use the registry backend, which registry.cern.ch already supports). So, instead of the extra maintenance of creating a base image, we just push all the cache layers to the registry. It costs extra space in the registry, but it reduces build time.
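For context, with BuildKit the registry cache backend boils down to something like this in the CI job (the buildcache tag name here is hypothetical, just to illustrate the idea):

```sh
docker buildx build \
  --cache-from type=registry,ref=registry.cern.ch/cmscrab/crabtaskworker:buildcache \
  --cache-to type=registry,ref=registry.cern.ch/cmscrab/crabtaskworker:buildcache,mode=max \
  -t registry.cern.ch/cmscrab/crabtaskworker:pypi-test \
  --push .
```

mode=max exports cache for all intermediate layers rather than only the final ones, which is what costs the extra registry space mentioned above.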

I do not have exact numbers on how much time BuildKit and registry caching save, but in general:

But the most satisfying thing for me is that when you build in CI and then pull/run on your local machine, sometimes it takes only 5 seconds to complete. It might be because of BuildKit, the cache, the Dockerfile refactoring, or all of them.

I love this so much because I only have mobile internet in my apartment, and it used to take ages to pull the image again and again.

novicecpp commented 3 weeks ago

I think we no longer need the "Caching PyPI deps to make pip install layers more idempotent" item, because caching already works.

There is still some room to squeeze out more build efficiency, starting from https://docs.docker.com/build/cache/optimize/ (for example BuildKit cache mounts for pip, sketched below).
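One idea from that page is a BuildKit cache mount, which keeps the pip download cache across builds without baking it into an image layer; a sketch, not something we have merged:

```Dockerfile
# syntax=docker/dockerfile:1
# Reuse the pip download cache across builds via a BuildKit cache mount;
# the cache directory is mounted at build time and never stored in the layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```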

But for this issue, this is enough, and we (I am pretty sure it is not just me) are happy with the result.

belforte commented 3 weeks ago

VERY HAPPY