autowarefoundation / autoware

Autoware - the world's leading open-source software project for autonomous driving
https://www.autoware.org/
Apache License 2.0

feat(ci): enable Docker build cache to 5 times faster container build #4730

Closed youtalk closed 1 month ago

youtalk commented 1 month ago

Description

This PR introduces the Docker build cache. https://docs.docker.com/build/cache/

I use the max cache mode so that all layers are cached, including those of intermediate steps. https://docs.docker.com/build/cache/backends/#cache-mode
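
As a rough illustration of what max mode looks like on the command line, the sketch below assembles the `--cache-from` and `--cache-to` options for a registry cache backend in the plain `docker buildx build` form. The cache reference `ghcr.io/example/autoware-buildcache:main` and the helper name `cache_flags` are placeholders for illustration, not the values used in this PR.

```shell
# Hypothetical cache location; replace with a registry reference you can push to.
CACHE_REF="ghcr.io/example/autoware-buildcache:main"

# Print the buildx cache options for a registry cache, one per line.
# mode=max exports the layers of all intermediate stages, not only the final image.
cache_flags() {
  printf '%s\n' \
    "--cache-from=type=registry,ref=$1" \
    "--cache-to=type=registry,ref=$1,mode=max"
}

# Usage (requires Docker Buildx and push access to the registry):
#   docker buildx build $(cache_flags "$CACHE_REF") -t example/image:latest .
```

With the default `mode=min`, only the layers of the resulting image are exported, so intermediate stages would be rebuilt on a cache hit for the final stage only.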

Tests performed

Effects on system behavior

Not applicable.

Interface changes

mitsudome-r commented 1 month ago

I have a question regarding how Docker build cache works.

In Autoware's Dockerfile, there are commands like vcs import < autoware.repos. For this command, the file autoware.repos won't change, but the main branches of the repositories that vcs imports might have changed since the previous docker build.

Would the Docker build cache still be able to detect such a change?
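
For context on why this matters: Docker decides whether to reuse a `RUN` layer from the instruction text and the build arguments in scope, not from what a remote branch currently points to. One generic workaround (a sketch only, not necessarily what this repository does) is to pass a build argument whose value changes whenever the upstream branches move. The helper name `cache_key`, the argument name `REPOS_CACHE_KEY`, and the repository URL in the usage comment are hypothetical.

```shell
# Reduce arbitrary text on stdin (for example `git ls-remote` output)
# to a single SHA-256 hex digest that can be passed as a build argument.
cache_key() {
  sha256sum | cut -d' ' -f1
}

# Usage (needs network access; URL and argument name are placeholders):
#   key="$(git ls-remote https://github.com/example/repo.git refs/heads/main | cache_key)"
#   docker buildx build --build-arg REPOS_CACHE_KEY="$key" .
# In the Dockerfile, declare `ARG REPOS_CACHE_KEY` before the `RUN vcs import ...`
# line, so that a changed key causes a cache miss from that instruction onward.
```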

youtalk commented 1 month ago

That's a good point. I'm investigating it, so please give me a little time.

youtalk commented 1 month ago

@mitsudome-r I've fixed the vcs import problem in https://github.com/autowarefoundation/autoware/pull/4738. Please review it again.

oguzkaganozt commented 1 month ago

After merging this PR, we should test this branch with docker-build-and-push to see the actual caching. See

youtalk commented 1 month ago

I've merged the latest main branch https://github.com/autowarefoundation/autoware/pull/4730/commits/9d899e8a3377044eda62459ecc92e0acc60eaf9f and am testing the docker-build-and-push-main: https://github.com/autowarefoundation/autoware/actions/runs/9202286332

youtalk commented 1 month ago

Unfortunately, the cuda build failed again because the disk filled up. We need to fix this problem at its root.

youtalk commented 1 month ago

To resolve the problem, I'm testing whether removing --download-artifacts reduces the image size. https://github.com/autowarefoundation/autoware/tree/upstream-disable-download-artifacts https://github.com/autowarefoundation/autoware/actions/runs/9208339027

xmfcx commented 1 month ago

How are we filling up the disk space? We have 54GB of space to work with 😯

Could you add:

      - name: Show disk space
        run: |
          df -h

to some key places in the https://github.com/autowarefoundation/autoware/blob/main/.github/actions/docker-build-and-push/action.yaml file, so we know when it is filling up?
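
A slightly more targeted variant of this check, as a sketch: the function below reads the usage percentage of a mount point from `df -P`, so a step can print a warning once usage crosses a limit. The function name `disk_used_percent` and the 90% limit are assumptions chosen for illustration.

```shell
# Print the used-space percentage (digits only) of the filesystem
# that contains the given path (default: /).
disk_used_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the root filesystem is more than 90% full (illustrative limit).
used="$(disk_used_percent /)"
if [ "$used" -gt 90 ]; then
  echo "WARNING: root filesystem is ${used}% full"
fi
```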

youtalk commented 1 month ago

I think exporting to the Docker image format temporarily consumes a lot of disk space. https://github.com/autowarefoundation/autoware/actions/runs/9202286332/job/25311862778#step:7:16211

#62 [prebuilt] exporting cache to registry
#62 writing layer sha256:ddb6f24d072dea4bb47d142b42a63cf859b9f4b3465cdc8d7a969a16a0d41ce0 0.2s done
#62 writing layer sha256:e4421d97df0323dadf48afbce56f12c3ba90a5bf89b3c2fe15d7d020a3b641ff
#62 preparing build cache for export 136.2s done
#62 writing layer sha256:e4421d97df0323dadf48afbce56f12c3ba90a5bf89b3c2fe15d7d020a3b641ff 0.0s done
#62 writing layer sha256:e7aa205a7d8d546c887dbbee409f2270ce7a415f2b4562bc567a457af4c05156 0.1s done
#62 CANCELED

#61 [devel] exporting to docker image format
#61 sending tarball 60.1s done
#61 ERROR: failed to copy to tar: rpc error: code = Unknown desc = write /tmp/devel.tar: no space left on device

#58 [runtime] exporting to docker image format
#58 exporting layers 294.6s done
#58 CANCELED
------
 > [devel] exporting to docker image format:
------
ERROR: failed to solve: failed to copy to tar: rpc error: code = Unknown desc = write /tmp/devel.tar: no space left on device
Error: buildx bake failed with: ERROR: failed to solve: failed to copy to tar: rpc error: code = Unknown desc = write /tmp/devel.tar: no space left on device

Once this error occurs, the subsequent steps are interrupted, so the disk usage cannot be checked afterwards.
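
One way around this, sketched here under assumptions, is to wrap the failing command so that disk usage is printed whether it succeeds or not, while its exit status is still passed on. The wrapper name `run_with_disk_report` is hypothetical. In GitHub Actions, a separate step with `if: always()` is another option for running a check after an earlier step has failed.

```shell
# Run a command, then always print disk usage, and finally return
# the command's own exit status so the job still fails when it should.
run_with_disk_report() {
  status=0
  "$@" || status=$?
  echo "--- disk usage after '$*' (exit status ${status}) ---"
  df -h
  return "$status"
}

# Usage (the bake invocation is only an example):
#   run_with_disk_report docker buildx bake
```

Capturing the status with `|| status=$?` keeps the wrapper working in shells started with `-e`, where a bare failing command would otherwise end the script before `df` runs.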

youtalk commented 1 month ago

Finally, we did it! https://github.com/autowarefoundation/autoware/actions/runs/9225112277 Let's merge and build faster than ever!