autowarefoundation / autoware

Autoware - the world's leading open-source software project for autonomous driving
https://www.autoware.org/
Apache License 2.0
8.58k stars 2.88k forks source link

refactor(docker): introduce `src-imported` stage #4712

Closed youtalk closed 1 month ago

youtalk commented 1 month ago

Description

This is the resubmission of #4709 to push the upstream src-imported branch.

This PR adds the src-imported stage to the top stage of the Dockerfile. This allows the base and src-imported stages to be processed in parallel, reducing the download latency of the vcs import command. It also eliminates the need to run the vcs import command again in the runtime stage. The trial in my local environment speeded up the docker build by about 5 minutes.

This PR does not cut out rosdep install, based on the reflection of the previous PR: https://github.com/autowarefoundation/autoware/pull/4656#issuecomment-2082095239

Tests performed

Not applicable.

Effects on system behavior

Not applicable.

Interface changes

Pre-review checklist for the PR author

The PR author must check the checkboxes below when creating the PR.

In-review checklist for the PR reviewers

The PR reviewers must check the checkboxes below before approval.

Post-review checklist for the PR author

The PR author must check the checkboxes below before merging.

After all checkboxes are checked, anyone who has write access can merge the PR.

xmfcx commented 1 month ago

We are waiting for:

youtalk commented 1 month ago

no-cuda was succeeded but cuda was failed...

Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.316.1/_diag/Worker_20240513-205133-utc.log'

mitsudome-r commented 1 month ago

@xmfcx Do you remember if put the fix for "No space left on device" for the cuda workflow?

xmfcx commented 1 month ago

This is something new. In this job: https://github.com/autowarefoundation/autoware/actions/runs/9062897702/job/24897719154#step:3:1835

AFTER CLEAN-UP:

$ dh -h /

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        73G   21G   53G  28% /

There is 53GB space, which should be plenty.

But the job also took 6h 4m 59s. which is not normal.

xmfcx commented 1 month ago

https://github.com/autowarefoundation/autoware/actions/runs/8921431056 2024-05-14_14-31

The last time this ran, it took about 2.5 hours.

youtalk commented 1 month ago

First of all I'm going to speed up the CI build with cache and then come back here. We should solve the speed problem first.

oguzkaganozt commented 1 month ago

@youtalk If you COPY the source code from a different layer then you will not be able to remove it in the next RUN command which is an another layer. So that means the src files would still take space on the docker cache even you are deleting them with rm . So in here there is a trade-off between reducing the build time by 5 minutes vs adding whole source files into runtime image. I think we should be in favor of the reduced-size of the runtime image.

youtalk commented 1 month ago

@oguzkaganozt Oh, it's good point of view.

https://github.com/autowarefoundation/autoware/pull/4656 might be better approach. The PR calls both of vcs import and rosdep install only once. There is no side effect you mentioned.

However we need to the difference of the packages between --dependency-types build and --dependency-types exec. https://github.com/autowarefoundation/autoware/pull/4656#issuecomment-2082095239

rosdep install --dependency-types build
rosdep install --dependency-types exec

I'm investigating. Please wait a while.

youtalk commented 1 month ago

By generating the install package lists and copy the list only on the runtime step, this PR has no longer side effect of the runtime image size. https://github.com/autowarefoundation/autoware/pull/4712/commits/2d70277670ccd08630854819d6f34f1aff3bb98e

@oguzkaganozt Please review again. @xmfcx This PR have been succeeded to run both of the docke-build-and-push-main (no-cuda) and docke-build-and-push-main (cuda). Please review again. https://github.com/youtalk/autoware/actions/runs/9143787390/job/25140910449?pr=10 https://github.com/youtalk/autoware/actions/runs/9143787390/job/25140910551?pr=10

oguzkaganozt commented 1 month ago

Thanks @youtalk now it's looking really good

doganulus commented 1 month ago

Why not adding --mount=type=cache,target=/autoware/src for corresponding RUN statements?

This wouldn't be simpler?

youtalk commented 1 month ago

@doganulus It's not cache. We need to manage the source code change to rerun docker build. That's why I made the subsequent PR https://github.com/autowarefoundation/autoware/pull/4738.

doganulus commented 1 month ago

@youtalk If you run vcs import on the cache, it will update it. Yes, it is a cache.

It is better to use the container's own mechanisms than rolling out our custom caches whenever possible. I think it is possible. Something like that:

ARG AUTOWARE_VERSION=20240521
ENV AUTOWARE_VERSION=${AUTOWARE_VERSION}
...
RUN --mount=type=cache,target=/src/autoware,id=/src/autoware/${AUTOWARE_VERSION} \
    vcs import ...