Ekumen-OS / lambkin

Failed to build beluga vs nav2 benchmark image for release target #114

Open marcoshuck opened 1 month ago

marcoshuck commented 1 month ago

Bug description

I'm unable to build the benchmark image. The error isn't very informative, and since I'm doing basic DevOps work rather than investigating the underlying issue, I thought it was worth reporting.

Platform (please complete the following information):

How to reproduce

List steps to reproduce the issue:

  1. ./tools/setup.sh
  2. ./tools/earthly ./src/benchmarks/beluga_vs_nav2+release

Expected behavior

An image in the docker image list

Actual behavior

Error log:

./s/b/beluga_vs_nav2+build | distro=jammy rosdistro=
./s/b/beluga_vs_nav2+build | --> COPY . src/beluga_vs_nav2
./s/b/beluga_vs_nav2+build | distro=jammy rosdistro=
./s/b/beluga_vs_nav2+build | --> RUN . /etc/profile && apt update && rosdep update && rosdep install -y -i --from-paths src -t build -t buildtool -t test --skip-keys 'lambkin-shepherd lambkin-clerk' && apt clean && rm -rf /var/lib/apt/lists/*
./s/e/t/latex+embed-ubuntu-release |    This may take some time... done.
./s/b/beluga_vs_nav2+build | Logged into development environment for external/ros2 on Ubuntu: jammy
./s/e/t/latex+embed-ubuntu-release | Processing triggers for libgdk-pixbuf-2.0-0:amd64 (2.42.8+dfsg-1ubuntu0.3) ...

./s/b/beluga_vs_nav2+build | WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

./s/b/beluga_vs_nav2+build | Reading package lists...

./s/b/beluga_vs_nav2+build | E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)

./s/e/t/latex+embed-ubuntu-release | WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

./s/b/beluga_vs_nav2+build | ERROR src/benchmarks/beluga_vs_nav2/Earthfile:56:4
./s/b/beluga_vs_nav2+build |       The command
./s/b/beluga_vs_nav2+build |           RUN . /etc/profile && apt update && rosdep update && rosdep install -y -i --from-paths src -t build -t buildtool -t test --skip-keys 'lambkin-shepherd lambkin-clerk' && apt clean && rm -rf /var/lib/apt/lists/*
./s/b/beluga_vs_nav2+build |       did not complete successfully. Exit code 100

================================== ❌ FAILURE ===================================

./s/b/beluga_vs_nav2+build *failed* | Repeating the failure error...
./s/b/beluga_vs_nav2+build *failed* | distro=jammy rosdistro=
./s/b/beluga_vs_nav2+build *failed* | --> RUN . /etc/profile && apt update && rosdep update && rosdep install -y -i --from-paths src -t build -t buildtool -t test --skip-keys 'lambkin-shepherd lambkin-clerk' && apt clean && rm -rf /var/lib/apt/lists/*
./s/b/beluga_vs_nav2+build *failed* | Logged into development environment for external/ros2 on Ubuntu: jammy

./s/b/beluga_vs_nav2+build *failed* | WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

./s/b/beluga_vs_nav2+build *failed* | Reading package lists...
./s/b/beluga_vs_nav2+build *failed* | E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
./s/b/beluga_vs_nav2+build *failed* | ERROR src/benchmarks/beluga_vs_nav2/Earthfile:56:4
./s/b/beluga_vs_nav2+build *failed* |       The command
./s/b/beluga_vs_nav2+build *failed* |           RUN . /etc/profile && apt update && rosdep update && rosdep install -y -i --from-paths src -t build -t buildtool -t test --skip-keys 'lambkin-shepherd lambkin-clerk' && apt clean && rm -rf /var/lib/apt/lists/*
./s/b/beluga_vs_nav2+build *failed* |       did not complete successfully. Exit code 100
./s/e/p/timemory+ubuntu-build | WARN src/external/profiling/timemory/Earthfile:23:4: The command
./s/e/p/timemory+ubuntu-build |           RUN SPACK_SKIP_MODULES= . /opt/spack/share/spack/setup-env.sh && spack install -y --fail-fast timemory@develop+tools && spack gc -y
./s/e/p/timemory+ubuntu-build | failed: context canceled
./s/e/t/latex+embed-ubuntu-release | WARN src/external/typesetting/latex/Earthfile:22:4: The command
./s/e/t/latex+embed-ubuntu-release |           RUN apt update && apt install -y latexmk tex-gyre texlive-latex-extra texlive-latex-recommended texlive-fonts-recommended && apt clean && rm -rf /var/lib/apt/lists/*
./s/e/t/latex+embed-ubuntu-release | failed: context canceled

Help: To debug your build, you can use the --interactive (-i) flag to drop into a shell of the failing RUN step: "./tools/earthly -i ./src/benchmarks/beluga_vs_nav2+release"
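Since the log above interleaves output from several targets that build in parallel, it can help to isolate the lines of the failing one. The following is a minimal sketch, assuming every log line starts with the abbreviated target name as shown above; `filter_target` is a hypothetical helper, not part of lambkin or Earthly.

```shell
#!/bin/sh
# filter_target TARGET: read an Earthly log on stdin and print only the
# lines whose prefix is TARGET (hypothetical helper, not part of lambkin).
filter_target() {
    # index() returns 1 only when the line starts with the given string,
    # so the target name is matched literally rather than as a regex.
    awk -v target="$1" 'index($0, target) == 1'
}
```

Possible usage, redirecting stderr in case the log is written there: `./tools/earthly ./src/benchmarks/beluga_vs_nav2+release 2>&1 | filter_target './s/b/beluga_vs_nav2+build'`.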

Additional context

None

marcoshuck commented 1 month ago

The culprit seems to be:

    RUN . /etc/profile && apt update && rosdep update && \
        rosdep install -y -i --from-paths src -t build -t buildtool -t test \
            --skip-keys 'lambkin-shepherd lambkin-clerk' && \
        apt clean && rm -rf /var/lib/apt/lists/*

marcoshuck commented 1 month ago

It seems that running rosdep fix-permissions with sudo fixed the issue.
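For context, the "Permission denied" line in the log appears to come from apt being unable to create /var/lib/apt/lists/partial, which is what happens when the lists directory was emptied earlier and apt update then runs as a user without write access to it. The check below is a minimal diagnostic sketch under that assumption; `can_use_apt_lists` is a hypothetical helper name, and this is not the change made in #115.

```shell
#!/bin/sh
# can_use_apt_lists [DIR]: succeed if the current user could run
# "apt update" against DIR (default: /var/lib/apt/lists).
# Hypothetical diagnostic helper, not part of lambkin, apt, or rosdep.
can_use_apt_lists() {
    dir="${1:-/var/lib/apt/lists}"
    if [ -d "$dir/partial" ]; then
        # partial/ already exists: apt only needs to write into it.
        [ -w "$dir/partial" ]
    else
        # partial/ is missing: apt has to create it, which requires
        # DIR itself to exist and be writable by the current user.
        [ -d "$dir" ] && [ -w "$dir" ]
    fi
}
```

Possible usage before the failing step: `can_use_apt_lists || echo "apt lists directory is not writable by $(id -un)" >&2`.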

marcoshuck commented 1 month ago

Just to report that after the fix in #115, I was able to build the image.

glpuga commented 1 month ago

I can replicate the failure, but I wonder why the fix in #115 is not necessary for #93 to build just fine.

marcoshuck commented 1 month ago

I did a quick diff check and they seem to be the same except for the src folder. When was the last time you ran the build target?

glpuga commented 1 month ago

I did a quick diff check and they seem to be the same except for the src folder. When was the last time you ran the build target?

Hmm... a full build from scratch, probably about a month back.

marcoshuck commented 1 month ago

It might be worth another try, maybe? 🧐

glpuga commented 1 month ago

The difference is because I never used

    ./tools/earthly ./src/benchmarks/beluga_vs_nav2+release

The README instructions for both beluga_vs_nav2 and beluga_vs_nav2_multi_dataset are to use

    ./tools/earthly ./src/benchmarks/beluga_vs_nav2+local-devel

instead. I can't tell if the release target would be able to generate the report.

Using release in beluga_vs_nav2_multi_dataset has the same issue you found in beluga_vs_nav2.

marcoshuck commented 1 month ago

I can't tell if the release target would be able to generate the report.

This is certainly important. I think we should be able to generate reports by running docker run with a release target image; otherwise automation will be hard to implement. What do we need for this to happen?

glpuga commented 1 month ago

This is certainly important. I think we should be able to generate reports by running docker run with a release target image; otherwise automation will be hard to implement. What do we need for this to happen?

The main issue is that the full process takes about four days (happy path), assuming nothing fails along the way.

We can do a shorter run on a limited dataset, cross our fingers, and assume that any problems are not related to the image (which is kind of a lie, since at least the fix for #94 is applied in the docker image, so we'll have to make sure that fix is in all images).