RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

[ci] Try --remote_download_minimal again #21121

Open jwnimmer-tri opened 7 months ago

jwnimmer-tri commented 7 months ago

We tried https://github.com/RobotLocomotion/drake-ci/pull/209 but our Bazel 6.x CI wasn't ready for that yet.

Once #21119 images are deployed, we should try re-applying that patch and see if it sticks this time.

svenevs commented 7 months ago

Was looking back into this, the breadcrumbs:

Recent activity:

My understanding is we want to change

- --remote_download_outputs=all
+ --remote_download_outputs=minimal

But --remote_download_outputs also has toplevel, which may be a better choice.

Regardless of minimal or toplevel, I'm not actually sure I understand what the failure criteria are (the job logs linked in Slack are gone). Do we just YOLO it and revert if builds start breaking?

svenevs commented 7 months ago

Do we just YOLO it and if builds start breaking revert?

Clarification: in https://github.com/RobotLocomotion/drake-ci/pull/209 you can get rough stats, and assuming the test jobs launch for a couple of distros, that would make the drake-ci PR theoretically ready to merge. However, I don't have a record of what failed after we merged it (and therefore had to revert).

jwnimmer-tri commented 7 months ago

The default in Bazel 7.x was toplevel, which is what defeated us. I think if toplevel didn't work, then "minimal" will also not work.

For now, we should probably put this on hold again until after Bazel 7.2 (which should be soonish).

From https://github.com/bazelbuild/bazel/issues/20161, we might need to use both --experimental_remote_cache_eviction_retries and --experimental_remote_cache_lease_extension, or else figure out why replaying the build action isn't working.
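
For reference, a rough sketch of how the flags mentioned in this thread might be combined in a .bazelrc. This is not the contents of the drake-ci patch. The retry count is an arbitrary placeholder, and whether this combination actually avoids the cache-eviction failures is untested here.

```
# Sketch only, not the actual drake-ci configuration.
# Download only the outputs that local actions need.
build --remote_download_outputs=minimal

# Retry the build if a remote cache entry was evicted mid-build.
# The value 5 is an arbitrary placeholder for illustration.
build --experimental_remote_cache_eviction_retries=5

# Ask Bazel to keep extending leases on remote outputs it still needs.
build --experimental_remote_cache_lease_extension
```

Both experimental flags assume a Bazel 7.x release, and the lease extension also depends on the remote cache backend honoring it.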