Open jwnimmer-tri opened 7 months ago
Was looking back into this, the breadcrumbs:
Recent activity:
My understanding is we want to change
- --remote_download_outputs=all
+ --remote_download_outputs=minimal
But `--remote_download_outputs` also has `toplevel`, which may possibly be a better choice.
Regardless of `minimal` or `toplevel`, I'm not actually sure I understand what the failure criteria are (the job logs linked in Slack are gone). Do we just YOLO it and if builds start breaking revert?
> Do we just YOLO it and if builds start breaking revert?
Clarification: in https://github.com/RobotLocomotion/drake-ci/pull/209 you can get rough stats, and assuming the test jobs launch for a couple of distros, that would make the `drake-ci` PR theoretically ready to merge. However, I don't have a record of what failed after we merged it (and therefore had to revert).
The default in Bazel 7.x was `toplevel`, which is what defeated us.
I think if toplevel didn't work, then "minimal" will also not work.
For now, we should probably put this on hold again until after Bazel 7.2 (which should be soonish).
From https://github.com/bazelbuild/bazel/issues/20161, we might need to use both `--experimental_remote_cache_eviction_retries` and `--experimental_remote_cache_lease_extension`, or else figure out why replaying the build action isn't working.
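For reference, a rough sketch of how those flags might sit together in a bazelrc. This assumes a Bazel version that supports both experimental flags; the retry count is an arbitrary illustrative value, and nothing here has been tested against Drake CI:

```
# Hypothetical bazelrc fragment (illustrative only).
# Skip downloading remote outputs unless they are needed locally.
build --remote_download_outputs=minimal
# Retry the build if a blob was evicted from the remote cache mid-build.
# The value 5 is a guess, not a recommendation.
build --experimental_remote_cache_eviction_retries=5
# Periodically extend the lease on remote outputs during the build.
build --experimental_remote_cache_lease_extension
```

Swapping `minimal` for `toplevel` in the first line would give the other variant discussed above.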
We tried https://github.com/RobotLocomotion/drake-ci/pull/209 but our Bazel 6.x CI wasn't ready for that yet.
Once the #21119 images are deployed, we should try re-applying that patch and see if it sticks this time.