buildbarn / bb-remote-execution

Tools for Buildbarn to allow remote execution of build actions
Apache License 2.0

Bazel build fails if remote worker is being downscaled during build #118

Closed by JSGette 11 months ago

JSGette commented 1 year ago

We've deployed Buildbarn in a Kubernetes cluster and want to scale remote workers up and down automatically. If I downscale the deployment that manages the remote workers with a simple command while a build is running, Bazel reports an error and doesn't retry.

Example of downscale command:

# Used to be 6
kubectl scale deploy <worker_deployment> --replicas=5

Error reported by bazel:

INFO: Remote execution message for CppLink <app>: Action details (uncached result): <url_to_browser>
ERROR:<REDACTED>/BUILD:46:14: Linking application/<REDACTED>/app failed: (Exit -1): aarch64-unknown-nto-qnx7.1.0-gcc failed: error executing command (from target <REDACTED>:app) <REDACTED>/aarch64-unknown-nto-qnx7.1.0-gcc @bazel-out/aarch64-opt/bin/<REDACTED>/app-2.params
INFO: Elapsed time: 397.438s, Critical Path: 215.26s
INFO: 25814 processes: 8053 remote cache hit, 17226 internal, 158 linux-sandbox, 377 remote.
FAILED: Build did NOT complete successfully

--remote_retries is set to 5
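
For reference, a retry setting like this one is commonly kept in a `.bazelrc` file. The fragment below is only a sketch: it assumes the flag is applied to `build` commands, and the actual configuration used in this report is not shown.

```
# .bazelrc (illustrative sketch; not the reporter's actual file)
# Retry transient remote execution/cache errors up to 5 times.
build --remote_retries=5
```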

bazel version: 6.3.2

buildbarn versions:
scheduler: 20230808T060019Z-22e8ab3
worker: 20230308T094934Z-44790d8

EdSchouten commented 1 year ago

Hey! Make sure you have the following things set up: