Open ulrfa opened 2 years ago
I've been looking at the fallback situation for a bit internally, where there are some extra legacy flags. --remote_local_fallback
has the very reasonable use case of handling certain remote failures. Getting rid of --remote_local_fallback_strategy
would be good. As for what should be the fallback strategy, I think it should just go on to the next strategy in its list of possible strategies - just as if remote execution had not been allowed for the action.
Example (with made-up error messages):
bazel build --spawn_strategy=remote,worker,local --strategy=Foo=remote,local //foo:bar
...
INFO: Foo action failed remotely, falling back to `local` strategy.
INFO: Bar action failed remotely, falling back to `worker` strategy.
...
I've been looking at the fallback situation for a bit internally, where there are some extra legacy flags.
--remote_local_fallback
has the very reasonable use case of handling certain remote failures. Getting rid of--remote_local_fallback_strategy
would be good. As for what should be the fallback strategy, I think it should just go on to the next strategy in its list of possible strategies - just as if remote execution had not been allowed for the action.Example (with made-up error messages):
bazel build --spawn_strategy=remote,worker,local --strategy=Foo=remote,local //foo:bar ... INFO: Foo action failed remotely, falling back to `local` strategy. INFO: Bar action failed remotely, falling back to `worker` strategy. ...
Thanks for working on this @larsrc-google! I think your solution is good!
My only concern is the number of messages printed in terminal. Are you planning reporting each individual action? I'm afraid that is too verbose when remote system is down and all actions fallback. Or is it just me misunderstanding the example?
No, those were just for illustration. In practice, they should go in the log only.
Agree. The description of the document is quite confusing. Actually, we don't need to specify the spawn_strategy because the default strategy is already --spawn_strategy=remote,worker,linux-sandbox
.
However, it has nothing to do with --remote_local_fallback
, and it cannot replace --remote_local_fallback_strategy
. I'm surprised to find out that default local_fallback_strategy is local
, because most of the time, the default strategy for Bazel is sandboxed
.
@liyuyun-lyy Why don't you think the list of strategies (--spawn_strategy
and what might be set by mnemonic or regexp) replace --remote_local_fallback_strategy
? I could imagine --remote_local_fallback
just meaning "If remote fails, try the next one in line" (for some definition of "remote").
@liyuyun-lyy Why don't you think the list of strategies (
--spawn_strategy
and what might be set by mnemonic or regexp) replace--remote_local_fallback_strategy
? I could imagine--remote_local_fallback
just meaning "If remote fails, try the next one in line" (for some definition of "remote").
I'll give three examples.
--spawn_strategy=remote,worker,linux-sandbox
. If remote fails, it will not fallback and end with failture.--spawn_strategy=remote,worker,linux-sandbox
and --remote_local_fallback
. If remote fails, the bazel runner with fallback to use local
strategy.--spawn_strategy=remote,worker,linux-sandbox
and --remote_local_fallback
and --remote_local_fallback_strategy=sandboxed
. If remote fails, the bazel runner with fallback to use sandboxed
strategy. So case2 might not work as you expected. To reproduce the examples, you can have a look at issue #16132.
Ah, so you're saying that the current implementation doesn't help. But what if --remote_local_fallback
was changed to mean "try the next strategy in the regular list of strategies"?
Ah, so you're saying that the current implementation doesn't help. But what if
--remote_local_fallback
was changed to mean "try the next strategy in the regular list of strategies"?
Yes, I think that will be better.
Hmmm... that would take some rework, Right now exec()
(src/main/java/com/google/devtools/build/lib/exec/SpawnStrategyResolver.java:43) just picks the first possible strategy and execs that, so we don't really know where we are in the list of possible strategies. The --remote_local_fallback
handling is entirely separate (src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnRunner.java:485).
One thing we're missing is a way to signal "let someone else try" in the failure messages. This has also come up in dynamic execution and for workers.
Since --remote_local_fallback_strategy
is not a no-op (setting it to linux-sandbox
does in fact cause Bazel to fall back to that strategy instead of local
), what do you all think about updating the docs and removing the @Deprecated
tag while you're working on figuring out the next steps for spawn strategies as a whole?
Description of the feature request:
In case of remote execution issues, I expect
--remote_local_fallback
with--spawn_strategy=remote,worker,linux-sandbox
to fallback toworker
if available, and otherwise tolinux-sandbox
.But instead I see fallback based on
--remote_local_fallback_strategy
, which is supposed to be a no-op according to documentationI see two alternatives, preferably:
--remote_local_fallback
honor--spawn_strategy
or otherwise:
--remote_local_fallback_strategy
about it being a no-op (which is incorrect) and remove the deprecation of it.What underlying problem are you trying to solve with this feature?
I would like reliable builds with linux-sandbox, when remote execution cluster is temporarily unavailable.
I could use
--remote_local_fallback_strategy
despite it claiming to be a no-op, but that results in deprecation warnings.Which operating system are you running Bazel on?
No response
What is the output of
bazel info release
?No response
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.I use a custom build with local patches on top of 5.0.0-pre.20210929.1, but it seems this code has not changed on main branch.
What's the output of
git remote get-url origin; git rev-parse master; git rev-parse HEAD
?No response
Have you found anything relevant by searching the web?
Based on description/comments from @agoulti and @philwo in https://github.com/bazelbuild/bazel/issues/7480, it seems the intention was that
--remote_local_fallback_strategy
should not be needed any more.Then @ishikhman wrote a https://github.com/bazelbuild/bazel/issues/7480#issuecomment-528352938 that it is still needed for this use case, but ticket was closed without addressing that.
Any other information, logs, or outputs that you want to share?
No response