bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

remote: remove local fallback for remote execution #7202

Open buchgr opened 5 years ago

buchgr commented 5 years ago

The --remote_local_fallback flag falls back to local execution if an error (e.g. a network failure) occurs while running the action remotely. This is inherently unsafe, as the local and remote execution environments might not be identical.

werkt commented 5 years ago

remote_local_fallback has incidentally been broken since 1532df0e5d1: the catch blocks that produce fallback responses (along with the reasonable error handling and reporting) no longer catch what is now a RuntimeError. These errors were previously RetryException and were caught by the IOException handlers.

This is incredibly broken even without the fallback. It could be addressed, but doing so would not improve the error handling here, which now dies in the Skyframe AbstractParallelEvaluator when, for instance, a DEADLINE_EXCEEDED occurs during download.
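To illustrate the failure mode described above, here is a minimal Python analogue, not Bazel's Java code. `RetryException`, `run_with_fallback`, `remote_old`, and `remote_new` are hypothetical stand-ins: the fallback handler only catches I/O-derived errors, so once the remote path starts raising a runtime error instead, the fallback is silently bypassed.

```python
class RetryException(OSError):
    """Hypothetical stand-in for an I/O-derived retry failure."""


def run_with_fallback(remote, local):
    """Run remote(); fall back to local() only on I/O-derived errors."""
    try:
        return remote()
    except OSError:
        # Only OSError and its subclasses reach this handler.
        return local()


def remote_old():
    # Old behavior: the failure is an OSError subclass, so the fallback fires.
    raise RetryException("DEADLINE_EXCEEDED")


def remote_new():
    # New behavior: the failure is a RuntimeError, which the handler misses.
    raise RuntimeError("DEADLINE_EXCEEDED")
```

With `remote_old` the local callable runs; with `remote_new` the RuntimeError propagates to the caller, which mirrors the error escaping to the evaluator.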

ishikhman commented 5 years ago

Link to the email announcement: proposal

One of the use cases discovered for a local fallback is when a cache download/upload hits a timeout. A potential solution to this problem is in #7590, which uses progressive timeouts: throw an exception when nothing was uploaded within X seconds, and NOT when the file was not fully uploaded within Y seconds.
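The progressive-timeout idea can be sketched as follows. This is an illustration only and is not taken from #7590: `transfer_with_idle_timeout` and its parameters are hypothetical, and chunk arrival times are passed in as plain numbers so the logic can be checked without a real network.

```python
def transfer_with_idle_timeout(chunk_arrival_times, idle_timeout_s, start_time=0.0):
    """Fail only when no chunk arrives for longer than idle_timeout_s.

    chunk_arrival_times: ascending timestamps (seconds) at which data arrived.
    Returns the total elapsed time of the transfer.
    """
    last_progress = start_time
    for arrival in chunk_arrival_times:
        gap = arrival - last_progress
        if gap > idle_timeout_s:
            # The connection stalled: nothing arrived within the idle window.
            raise TimeoutError(f"no progress for {gap:.1f}s")
        last_progress = arrival
    # Total duration is not limited, only the gap between chunks.
    return last_progress - start_time
```

Under this scheme a large file that keeps making progress is never aborted, however long the whole transfer takes, while a stalled connection fails quickly.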

nicolov commented 5 years ago

I think local fallback is valuable, if only in situations with poor network connectivity.

Instead of removing the flag altogether, we can make it so --remote_local_fallback will also toggle --remote_upload_local_results=false to prevent misconfigured builds from breaking (which was the original intent of this PR).

If set explicitly, we should also allow local_fallback=true and upload_local_results=true as it might be useful in CI to maximize reliability of the builds. With properly configured C++ toolchains, we've been sharing artifacts among Ubuntu 14 and 18 with no issues at all.

ishikhman commented 5 years ago

> I think local fallback is valuable, if only in situations with poor network connectivity.

> Instead of removing the flag altogether, we can make it so --remote_local_fallback will also toggle --remote_upload_local_results=false to prevent misconfigured builds from breaking (which was the original intent of this PR).

I like the idea! In general, the discussion was paused for a while because we realized that local fallback is actually a very useful feature in some cases.

One of the main concerns is the risk of cache poisoning, but I don't want to completely disable caching for locally executed actions, especially considering that only a few use cases of the feature are known to us, and for them caching local actions is safe.

> If set explicitly, we should also allow local_fallback=true and upload_local_results=true as it might be useful in CI to maximize reliability of the builds. With properly configured C++ toolchains, we've been sharing artifacts among Ubuntu 14 and 18 with no issues at all.

The only concern here is how we could tell whether --remote_upload_local_results=true was set explicitly or is just the default value (it is true by default). A solution that comes to mind is another flag, which I'd like to avoid - we have too many flags already.

nicolov commented 5 years ago

> The only concern here is how could we identify whether --remote_upload_local_results=true was set explicitly or is it a default value

Why would you need to do that?

ishikhman commented 5 years ago

> Why would you need to do that?

In order to know whether the flag was set explicitly or not. You suggested toggling it to false unless it was explicitly set to true. So how would we know whether it should be flipped to false or not?
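One possible way to tell "explicitly set" from "default" without adding another flag is a tri-state option whose unset value is distinct from both true and false. The sketch below uses Python's argparse purely as an illustration and says nothing about how Bazel's own options parser works. `parse_bool` and `resolve` are hypothetical helpers, and the false default for --remote_local_fallback is an assumption.

```python
import argparse


def parse_bool(text):
    """Parse a boolean flag value such as 'true' or 'false'."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"not a boolean: {text!r}")


def resolve(argv):
    """Return (local_fallback, upload_local_results) after applying defaults."""
    parser = argparse.ArgumentParser()
    # Assumption: local fallback is off unless requested.
    parser.add_argument("--remote_local_fallback", type=parse_bool, default=False)
    # None means "not set on the command line"; the real default is decided below.
    parser.add_argument("--remote_upload_local_results", type=parse_bool, default=None)
    opts = parser.parse_args(argv)

    if opts.remote_upload_local_results is None:
        # Unset: upload by default, but not when local fallback is enabled.
        upload = not opts.remote_local_fallback
    else:
        # Explicit value always wins, including fallback=true with upload=true.
        upload = opts.remote_upload_local_results
    return opts.remote_local_fallback, upload
```

With this arrangement, `resolve(["--remote_local_fallback=true"])` disables uploads, while also passing `--remote_upload_local_results=true` keeps them enabled, which is the CI case mentioned earlier in the thread.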

github-actions[bot] commented 1 year ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2.5 years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

brentleyjones commented 1 year ago

I think this is still valid @coeuvre?

coeuvre commented 1 year ago

That's interesting. I believe the local and remote environments should be identical; otherwise dynamic execution wouldn't work.

cc @tjgq for the work on multiple platform dynamic execution.