Open buchgr opened 5 years ago
remote_local_fallback has incidentally been broken since 1532df0e5d1, where the catches with fallback responses (and the reasonable error handling and reporting) are not catching the now-RuntimeError that were previously RetryException, and caught by IOException handlers.
This is incredibly broken even without the fallback, and could be addressed, but will not improve the error handling here, which now dies in the skyframe AbstractParallelEvaluator when, for instance, a DEADLINE_EXCEEDED occurs during download.
Link to the email announcement: proposal
One of the use cases discovered for a local fallback is when cache download/upload reached a Timeout. Potential solution to this problem in #7590 with a progressive timeouts (throw exception when nothing was uploaded within X seconds and NOT when the file was not uploaded within Y seconds).
I think local fallback is valuable, if only in situations with poor network connectivity.
Instead of removing the flag altogether, we can make it so --remote_local_fallback
will also toggle --remote_upload_local_results=false
to prevent misconfigured builds from breaking (which was the original intent of this PR).
If set explicitly, we should also allow local_fallback=true
and upload_local_results=true
as it might be useful in CI to maximize reliability of the builds. With properly configured C++ toolchains, we've been sharing artifacts among Ubuntu 14 and 18 with no issues at all.
I think local fallback is valuable, if only in situations with poor network connectivity.
Instead of removing the flag altogether, we can make it so
--remote_local_fallback
will also toggle--remote_upload_local_results=false
to prevent misconfigured builds from breaking (which was the original intent of this PR).
I like the idea! In general, discussion was paused for a while, because we realized that a local_fallback
is actually a very useful feature in some cases.
One of the main concerns is a risk of a cache poisoning, but I don't want to completely disable caching for locally executed actions. Especially considering that there are only a few use cases of the feature known to us and for them caching local actions is a safe thing to do.
If set explicitly, we should also allow
local_fallback=true
andupload_local_results=true
as it might be useful in CI to maximize reliability of the builds. With properly configured C++ toolchains, we've been sharing artifacts among Ubuntu 14 and 18 with no issues at all.
The only concern here is how could we identify whether --remote_upload_local_results=true
was set explicitly or is it a default value (it is true
by default)? A solution that comes to my mind is another flag, which I'd like to avoid - we have too many flags already.
The only concern here is how could we identify whether --remote_upload_local_results=true was set explicitly or is it a default value
Why would you need to do that?
Why would you need to do that?
In order to know whether the flag was set explicitly or not. You've suggested to toggle it to false
unless it was explicitly set to true
. So how would we know whether it should be flipped to false
or not?
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2.5 years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage
) if you think this issue is still relevant or you are interested in getting the issue resolved.
I think this is still valid @coeuvre?
That's interesting. I believe local and remote should be identical otherwise dynamic execution wouldn't work.
cc @tjgq for the work on multiple platform dynamic execution.
The
--remote_local_fallback
flag falls back to local execution if there was some error (i.e. network) running the action remotely. This is inherently unsafe as the local and remote execution environment might not be identical.