sitaktif opened 4 months ago
~Sounds good to me.~ Actually, no, on second thought: can you explain the use case a bit more?
We already set a mnemonic ("CopyFile") on the actions. Under what circumstances would you wish to treat some file copy actions differently from others?
can you explain the use case a bit more?
Absolutely.
We use a remote build execution (RBE) service, and building things remotely is better (faster, hermetic, cached) for most of our targets. However, some targets, typically packaging ones (for example OCI/Docker images), need to be uploaded from the local host to a different server (e.g. an OCI/Docker registry), which is, I think, a fairly common use case across the community.
For these packages, a typical flow for "build an OCI image of a binary on top of a base image using RBE" normally looks like:
(rules_oci/rules_docker)
But running steps 4, 5 and 6 remotely is detrimental:
Running steps 4, 5 and 6 locally is much faster (typically 2-10x, depending on the case) and frees up more than half of our cache storage. Somewhere between steps 6 and 7, we need to rename a file, which is where copy_file becomes relevant.
Going back to this feature proposal:
Bazel natively provides a great option to mark some actions as local vs. remote: --modify_execution_info.
For example, running bazel build ... --modify_execution_info=ImageLayer=+no-remote,JoinLayers=+no-remote makes actions with these mnemonics run locally.
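To make the mechanism concrete, here is a simplified Python model of what the flag does. It assumes Bazel's documented "regex=[+-]key,regex=[+-]key,..." syntax, applies rules in order, and assumes (for this sketch) that each regex must match the whole mnemonic. It is an illustration only, not Bazel's implementation; the Action class and helper names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical stand-in for a Bazel action: just a mnemonic and its execution info keys."""
    mnemonic: str
    execution_info: set = field(default_factory=set)


def parse_modify_execution_info(flag_value):
    """Parse 'regex=[+-]key,...' into (compiled regex, add?, key) triples.

    Simplification: splits on ',' and the first '=', so regexes containing
    either character are not supported by this sketch.
    """
    rules = []
    for entry in flag_value.split(","):
        pattern, op_and_key = entry.split("=", 1)
        if not op_and_key or op_and_key[0] not in ("+", "-"):
            raise ValueError(f"expected '+' or '-' before the key in {entry!r}")
        rules.append((re.compile(pattern), op_and_key[0] == "+", op_and_key[1:]))
    return rules


def apply_rules(actions, rules):
    """Apply rules in order; a rule affects an action whose mnemonic fully matches its regex."""
    for action in actions:
        for pattern, add, key in rules:
            if pattern.fullmatch(action.mnemonic):
                if add:
                    action.execution_info.add(key)
                else:
                    action.execution_info.discard(key)
    return actions


actions = [Action("ImageLayer"), Action("JoinLayers"), Action("CppCompile")]
apply_rules(actions, parse_modify_execution_info("ImageLayer=+no-remote,JoinLayers=+no-remote"))
for a in actions:
    print(a.mnemonic, sorted(a.execution_info))
# ImageLayer ['no-remote']
# JoinLayers ['no-remote']
# CppCompile []
```

The key point is that selection happens purely on the mnemonic string, so two actions that share a mnemonic cannot be treated differently by this flag.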
This works great until... one needs to copy a file. The CopyFile mnemonic is too generic for users to pass to --modify_execution_info: copying output files happens in many situations, some of which should run locally and others remotely. Adding an optional mnemonic attribute would let users tell these situations apart, and the change would be fairly non-intrusive.
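The effect of the proposed attribute can be sketched with a small Python model. Everything here is hypothetical: copy_file_action only mimics the idea of an optional mnemonic that defaults to "CopyFile", the file paths are made up, and "CopyImageLayer" is an illustrative user-chosen mnemonic. tag_matching roughly imitates a single "pattern=+key" rule, assuming full-match semantics.

```python
import re


def copy_file_action(src, out, mnemonic="CopyFile"):
    """Hypothetical model of the action copy_file registers.

    `mnemonic` mirrors the proposed optional attribute; omitting it keeps
    today's default of "CopyFile".
    """
    return {"mnemonic": mnemonic, "src": src, "out": out, "execution_info": set()}


def tag_matching(actions, pattern, key):
    """Add `key` to every action whose mnemonic fully matches `pattern`."""
    regex = re.compile(pattern)
    for action in actions:
        if regex.fullmatch(action["mnemonic"]):
            action["execution_info"].add(key)


def local_outputs(actions):
    """Outputs of actions tagged no-remote, i.e. forced to run locally."""
    return [a["out"] for a in actions if "no-remote" in a["execution_info"]]


# Today: every copy shares one mnemonic, so a CopyFile rule hits all of them.
generic = [
    copy_file_action("gen/config.json", "gen/config_copy.json"),
    copy_file_action("image/layer.tar", "image/renamed_layer.tar"),
]
tag_matching(generic, "CopyFile", "no-remote")
print(local_outputs(generic))  # both copies are forced local

# With a custom mnemonic on the packaging copy only, the rule can target it alone.
custom = [
    copy_file_action("gen/config.json", "gen/config_copy.json"),
    copy_file_action("image/layer.tar", "image/renamed_layer.tar", mnemonic="CopyImageLayer"),
]
tag_matching(custom, "CopyImageLayer", "no-remote")
print(local_outputs(custom))  # only the image copy runs locally
```

Keeping "CopyFile" as the default means existing builds and flags behave exactly as before; only users who opt in get a distinct mnemonic.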
Besides the above, custom mnemonics allow better analysis of builds, e.g. via gRPC logs.
Note that a similar issue was opened for run_binary (#426), and I'd be happy to address that one at the same time.
I am currently attempting to debug an increase in cache misses in our remote cache. I've tracked the problem down to something causing many CopyFile actions to be re-run, but it's not entirely clear where these are being scheduled. Knowing that would help enormously with the debugging.
If users could change the mnemonic to something more descriptive, tracking down this issue would be far simpler.
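As an illustration of why descriptive mnemonics help with this kind of debugging, the sketch below tallies remote-cache misses per mnemonic. The records are hand-written toy data, not real log output; the (mnemonic, cache hit) layout and the custom mnemonic names are assumptions made for the example.

```python
from collections import Counter


def misses_by_mnemonic(records):
    """Count remote-cache misses per mnemonic from (mnemonic, cache_hit) pairs."""
    return Counter(mnemonic for mnemonic, hit in records if not hit)


# Toy records with the generic mnemonic: every miss collapses into one bucket.
records = [
    ("CopyFile", False),
    ("CopyFile", False),
    ("CopyFile", True),
    ("CppCompile", True),
]
print(misses_by_mnemonic(records))

# The same toy build with hypothetical user-chosen mnemonics on the copy actions:
# each miss now points at the kind of copy that caused it.
records_custom = [
    ("CopyAppConfig", False),
    ("CopyImageLayer", False),
    ("CopyDocs", True),
    ("CppCompile", True),
]
print(misses_by_mnemonic(records_custom))
```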
Aha, unnecessary copying of the output of always-local actions to remote is the problem that I was missing.
Yes, I agree the feature would be useful. You can ping me for review.
Please have a look at #491 at your convenience!
Custom mnemonics can have some useful benefits. For example, they allow more fine-grained filtering of which actions run locally vs. remotely (via the --modify_execution_info option or no-remote tags).
To that effect, it would be great to allow users to specify custom mnemonics for copy_file via an attribute.
If maintainers are happy with the idea, I'll happily submit a PR.