REv3 idea: do stuff other than execve()

EdSchouten commented 3 years ago

What's pretty awesome about the Remote Execution protocol is that you can seamlessly send requests to multiple platforms. Run a couple of commands on Linux, combine those results into an action scheduled on Windows, etc.

One common pattern that people have is that they use run-of-the-mill x86 hardware to build embedded firmware images. Those may then be flashed onto an embedded system (e.g., a bare-metal ARM development board) for integration tests. What's a bit annoying right now is that the Remote Execution protocol has no native way to describe such workloads. The best you can achieve is to schedule actions on, say, a Linux box ("host system") that's attached to the embedded system, with your test's entry point being a wrapper script that runs on the host system and calls into tooling to do the flashing, rebooting of the hardware, capturing of serial I/O, etc. Though that approach 'works', there are some disadvantages:

Security: The wrapper script may contain arbitrary commands. It's hard for administrators of such host systems to ensure that the engineers on the project don't make a mess out of the system.
Cost/scaling: To save costs, you may want to attach the host system to a larger number of embedded boards. Now you need to make 100% sure that nobody ever makes changes to the wrapper script that cause it to interact with the wrong board, as that may introduce flakiness for everyone else.
Compatibility: It's pretty easy to formalise what needs to be flashed onto a certain kind of embedded board. Usually it's a list of n firmware images that need to be flashed into different ROMs. If you use a wrapper script, you now also need to care about how the host system is provisioned. It may be hard to provide compatibility guarantees for that over time.

One thing to consider is to make Command (and parts of Action?) 'pluggable'. The version of Command we have right now is a good fit for UNIX-like systems. For running tests on embedded boards, you may want to use a different message:

message MyAwesomeEmbeddedDeviceCommand {
  build.bazel.remote.execution.v3.Digest boot_rom = 1;
  build.bazel.remote.execution.v3.Digest wifi_chip_firmware = 2;
  build.bazel.remote.execution.v3.Digest gpu_firmware = 3;
  build.bazel.remote.execution.v3.Digest storage_controller_firmware = 4;
}

Or if it's an embedded Linux board (that is capable of eventually running regular UNIX commands), you may want to do something as creative as:

message MyAwesomeEmbeddedLinuxDeviceCommand {
  build.bazel.remote.execution.v3.Digest linux_rom = 1;
  build.bazel.remote.execution.v3.Command command_to_run_inside_of_linux_after_it_has_booted = 2;
}

Footnote: An approach like this raises the question: What happens if I declare a custom command that looks like this?

message FetchCommand {
  repeated string urls = 1;
  build.bazel.remote.execution.v3.Digest expected_digest = 2;
}

Would that make the Remote Asset API superfluous?

moroten commented 3 years ago

Making the test environment independent of the production code is also what we have been striving for. That makes it possible to change the CI infrastructure without having to update all maintenance branches. This sounds great!

It looks like Digest command_digest and Digest input_root_digest can be replaced with your suggestion. Users might also want to extract part of the information sent through the Platform platform today into their respective "pluggable commands".

I can't find any flaws in such a change regarding the RPC calls. GetActionResult, FindMissingBlobs, etc. will continue to function. Therefore, it sounds doable to integrate this already in v2. For example, let input_root_digest be unset and update command_digest to refer to the digest of an Any, not just of a build.bazel.remote.execution.v2.Command.
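A minimal sketch of that relaxed interpretation, assuming the wire format of Action stays untouched and only the documented meaning of the two fields changes. The package and the message name RelaxedAction are hypothetical; the two fields mirror the names and numbers of the existing v2 Action.

```proto
syntax = "proto3";

// Hypothetical package, used for illustration only.
package example.pluggable.v1;

import "build/bazel/remote/execution/v2/remote_execution.proto";

// Hypothetical message mirroring two fields of
// build.bazel.remote.execution.v2.Action. Only the comments differ
// from v2; the wire format would be unchanged.
message RelaxedAction {
  // Digest of a CAS blob holding a serialized google.protobuf.Any.
  // The Any may wrap a v2 Command or any custom command message that
  // the worker understands (assumption: workers dispatch on the Any's
  // type URL).
  build.bazel.remote.execution.v2.Digest command_digest = 1;

  // May be left unset when the wrapped command carries its own input
  // digests, as the firmware-image examples earlier in the thread do.
  build.bazel.remote.execution.v2.Digest input_root_digest = 2;
}
```

Because the blob is still addressed by a plain Digest, calls such as FindMissingBlobs would not need to know which message type it contains.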

moroten commented 3 years ago

This suggestion is basically the opposite of https://github.com/bazelbuild/remote-apis/issues/144. Note that the Remote Asset API fetch response contains some additional fields compared to the ActionResult. One can put that information into specific output files, but that seems a bit hacky. I think adding a pluggable Any embedded_result_digest and repeated Digest digests_referenced_by_embedded_result to ActionResult should be reasonable, although that requires an extra round trip to get the data back unless bytes embedded_result_raw is added as well. digests_referenced_by_embedded_result is needed so that the remote server can check that all referenced blobs are available in the CAS and update their time to live during GetActionResult.
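The proposed fields could be sketched roughly as follows. This is one reading of the suggestion, not an agreed design: the package, the message name EmbeddedResult, and all field numbers are hypothetical, and embedded_result_digest is assumed to be a Digest that points at a serialized google.protobuf.Any in the CAS.

```proto
syntax = "proto3";

// Hypothetical package, used for illustration only.
package example.pluggable.v1;

import "build/bazel/remote/execution/v2/remote_execution.proto";

// Hypothetical grouping of the fields proposed for ActionResult.
message EmbeddedResult {
  // Digest of a CAS blob holding a serialized google.protobuf.Any
  // with the command-specific result.
  build.bazel.remote.execution.v2.Digest embedded_result_digest = 1;

  // Optional inlined copy of the same serialized Any, which would
  // save the client an extra round trip to the CAS.
  bytes embedded_result_raw = 2;

  // Every digest referenced from inside the embedded result. The
  // server cannot parse an unknown Any, so it needs this list to
  // verify that the blobs exist and to refresh their time to live
  // during GetActionResult.
  repeated build.bazel.remote.execution.v2.Digest
      digests_referenced_by_embedded_result = 3;
}
```

The raw and digest variants follow the same pattern as the existing stdout_raw and stdout_digest fields of ActionResult.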

mhadjimichael commented 3 years ago

This sounds very similar to the Platform properties.

Do you think the platform properties could be extended in v3 to specifically allow Digest values (and possibly other non-string types, unlike today, where every value is a string)?
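One way such typed properties might look, as a sketch only. The v2 Platform.Property has a string name and a string value; the message name TypedPlatform, the oneof, and the field numbers below are hypothetical.

```proto
syntax = "proto3";

// Hypothetical package, used for illustration only.
package example.pluggable.v1;

import "build/bazel/remote/execution/v2/remote_execution.proto";

// Hypothetical variant of Platform whose property values may be
// either strings or digests.
message TypedPlatform {
  message Property {
    // Property name, as in the v2 Platform.Property.
    string name = 1;

    // Exactly one kind of value is set per property.
    oneof value {
      // Plain string value, matching today's behaviour.
      string string_value = 2;

      // Reference to a CAS blob, for example a firmware image that
      // has to be flashed before the action runs.
      build.bazel.remote.execution.v2.Digest digest_value = 3;
    }
  }

  repeated Property properties = 1;
}
```

A server would have to treat digest_value entries like any other referenced blob, which is the same bookkeeping concern raised for embedded results above.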

sluongng commented 7 months ago

I have spoken to folks in the embedded industry who would be interested in some standardization in the RE API around the "hardware provisioning" problem. The current "workaround" of using wrapper scripts comes with pain points like those @EdSchouten highlighted in the original post.

However, I would like to note that if we were to make Command effectively an Any, then in the context of client/server discovery, it would be hard to tell whether the RBE server supports the generic/custom Command that the client is going to send. Such capabilities are communicated today via the Platform message, a free-form string-to-string map, to help the RBE server pick the right worker for the job. But that only covers worker "selection", not worker "preparation". Ideally, both processes should be coupled and formalized in V3 somehow.

Another angle I want to highlight is a potential explosion of Command types. As a protocol, perhaps we want to restrict this, to guide newer implementations to use the existing Command(s) first and help improve them before implementing a new one. Perhaps we could make the Command field a oneof of the types declared in our proto. New implementations could go through the review process to add their custom Commands to the protocol (to gain review feedback and wider implementation support).
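A closed set of command types could be sketched along these lines. All names, the package, and the field numbers are hypothetical; EmbeddedDeviceCommand stands in for a reviewed and accepted variant of the firmware-flashing message from the original post.

```proto
syntax = "proto3";

// Hypothetical package, used for illustration only.
package example.pluggable.v1;

import "build/bazel/remote/execution/v2/remote_execution.proto";

// Hypothetical command type that went through protocol review.
message EmbeddedDeviceCommand {
  build.bazel.remote.execution.v2.Digest boot_rom = 1;
}

// Hypothetical wrapper stored in the CAS in place of a bare Command.
// Unlike google.protobuf.Any, the set of alternatives is fixed by the
// protocol, so clients and servers agree on what may be sent.
message CommandVariant {
  oneof kind {
    // The existing UNIX-style command.
    build.bazel.remote.execution.v2.Command unix = 1;

    // New alternatives are added here, one field number each, as
    // they are accepted into the protocol.
    EmbeddedDeviceCommand embedded_device = 2;
  }
}
```

The trade-off is that every new command type requires a protocol change, which is the review gate described above.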

REv3 idea: do stuff other than execve() #198