facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0

Remote execution platform properties on a per-target basis? #512

Open thoughtpolice opened 10 months ago

thoughtpolice commented 10 months ago

The addition of the remote_test_execution toolchain was pretty interesting, I thought (cf. #476 for reference). In short, it allows users of rules like rust_test to add a set of remote_execution attributes to a target, and those (open-ended) attributes get folded into the CommandExecutorConfig.remote_execution_properties field, which in turn is part of ExternalRunnerTestInfo. The intent, I guess, is that some rules may need to run on specific remote runners.

For OSS use, in the same way, the remote_execution_properties field would be resolved by BuildBarn (or whatnot) to route execution to a correct container. For example, I might configure my BuildBarn runner instance to say:

      platform: {
        properties: [
          { name: 'OSFamily', value: 'Linux' },
          { name: 'container-image', value: 'nix-bb-runner' },
        ],
      },

And the OSFamily and container-image values need to line up with the CommandExecutorConfig.remote_execution_properties. This is just some generic Linux container that might have some known resource limits.
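To make the "line up" requirement concrete, here is a minimal sketch in plain Python of how a scheduler might decide whether an action can be routed to a runner. It assumes exact-match semantics, where the requested properties must equal the runner's advertised set. Real schedulers define their own matching rules, and `to_dict` and `properties_match` are hypothetical helpers, not part of Buck2 or BuildBarn.

```python
# Hypothetical sketch: exact-match routing between a runner's advertised
# platform and the properties an action requests. Not a Buck2 or BuildBarn API.

# The runner platform from the config above, as a list of name/value entries.
WORKER_PLATFORM = [
    {"name": "OSFamily", "value": "Linux"},
    {"name": "container-image", "value": "nix-bb-runner"},
]


def to_dict(platform_properties):
    """Flatten a list of {name, value} entries into a plain dict."""
    return {p["name"]: p["value"] for p in platform_properties}


def properties_match(worker_platform, requested_properties):
    """Return True only if the requested properties equal the runner's set.

    Assumption: exact matching. A missing or extra property is a mismatch.
    """
    return to_dict(worker_platform) == dict(requested_properties)


# What remote_execution_properties would need to contain for this runner.
requested = {"OSFamily": "Linux", "container-image": "nix-bb-runner"}
print(properties_match(WORKER_PLATFORM, requested))
```

Under this assumption, an action that only sets OSFamily would not be routed to this runner, which is why both values have to be present on the Buck2 side.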

However, this only works because ExternalRunnerTestInfo takes its own default_executor field.

But this is not how execution platforms are otherwise wired up, from what I can tell, and only testing is special. Test rules allow you to specify their RE properties in the provider. But most execution platforms are instead provided as targets, and the default target is set through something like .buckconfig via target_platform_detector_spec.

For example, in my own Prelude I have a target prelude//platform:default, which is just an alias of some other target that can be local or remote, such as prelude//platform:x86_64-linux-local or prelude//platform:x86_64-linux-re:

$ buck2 targets prelude//platform:
Build ID: 2479d8f2-446b-4f42-9e38-86392898beb0
Network: Up: 0B  Down: 0B
Jobs completed: 2. Time elapsed: 0.0s.
prelude//platform:aarch64-linux-re
prelude//platform:aarch64-macos-re
prelude//platform:aarch64-windows-re
prelude//platform:default
prelude//platform:x86_64-linux-local
prelude//platform:x86_64-linux-re
prelude//platform:x86_64-macos-re
prelude//platform:x86_64-windows-re

Motivation: resource/usage limits for specific targets

The reason I'm interested in this is that providing those properties as part of the rule seems useful in some cases. One case I was thinking about is where a rule needs a specific resource that has to be part of the remote_execution_properties. For example, imagine the following rules, inspired by the way remote_test_execution works:

generate_systemverilog(
  name = 'thing-rtl',
  sources = ...,
)

run_vlsi_tools(
  name = 'thing-floorplanned',
  design = ':thing-rtl',
  re_properties = "vlsi",
)

Then you might wire up re_properties to look up those values as a provider, like so, under toolchains//:

remote_execution_properties(
  name = "remote_execution_properties",
  profiles = {
    "generic": {
      ...
    },
    "vlsi": {
      "ramsize": "ultra",
      "cores": "many",
    },
  }
)

The idea here is that the "vlsi" profile sets the extra ramsize and cores properties on the RE messages, which would route that execution to the proper container (one that has 1 bajillion cores and gigabytes allocated).
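The lookup described above could be sketched roughly as follows, in plain Python that stays close to Starlark. Everything here is an assumption for illustration: `resolve_re_properties` is a hypothetical helper, the base properties are borrowed from the runner example earlier in this issue, and the merge order (profile values override platform defaults) is just one possible choice. Nothing like this exists in Buck2 today.

```python
# Hypothetical sketch of resolving an re_properties profile name into the
# final property set for an action. Not an existing Buck2 API.

# Defaults that would come from the execution platform (assumed values).
BASE_PROPERTIES = {
    "OSFamily": "Linux",
    "container-image": "nix-bb-runner",
}

# Profiles as declared on the remote_execution_properties target.
# Only "vlsi" is shown; the "generic" profile is elided in the original.
PROFILES = {
    "vlsi": {
        "ramsize": "ultra",
        "cores": "many",
    },
}


def resolve_re_properties(base, profiles, profile_name):
    """Merge a named profile on top of the platform's base properties."""
    if profile_name not in profiles:
        raise KeyError("unknown RE profile: " + profile_name)
    merged = dict(base)  # copy so the platform defaults are not mutated
    merged.update(profiles[profile_name])  # profile values take precedence
    return merged


print(resolve_re_properties(BASE_PROPERTIES, PROFILES, "vlsi"))
```

The merged dict is what would need to end up in remote_execution_properties for the actions of that one target, rather than for the whole execution platform.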

I don't see any way to achieve this currently, as none of the default rule parameters, nor any component of AnalysisContext or ctx.actions, lets me specify, merge, or select these properties.

Maybe this is just a documentation fluke?

Execution groups(?)

There is a note in the documentation about a planned feature called "Execution groups":

Execution groups are a future feature that will allow a rule to perform execution platform resolution multiple times and then specify in which of the resolved platforms each action runs in.

This actually sounds exactly like what I want, but phrased differently.

thoughtpolice commented 10 months ago

This would also be really useful for platform-specific tests. For example, I just got an AVX-512 machine, and I would need something like this to say "This test or binary must run on this platform" to ensure the codepath works correctly.

alexlian commented 10 months ago

Yup, so internally with our test runner we can route tests to Remote Execution. I suspect our planned work (though empty on details) in Test Info v2 would better support the test pathway you mention. @ndmitchell probably can offer more as that develops.

We also have work planned to try and route Remote Execution parameters per rule. So, this should address things beyond toolchains.