buildbarn / bb-remote-execution

Tools for Buildbarn to allow remote execution of build actions
Apache License 2.0

Dynamic spawning of workers based on platform properties #40

Open prestonvanloon opened 4 years ago

prestonvanloon commented 4 years ago

For example, I have configured a platform like

platform(
    name = "platform",
    constraint_values = [
        "@bazel_tools//platforms:x86_64",
        "@bazel_tools//platforms:linux",
        "@bazel_tools//tools/cpp:clang",
    ],
    remote_execution_properties = """
        properties {
          name: "container-image"
          value: "docker://gcr.io/my-image@sha256:d7407d58cee310e7ab788bf4256bba704334630621d8507f3c9cf253c7fc664f"
        }
        properties {
          name: "OSFamily"
          value: "Linux"
        }
        """,
)

I have set this platform via --host_platform=//config:platform, but it seems that Buildbarn has platform information hardcoded in its Jsonnet config. Is that correct, or am I misunderstanding it?

EdSchouten commented 4 years ago

So to answer your question: Buildbarn does respect the platform properties, but there is no logic in place (yet) to dynamically spawn workers based on, say, a Docker container image name. This means that you'll need to make sure to spin up workers in advance that have matching platform properties.
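
To illustrate why workers have to be started in advance, here is a minimal sketch of exact-match queue lookup. It is not Buildbarn's code: `ToyScheduler`, `platform_key` and the image names are hypothetical, and the only assumption is that a queue is identified by the full set of platform properties, sorted so that order does not matter.

```python
from typing import Dict, List, Tuple

# A platform queue is keyed by its complete, sorted list of (name, value) pairs.
PlatformKey = Tuple[Tuple[str, str], ...]


def platform_key(properties: List[Tuple[str, str]]) -> PlatformKey:
    """Canonicalize properties by sorting on name, then value."""
    return tuple(sorted(properties))


class ToyScheduler:
    """Hypothetical stand-in for a scheduler; not Buildbarn's implementation."""

    def __init__(self) -> None:
        self.workers: Dict[PlatformKey, List[str]] = {}

    def register_worker(self, worker_id: str, properties: List[Tuple[str, str]]) -> None:
        self.workers.setdefault(platform_key(properties), []).append(worker_id)

    def find_workers(self, properties: List[Tuple[str, str]]) -> List[str]:
        # Exact match only: nothing is spawned when the lookup comes back empty.
        return self.workers.get(platform_key(properties), [])


scheduler = ToyScheduler()
scheduler.register_worker(
    "worker-1", [("OSFamily", "Linux"), ("container-image", "docker://example/image-a")]
)
# Same properties in a different order still reach worker-1.
matching = scheduler.find_workers(
    [("container-image", "docker://example/image-a"), ("OSFamily", "Linux")]
)
# A different container image has no queue, so the action has nowhere to run.
unmatched = scheduler.find_workers(
    [("container-image", "docker://example/image-b"), ("OSFamily", "Linux")]
)
```

In this toy model the second lookup stays empty until someone registers a worker with exactly those properties, which is the situation described above.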

EdSchouten commented 4 years ago

Also to answer a follow-up question: how could we extend Buildbarn to support this? Two ways I can think of:

  1. Have a method where we can hook into the scheduler, so that you can detect incoming build actions for unsupported worker kinds. A helper process would listen to these events and spawn workers accordingly.
  2. Allow workers to specify which labels in the platform properties are variable. The worker would then be responsible for spawning the containers dynamically.
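
As a rough illustration of the second option, the sketch below models a worker that pins some platform properties and treats others as variable. Everything here is hypothetical (`FlexibleWorker`, `variable_names`, the image names), and `run` only records which image it would start rather than talking to a container runtime.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class FlexibleWorker:
    """Hypothetical worker that accepts any value for selected property names."""

    fixed: Dict[str, str]
    variable_names: FrozenSet[str]
    started_images: List[str] = field(default_factory=list)

    def accepts(self, requested: Dict[str, str]) -> bool:
        # Ignore variable properties; everything else must match exactly.
        fixed_part = {
            name: value
            for name, value in requested.items()
            if name not in self.variable_names
        }
        return fixed_part == self.fixed

    def run(self, requested: Dict[str, str]) -> str:
        if not self.accepts(requested):
            raise ValueError("platform properties not supported by this worker")
        image = requested.get("container-image", "<no image requested>")
        # A real worker would start a container here; this sketch only records it.
        self.started_images.append(image)
        return image


worker = FlexibleWorker(
    fixed={"OSFamily": "Linux"},
    variable_names=frozenset({"container-image"}),
)
worker.run({"OSFamily": "Linux", "container-image": "docker://example/image-a"})
worker.run({"OSFamily": "Linux", "container-image": "docker://example/image-b"})
```

The point of the design is that one worker registration covers every container image, so the scheduler would need a way to match queues on the fixed properties only.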

EdSchouten commented 4 years ago

Also: be sure to visit the HTTP endpoint of bb-scheduler. It provides a simple web UI that shows the state of the scheduler. Workers and operations are grouped by platform queues, which are identified by those platform properties.

cphang99 commented 4 years ago

@EdSchouten This was something that I was thinking of as well. It would be particularly relevant when using recc clients with Buildbarn, where management of build dependencies into the input root would not be covered by the REAPI client. There is more discussion at https://gitlab.com/celduin/remote-execution/remote-execution/issues/2

Do you think this would be in scope for a bb-autoscaler that hooks into bb-scheduler, perhaps over a simple gRPC protocol, detects unsupported worker kinds or variable platform properties, and spawns the right containers accordingly? I suspect it would also be beneficial for the same bb-autoscaler to be configurable, so that it could make API calls to AWS/GCP/Azure to spin up new nodes, which would then register with bb-scheduler.
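
To make the idea a bit more concrete, here is a minimal sketch of the decision step such an autoscaler might run. It assumes the autoscaler can somehow obtain, per platform queue, the number of queued operations and the number of registered workers; `QueueSnapshot`, `plan_spawns` and both tuning parameters are hypothetical names, and how the snapshot is fetched or how workers are actually started is left out.

```python
from dataclasses import dataclass
from math import ceil
from typing import Dict, Tuple

PlatformKey = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical per-queue view of scheduler state."""

    queued_operations: int
    workers: int


def plan_spawns(
    snapshot: Dict[PlatformKey, QueueSnapshot],
    operations_per_worker: int = 4,
    max_workers: int = 10,
) -> Dict[PlatformKey, int]:
    """Return how many extra workers to start for each under-provisioned queue."""
    plan: Dict[PlatformKey, int] = {}
    for platform, queue in snapshot.items():
        # Target one worker per `operations_per_worker` queued operations, capped.
        wanted = min(max_workers, ceil(queue.queued_operations / operations_per_worker))
        missing = wanted - queue.workers
        if missing > 0:
            plan[platform] = missing
    return plan


image_a = (("OSFamily", "Linux"), ("container-image", "docker://example/image-a"))
image_b = (("OSFamily", "Linux"), ("container-image", "docker://example/image-b"))
plan = plan_spawns(
    {
        image_a: QueueSnapshot(queued_operations=9, workers=0),  # unsupported so far
        image_b: QueueSnapshot(queued_operations=2, workers=1),  # already covered
    }
)
```

A queue with queued operations and zero workers is exactly the "unsupported worker kind" case; the resulting plan could then be handed to whatever starts containers or cloud nodes.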

EdSchouten commented 4 years ago

Yep! One of the things that could help there is that there is already an internal API for extracting stats from the scheduler. We could upgrade that to a separate gRPC API, which may be used by such an autoscaler.

QuantSuv commented 2 years ago

@EdSchouten, has this capability been added to bb-autoscaler?

EdSchouten commented 2 years ago

We now have a BuildQueueState gRPC API that can be used to inspect the scheduler's state. I guess that's already of some use here.

Not much else has been done in this area.