TraceMachina / nativelink

NativeLink is an open source high-performance build cache and remote execution server, compatible with Bazel, Buck2, Reclient, and other RBE-compatible build systems. It offers drastically faster builds, reduced test flakiness, and access to specialized hardware.
https://nativelink.com
Apache License 2.0

how to kill worker after the remote action execution? #815

Open vors opened 6 months ago

vors commented 6 months ago

Thank you for the awesome project!

Let's say that we have the following setup:

It would be very useful to allow the worker to exit the nativelink binary after a single execution. That way I can kill the pod and it would be re-created, providing a clean environment for the next action execution.

allada commented 6 months ago

Currently this is not supported.

@zbirenbaum, could you make a PR that will shut down the worker after N jobs have been processed and make that limit configurable in the JSON config? This should allow @vors to just set this value to 1, which solves this issue.
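For illustration, the "shut down after N jobs" idea could be sketched roughly as below. This is only a minimal Python sketch, not NativeLink code: the config key `max_jobs_before_shutdown`, the `worker` section, and the helper names are all hypothetical, and the real option (if added) may look quite different.

```python
import json
from typing import Callable, Iterable, Optional

# Hypothetical config: "max_jobs_before_shutdown" is an illustrative key,
# not a documented NativeLink field.
EXAMPLE_CONFIG = '{"worker": {"max_jobs_before_shutdown": 1}}'


def load_job_limit(raw_json: str) -> Optional[int]:
    """Return the job limit, or None if the worker should run indefinitely."""
    worker_cfg = json.loads(raw_json).get("worker", {})
    limit = worker_cfg.get("max_jobs_before_shutdown")
    if limit is not None and limit < 1:
        raise ValueError("max_jobs_before_shutdown must be >= 1")
    return limit


def run_worker(jobs: Iterable[Callable[[], None]], limit: Optional[int]) -> int:
    """Run jobs in order and stop once `limit` jobs have been processed.

    Returns the number of jobs processed. A real worker would exit the
    process after this returns, so the pod supervisor can re-create it.
    """
    processed = 0
    for job in jobs:
        job()
        processed += 1
        if limit is not None and processed >= limit:
            break
    return processed
```

With the limit set to 1, the loop returns after the first job, and the process can then exit so that the pod is re-created with a clean environment.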

In the long run we are likely going to split workers into two parts (worker and executor). The executor would be super lightweight, and its job would be just to do bookkeeping. The worker would be a single process running on the same machine (required), and its job would be to prepare the environment for the executor and then instruct the executor to do the actual work. By doing this we can then make a worker implementation that talks to k8s/docker/containerd directly and just launches nativelink inside a pod on the same machine.
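A rough sketch of how that split might look, under loud assumptions: the `Worker` and `Executor` class names, their methods, and the use of a temporary directory as "the environment" are all hypothetical and chosen only to illustrate the division of labour described above. A real implementation would prepare a pod or container rather than a directory.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


class Executor:
    """Hypothetical lightweight part: runs a command and keeps simple counters."""

    def __init__(self) -> None:
        self.actions_run = 0  # minimal stand-in for "bookkeeping"

    def run(self, argv: List[str], workdir: Path) -> subprocess.CompletedProcess:
        self.actions_run += 1
        return subprocess.run(argv, cwd=workdir, capture_output=True, text=True)


class Worker:
    """Hypothetical part that prepares a clean environment for each action."""

    def __init__(self, executor: Executor) -> None:
        self.executor = executor

    def execute(self, argv: List[str]) -> subprocess.CompletedProcess:
        # Fresh directory per action; stands in for launching a new pod.
        workdir = Path(tempfile.mkdtemp(prefix="action-"))
        try:
            return self.executor.run(argv, workdir)
        finally:
            # Tear the environment down so nothing leaks into the next action.
            shutil.rmtree(workdir, ignore_errors=True)
```

For example, `Worker(Executor()).execute([sys.executable, "-c", "print('hello')"])` runs the command in a directory that did not exist before the call and is removed afterwards.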

vors commented 6 months ago

I'd really appreciate it if you could implement this proposal. It is one thing that we need for our deployment.

zbirenbaum commented 6 months ago

Currently this is not supported.

@zbirenbaum, could you make a PR that will shut down the worker after N jobs have been processed and make that limit configurable in the JSON config? This should allow @vors to just set this value to 1, which solves this issue.

Sure! I'll get started on this.