TraceMachina / nativelink

NativeLink is an open source high-performance build cache and remote execution server, compatible with Bazel, Buck2, Reclient, and other RBE-compatible build systems. It offers drastically faster builds, reduced test flakiness, and specialized hardware.
https://nativelink.com
Apache License 2.0

Sandboxing #673

Open malt3 opened 6 months ago

malt3 commented 6 months ago

Nice project! I was wondering if you support any kind of sandboxing out of the box (similar to Bazel’s Linux sandbox). Otherwise, do you have any suggestions on sandboxing? I was thinking about running every action in Linux kernel namespaces or something similar. Alternatively, I think it would be interesting to place the workers in gVisor containers or microVMs for isolation.

aaronmondal commented 6 months ago

@malt3 Could you elaborate on this? Do you mean action-specific sandboxes that are automatically set up by nativelink when running remote actions in non-Bazel builds?

Not sure if this is related, but a pattern I use for sandboxing with Bazel builds is to run remote execution locally. For instance, the deployment-examples/kubernetes example can be run on a local machine. Pointing the remote_executor to the local cluster effectively runs all remotable actions in container sandboxes in the local k8s cluster. It's a slightly more involved process to set up initially, but has the benefit of being immediately portable to larger-scale cloud infrastructure.

malt3 commented 6 months ago

Happy to elaborate on this a bit. I'm talking about two different but related features:

Do you mean action-specific sandboxes that are automatically set up by nativelink when running remote actions in non-Bazel builds?

I think yes. To be extra clear: I'd like to have a mode where RunningActionImpl::execute_inner does not execute the action directly and instead wraps it with a sandbox command. One such sandbox wrapper could be Bazel's linux-sandbox or the process-wrapper binary. I think setting entrypoint in the config could be a start.
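
To make the wrapping idea concrete, here is a minimal sketch in Rust of prepending a configured wrapper to an action's command line. The function name `wrap_with_entrypoint` and its signature are hypothetical and are not taken from the nativelink codebase; the only assumption is that the worker has the action's argv and an optional wrapper argv from its config.

```rust
/// Hypothetical helper (not part of nativelink): build the argv that would
/// actually be spawned for an action. If a sandbox wrapper is configured,
/// its arguments come first and the action's own argv follows unchanged.
fn wrap_with_entrypoint(entrypoint: Option<&[String]>, action_argv: &[String]) -> Vec<String> {
    let wrapper_len = entrypoint.map_or(0, |w| w.len());
    let mut argv = Vec::with_capacity(wrapper_len + action_argv.len());
    if let Some(wrapper) = entrypoint {
        // The wrapper (e.g. a sandbox binary plus its flags) is prepended.
        argv.extend_from_slice(wrapper);
    }
    // The original action command is passed through as-is.
    argv.extend_from_slice(action_argv);
    argv
}
```

With a wrapper of `["linux-sandbox", "--"]` and an action of `["gcc", "-c", "a.c"]`, the spawned command would be the concatenation of the two; with no wrapper configured, the action runs exactly as before.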

The goal of this would be to ensure build hermeticity (prevent undeclared inputs) and ensure build actions cannot influence each other. Another goal is to correctly allow / disallow networking in tests. Bazel local execution allows end users to allow networking for selected tests (but disallow for all others). This is currently not implemented by most remote-execution services.
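
As a rough sketch of the networking goal, the Rust snippet below picks a per-action network policy and turns it into a linux-sandbox command line. Everything named here is an assumption for illustration: the `requires-network` property key, the `NetworkPolicy` type, and both functions are hypothetical, and the `-W` (working directory) and `-N` (new network namespace) flags should be checked against the usage output of the linux-sandbox binary you actually ship.

```rust
use std::collections::HashMap;

/// Hypothetical per-action network policy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NetworkPolicy {
    Allow,
    Deny,
}

/// Hypothetical: derive the policy from an action's platform properties.
/// The key "requires-network" is an assumption; networking is denied
/// unless the property is explicitly set to "true".
fn policy_from_properties(props: &HashMap<String, String>) -> NetworkPolicy {
    match props.get("requires-network").map(String::as_str) {
        Some("true") => NetworkPolicy::Allow,
        _ => NetworkPolicy::Deny,
    }
}

/// Build a linux-sandbox invocation around the action's argv.
/// Assumes `-W <dir>` sets the working directory and `-N` creates a new
/// (isolated) network namespace.
fn linux_sandbox_argv(policy: NetworkPolicy, work_dir: &str, action_argv: &[String]) -> Vec<String> {
    let mut argv = vec![
        "linux-sandbox".to_string(),
        "-W".to_string(),
        work_dir.to_string(),
    ];
    if policy == NetworkPolicy::Deny {
        // Isolate the action from the host network.
        argv.push("-N".to_string());
    }
    // Everything after "--" is the action's own command line.
    argv.push("--".to_string());
    argv.extend_from_slice(action_argv);
    argv
}
```

Defaulting to `Deny` keeps actions hermetic unless a test opts in, which mirrors how networking can be allowed for selected tests under Bazel local execution.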

This is also implemented in Bazel Buildfarm: user-facing config, implementation

The alternative / additional question is whether there is existing configuration for isolating workers from the host. It sounds like the Kubernetes deployment is a step in that direction.

allada commented 6 months ago

Thanks @malt3. This is something we have been wanting to do, but doing it right is tricky and can cause long-term technical debt if done improperly.

It is definitely something we want to do, but we are fighting some other battles right now. We have been going back and forth internally on whether we should support it through something like containerd or more manually with cgroups + rootfs. The big benefit of using containerd is that you get everything else it supports with a simple API, but a simpler approach like cgroups + rootfs is significantly faster and more portable (in my understanding).

The good news is that we do have an inefficient workaround right now: https://github.com/TraceMachina/nativelink/issues/313

If you want more detail on how we have used this in production, let me know and I can put up an example. :smile:

malt3 commented 6 months ago

This is great news! I see value in either route (big containerd solution or custom cgroups + rootfs) so from an outside perspective I’d say: why not both? But I do understand that you would like to avoid feature creep. If you want early feedback on an implementation feel free to post it here and I’ll take a look.