buildbarn / bb-remote-execution

Tools for Buildbarn to allow remote execution of build actions
Apache License 2.0

Mount '/proc' into chrooted actions #115

Open stagnation opened 1 year ago

stagnation commented 1 year ago

Problem Description / Feature Request

Using chrooted actions is great for fully hermetic input roots, but some tools rely on the /proc filesystem being mounted and will not work without it. The device nodes from /dev can all be mounted (or created) in the input root, but full filesystems are trickier.

This issue lists some tools that do not work without /proc, shows a patch I used to work around the problem, and discusses the patch's limitations and what we need before it can be merged.

Problems

In building bb-deployments I found the following issues::

 ERROR: /home/spill/.cache/bazel/_bazel_spill/627a8b419c21f55bea98864c74ae71f0/external/go_sdk/BUILD.bazel:46:15: GoToolchainBinary external/go_sdk/builder [for host] failed: (Exit 2): go failed: error executing command external/go_sdk/bin/go tool link -o bazel-out/host/bin/external/go_sdk/builder bazel-out/host/bin/external/go_sdk/builder.a
 Action details (uncached result): http://localhost:7984/remote-execution/blobs/historical_execute_response/73ce201a6e9757c300fdc64a213b855f63f796813c33f4b2274715c9dae9e43d-550/
 go: cannot find GOROOT directory: /usr/local/go

and::

 ERROR: /home/spill/.cache/bazel/_bazel_spill/627a8b419c21f55bea98864c74ae71f0/external/com_github_buildbarn_bb_storage/pkg/otel/BUILD.bazel:62:8: Executing genrule @com_github_buildbarn_bb_storage//pkg/otel:stylesheet failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)
 Action details (uncached result): http://localhost:7984/remote-execution/blobs/historical_execute_response/e709ec1e8614565d1a6c7a1248ca167361280bd6404edd286e45213811dce6f4-561/
 bazel-out/host/bin/external/npm/purgecss/bin/purgecss.sh.runfiles/build_bazel_rules_nodejs/third_party/github.com/bazelbuild/bazel/tools/bash/runfiles/runfiles.bash: line 170: RUNFILES_MANIFEST_FILE: unbound variable
 bazel-out/host/bin/external/npm/purgecss/bin/purgecss.sh.runfiles/build_bazel_rules_nodejs/third_party/github.com/bazelbuild/bazel/tools/bash/runfiles/runfiles.bash: line 176: RUNFILES_MANIFEST_FILE: unbound variable

 >>>> FAIL: The node binary 'nodejs_linux_amd64/bin/nodejs/bin/node' not found in runfiles.
 This node toolchain was chosen based on your uname 'Linux x86_64'.
 Please file an issue to https://github.com/bazelbuild/rules_nodejs/issues if
 you would like to add your platform to the supported rules_nodejs node platforms. <<<<

 ERROR: /home/spill/.cache/bazel/_bazel_spill/627a8b419c21f55bea98864c74ae71f0/external/bazel_tools/tools/jdk/BUILD:336:14: JavaToolchainCompileClasses external/bazel_tools/tools/jdk/platformclasspath_classes [for host] failed: (Exit 127): javac failed: error executing command external/remotejdk11_linux/bin/javac -source 8 -target 8 -Xlint:-options -cp external/remotejdk11_linux/lib/tools.jar -d bazel-out/host/bin/external/bazel_tools/tools/jdk/platformclasspath_classes ... (remaining 1 argument skipped)
 Action details (uncached result): http://localhost:7984/remote-execution/blobs/historical_execute_response/788fa4f412c6b68678ec8c72d0bdfe63b766386c6011e672f997a81c2fbf40f5-704/
 external/remotejdk11_linux/bin/javac: error while loading shared libraries: libjli.so: cannot open shared object file: No such file or directory

It was easy to find the root cause for the go problem::

 3367 28    readlinkat(AT_FDCWD, "/proc/self/exe",  <unfinished ...>
 3370 28    <... readlinkat resumed>0xc000126000, 128) = -1 ENOENT (No such file or directory)

Go tries to find its own binary by looking at itself through /proc/self/exe, and then looks for the default GOROOT standard library relative to the compiler. When that lookup fails it appears to fall back to the built-in default, /usr/local/go, which does not exist in the input root. /proc is not mandated by POSIX, but on Linux it is a de facto part of the standard execution environment, so chrooted actions need it if they are to work for all tools.

This is also a problem for Rust, which has the best error message of the bunch: https://buildteamworld.slack.com/archives/CD6HZC750/p1692391421607939

::

thread 'main' panicked at 'failed to get current_exe: no /proc/self/exe available. Is /proc mounted?'

Proof of Concept Patch

Note: this was developed in February and has not been rebased on recent work, in particular https://github.com/buildbarn/bb-remote-execution/commit/1524fef4bf2044fef95210437152679e36cf00a8 which adds PATH lookup to the same file. The patch is orthogonal to that change, though. ::

diff --git pkg/runner/BUILD.bazel pkg/runner/BUILD.bazel
index f134f66..33d33f8 100644
--- pkg/runner/BUILD.bazel
+++ pkg/runner/BUILD.bazel
@@ -29,6 +29,7 @@ go_library(
         "@org_golang_google_protobuf//proto",
         "@org_golang_google_protobuf//types/known/anypb",
         "@org_golang_google_protobuf//types/known/emptypb",
+        "@org_golang_x_sys//unix",
     ] + select({
         "@io_bazel_rules_go//go/platform:android": [
             "//pkg/proto/resourceusage",
diff --git pkg/runner/local_runner.go pkg/runner/local_runner.go
index 34ac73a..74d1445 100644
--- pkg/runner/local_runner.go
+++ pkg/runner/local_runner.go
@@ -3,7 +3,11 @@ package runner
 import (
 	"context"
 	"errors"
+	"fmt"
+	"io/fs"
+	"os"
 	"os/exec"
+	stdpath "path"
 	"path/filepath"
 	"syscall"
 
@@ -11,6 +15,7 @@ import (
 	"github.com/buildbarn/bb-storage/pkg/filesystem"
 	"github.com/buildbarn/bb-storage/pkg/filesystem/path"
 	"github.com/buildbarn/bb-storage/pkg/util"
+	"golang.org/x/sys/unix"
 
 	"google.golang.org/grpc/codes"
 	"google.golang.org/grpc/status"
@@ -122,7 +127,7 @@ func (r *localRunner) Run(ctx context.Context, request *runner.RunRequest) (*run
 
 	cmd, workingDirectoryBase := r.commandCreator(ctx, request.Arguments, inputRootDirectory)
 
-	// Set the environment variable.
+	// Set the environment variables.
 	cmd.Env = make([]string, 0, len(request.EnvironmentVariables)+1)
 	if r.setTmpdirEnvironmentVariable && request.TemporaryDirectory != "" {
 		temporaryDirectory, scopeWalker := r.buildDirectoryPath.Join(path.VoidScopeWalker)
@@ -158,6 +163,43 @@ func (r *localRunner) Run(ctx context.Context, request *runner.RunRequest) (*run
 	}
 	cmd.Stderr = stderr
 
+	if cmd.SysProcAttr != nil && cmd.SysProcAttr.Chroot != "" {
+
+		rootdir := cmd.SysProcAttr.Chroot
+		mode := fs.FileMode(0755)
+
+		var err error
+		err = os.MkdirAll(rootdir, mode)
+		if err != nil {
+			panic(err)
+		}
+		err = os.Chdir(rootdir)
+		if err != nil {
+			panic(err)
+		}
+
+		noOptions := ""
+		noFlags := 0
+
+		for _, mount := range []struct {
+			point   string
+			fstype  string
+			options uintptr
+		}{
+			{"/proc", "proc", uintptr(0)},
+			{"/sys", "sysfs", uintptr(0)},
+		} {
+			absolute := stdpath.Join(rootdir, mount.point)
+
+			err = unix.Mount(mount.point, absolute, mount.fstype, mount.options, noOptions)
+			if err != nil {
+				fmt.Printf("Mount error %#v %#v", mount.point, err)
+				panic(err)
+			}
+			defer unix.Unmount(absolute, noFlags)
+		}
+	}
+
 	// Start the subprocess. We can already close the output files
 	// while the process is running.
 	err = cmd.Start()

Problems with merging

Note that the patch constructs an absolute path, because the classic mount syscall has no variant that resolves the target relative to a directory file descriptor. So the Directory abstraction in the runner, where the underlying path is hidden, cannot be used for these mounts.
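
Apart from the path question, the patch panics on any failure, which is fine for a proof of concept but probably not for a merged version. Below is a minimal sketch of how the mount loop could return errors and unwind partial mounts. Everything here is hypothetical: the `mounter` interface, `mountAll` and the `recorder` fake are names invented for this example, and the syscalls are abstracted away so the control flow can run without root. With golang.org/x/sys/unix the real implementation would wrap `unix.Mount` and `unix.Unmount`, as the patch does.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// mountSpec mirrors the anonymous struct used in the patch.
type mountSpec struct {
	point  string // path inside the chroot, e.g. "/proc"
	fstype string // filesystem type, e.g. "proc"
}

// mounter abstracts the two syscalls (hypothetical interface).
type mounter interface {
	Mount(source, target, fstype string) error
	Unmount(target string) error
}

// mountAll mounts every spec below rootdir and returns a cleanup function
// that unmounts in reverse order. If a mount fails, the mounts already
// made are undone and the error is returned instead of panicking.
func mountAll(m mounter, rootdir string, specs []mountSpec) (func() error, error) {
	var mounted []string
	cleanup := func() error {
		var errs []error
		for i := len(mounted) - 1; i >= 0; i-- {
			if err := m.Unmount(mounted[i]); err != nil {
				errs = append(errs, fmt.Errorf("unmount %s: %w", mounted[i], err))
			}
		}
		mounted = nil
		return errors.Join(errs...)
	}
	for _, s := range specs {
		// Same absolute-path construction as in the patch.
		target := filepath.Join(rootdir, s.point)
		if err := m.Mount(s.point, target, s.fstype); err != nil {
			return nil, errors.Join(fmt.Errorf("mount %s: %w", target, err), cleanup())
		}
		mounted = append(mounted, target)
	}
	return cleanup, nil
}

// recorder is a fake mounter that logs calls and can simulate a failure.
type recorder struct {
	log    []string
	failOn string
}

func (r *recorder) Mount(source, target, fstype string) error {
	if target == r.failOn {
		return errors.New("simulated failure")
	}
	r.log = append(r.log, "mount "+target)
	return nil
}

func (r *recorder) Unmount(target string) error {
	r.log = append(r.log, "unmount "+target)
	return nil
}

func main() {
	r := &recorder{}
	cleanup, err := mountAll(r, "/worker/build/root", []mountSpec{
		{"/proc", "proc"},
		{"/sys", "sysfs"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The action would run here.
	if err := cleanup(); err != nil {
		fmt.Println("cleanup error:", err)
	}
	for _, line := range r.log {
		fmt.Println(line)
	}
}
```

The sketch uses `errors.Join`, so it assumes Go 1.20 or newer. It still builds absolute targets, so it does not address the Directory abstraction problem described above.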

Mountat

The solution is to use the newer mount syscall API to build a "mountat" function, which can mount with relative file paths (this is what the "at" suffix typically means). But when I prototyped this I could not find a way to unmount the paths.

::

 ...

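 /* Mount a new filesystem of type fstype on dirname. dirname is
  * resolved relative to the current working directory (AT_FDCWD). */
 void
 mountat(const char *fstype, const char *source, const char *dirname)
 {
     /* Create a filesystem context for the given type. */
     int fd = fsopen(fstype, FSOPEN_CLOEXEC);
     fsconfig(fd, FSCONFIG_SET_STRING, "source", source, 0);
     /* Instantiate the superblock from the configured context. */
     fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
     /* fsmount() takes MOUNT_ATTR_* attributes, not the MS_* flags of mount(2). */
     int mfd = fsmount(fd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NOEXEC);
     /* Attach the detached mount at dirname. */
     move_mount(mfd, "", AT_FDCWD, dirname, MOVE_MOUNT_F_EMPTY_PATH);
 }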

 int
 main()
 {
     mountat("proc", "/proc", "proc");
     mountat("sysfs", "/sys", "sys");
 }

This can be used from Go as well. Last I checked there was a PR in review for these new syscalls, but creating a wrapper for what we need is not hard. The problem is how to unmount the relative "proc" and "sys" paths after the action. According to the documentation it should work, but I could not piece it together.

TODO

I have a lot more to say on this topic and should wrap up my investigation and publish it. I should also publish the scaffolding C and Go code that uses these syscalls, but the files are too long to paste in here.

fishrockz commented 1 year ago

Thank you @stagnation for writing this up

I will note that rustc will not do much more than print its version without /proc

stagnation commented 1 year ago

I have a solution in review now:

1: https://github.com/buildbarn/bb-storage/pull/177
2: https://github.com/buildbarn/bb-remote-execution/pull/116

And a reproduction here: https://github.com/stagnation/bb-deployments/tree/feature/reproduce-bb-remote-execution-115 that uses the bare deployment to verify the patches. Currently only works on Linux.


Reproducing the problem

The problem statement is simple: programs that use /proc cannot run in chroot. It should be easy to create a reproduction.

However, there are no publicly available programs that create a user-space input root. This is something that middleware could do, but it requires a lot of code.

What options do we have? We could use the Go compiler, a bash script that calls ls /proc/self, or a statically linked binary that does the same. The rules_go toolchain builds a hermetic Go compiler and sends it in the input root, so it is a good candidate. However, it uses the system gcc to compile it, and we get errors that gcc is not available. An interpreted program like a bash script does not work either, as we do not have the interpreter available.

Thankfully it is simple to create a statically linked Go program:

  go_binary(
    ...
    pure = "on",
  )

And we can use the eminent run_binary rule, which does not require bash to execute a tool. We use the compiled program directly as a source in the tool attribute. If we were to use native_binary we would get snagged by a bash dependency in its CopyFile action.

But with this we fail, here is the output from bb-browser:

  Command^*

        Arguments:       ./ls-proc bazel-out/k8-fastbuild/bin/out
  Environment variables: PATH=/bin:/usr/bin:/usr/local/bin

  Result

  Status: Code 3: Failed to run command: Failed to start process:
          fork/exec /usr/bin/env: no such file or directory

This is because Buildbarn itself wraps the process with /usr/bin/env. Here env is used for PATH resolution before chroot.

Get yourself an /usr/bin/env for fun and profit

Thankfully busybox has easy-to-use programs that can help in a pinch, especially the musl versions.

  $ docker run -v .:/out busybox:musl sh -c "cp /usr/bin/env /out/env"
  $ file env
  env: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped

Now we just need to put it in the input root as /usr/bin/env.

  $ mkdir -p usr/bin
  $ mv env usr/bin/

With the "introspection" target

run_binary(
    name = "introspection",
    outs = ["out"],
    args = ["$(location out)"],
    execution_requirements = {
        "no-cache": "1",
    },
    tool = ":ls-proc",
    srcs = ["usr/bin/env"],  # env is required in Buildbarn's process creation.
)

Build remotely

With the customary docker-compose setup.

bazel build \
  --remote_executor=grpc://localhost:8980 \
  --remote_instance_name=fuse \
  --remote_default_exec_properties OSFamily=linux \
  --remote_default_exec_properties container-image="docker://ghcr.io/catthehacker/ubuntu:act-22.04@sha256:5f9c35c25db1d51a8ddaae5c0ba8d3c163c5e9a4a6cc97acd409ac7eae239448" \
  @//:introspection

This works with regular runners, but not chroot runners.

Apply the patches

A setback! The docker-compose setup does not build the docker images, so we would have to create deliverables from the pull requests, which is tedious. Instead we use the bare deployment where we do compile the runner from source.

So we instead build with:

bazel build \
  --remote_executor=grpc://localhost:8980 \
  --remote_instance_name="" \
  //:introspection

And instead start it through bazel run:

  $ mktemp -d
  /tmp/tmp.sjR6aHjhni
  $ bazel run --script_path launch-bare //bare
  # or bazelisk if you use that, the super user typically does not have bazel on its PATH.
  # or --run_under sudo
  $ sudo ./launch-bare /tmp/tmp.sjR6aHjhni

Notice that we no longer use FUSE, but for our reproduction that is okay. In production use you do want FUSE.

The pull requests are applied through the go_dependencies:

     go_repository(
         name = "com_github_buildbarn_bb_remote_execution",
         importpath = "github.com/buildbarn/bb-remote-execution",
-        sum = "h1:BKoGfhCfn5IA4JRLMB7I4yHsM06fLvOc/zwzSxEuNrY=",
-        version = "v0.0.0-20230905173453-70efb72857b0",
+        remote = "https://github.com/stagnation/bb-remote-execution",
+        vcs = "git",
+        commit = "01729791c366b6d713bf4f5e6c706cb274292539",
     )
+
     go_repository(
         name = "com_github_buildbarn_bb_storage",
         importpath = "github.com/buildbarn/bb-storage",
-        sum = "h1:z9yMGmzNNjhC2KnxYGfP8bPk1/l3jpd3+rb+1YkhQg4=",
-        version = "v0.0.0-20230905110346-c04246b462b6",
+        remote = "https://github.com/stagnation/bb-storage",
+        vcs = "git",
+        commit = "58bf2fed198fad8d60af26419fa0548e897165a8",
     )

Successful build

INFO: From RunBinary out:
arch_status
attr
autogroup
auxv
cgroup
clear_refs
cmdline
comm
coredump_filter
cpu_resctrl_groups
cpuset
Target //:introspection up-to-date:
  bazel-bin/out
INFO: Elapsed time: 0.306s, Critical Path: 0.03s
INFO: 2 processes: 1 internal, 1 remote.
INFO: Build completed successfully, 2 total actions

P.S: Sorry for the spam, I wish github wouldn't be quite so verbose about my pushes.

fishrockz commented 6 months ago

I had a bit of a surge of motivation to try this out recently. I normally use the docker-compose setup in bb-deployments to run Buildbarn. Is there a similarly easy way to try this out?

What's the recommended way to try out a patch like this if you're not used to setting up Buildbarn?

stagnation commented 6 months ago

Hi! Thank you, I do not have pre-built docker images, and have not rebased the patches in a while. Give me some time and I can prepare that for you, then you should be able to build the images locally and use them through docker-compose. That would be a good use-case to document too.

stagnation commented 6 months ago

I have rebased the PR now and will try to get it over the finish line soon.

Here are the instructions you need, in our documentation repo: https://meroton.com/docs/improved-chroot-in-buildbarn/reproducing-the-problem/ They should help you run the bare deployment.