Add --exec PATH to set the file to execute separate from argv[0]

Fixes containers/bubblewrap#91

I realise that an initial GitHub issue was created for this feature in 2016 and it doesn't appear to have come up again since. However, NixOS is weird and creates a use case for separating the executable path in the container from the command name. The nix-bubblewrap project also ran into the same issue, and has a note about it in its README: https://github.com/fgaz/nix-bubblewrap#troubeshooting

Explanation

NixOS doesn't install executable binaries into /usr/bin, and instead installs each package into its own immutable folder under /nix/store/${hash}-${package}-${version}/bin. NixOS also provides a command to list the packages a given package depends on. This makes it trivial to use bwrap to --ro-bind only the packages required for a given executable.

This all works wonderfully, except for two complications. To create a $PATH for a user's shell to use, NixOS creates folders of symlinks (often to other symlinks) for $PATH to reference. These folders of symlinks to executables could be bound into the container with --ro-bind, but they're only needed to locate the initial executable for the command. We could just resolve the executable with $(realpath $(which command)) and pass that to bwrap, except that this fails for executables that change their behavior based on the command name used (say, bash behaving as sh, xz behaving as unxz, or coreutils behaving as date, df, cat, ...).

We could also have NixOS build a $PATH for just the packages needed and use that when running bwrap. But we've already located the binary and have the command name, so we just need bwrap to use the executable path and the argv[0] provided.
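To illustrate the argv[0] dependence described above without needing Nix or bwrap, here is a minimal sketch. The `multicall` script and the `hello` symlink are hypothetical stand-ins for a multi-call executable such as coreutils and a profile symlink such as date. It uses a shell script rather than a real binary, so it dispatches on `$0`, which plays the role of argv[0] here.

```shell
#!/usr/bin/env bash
# Sketch only: "multicall" and "hello" are made-up names for illustration.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# The "real" executable picks its behaviour from the name it was invoked as.
cat > "$demo/multicall" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
    hello) echo "hello mode" ;;
    bye)   echo "bye mode" ;;
    *)     echo "unknown applet: $(basename "$0")" >&2; exit 1 ;;
esac
EOF
chmod +x "$demo/multicall"

# A symlink named after the command, similar to an entry in a profile's bin folder.
ln -s multicall "$demo/hello"

# Invoking through the symlink keeps the command name.
via_symlink=$("$demo/hello")

# Resolving the symlink first loses the command name, so dispatch fails.
via_realpath=$("$(realpath "$demo/hello")" 2>&1 || true)

echo "via symlink:  $via_symlink"
echo "via realpath: $via_realpath"
```

The second invocation fails in the same way that running the resolved coreutils path does: the file is found, but it no longer knows which command it is meant to be.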

Concrete example

The $PATH points to a folder of symlinks to symlinks to symlinks. The two levels of symlink folders don't need to be in the container; they're only needed to locate the executable for the command. In this case, the date command is actually a symlink to the coreutils executable.

% which date
/home/quag/.nix-profile/bin/date
% readlink $(which date)
/nix/store/8nzn8ghzknqgjsg1iv124qy0fjli3dwn-home-manager-path/bin/date
% readlink $(readlink $(which date))
/nix/store/apn3p2b40xvirn7w740wv2gy330ppib5-coreutils-9.3/bin/date
% readlink $(readlink $(readlink $(which date)))
/nix/store/apn3p2b40xvirn7w740wv2gy330ppib5-coreutils-9.3/bin/coreutils

Coreutils only needs the following file system to be able to execute.

% nix-store --query --requisites $(realpath $(which coreutils))
/nix/store/3wwfqhdym0sbis4bad1may3ll8rki8y1-gcc-12.3.0-libgcc
/nix/store/jbwb8d8l28lg9z0xzl784wyb9vlbwss6-xgcc-12.3.0-libgcc
/nix/store/s2gi8pfjszy6rq3ydx0z1vwbbskw994i-libunistring-1.1
/nix/store/k8ivghpggjrq1n49xp8sj116i4sh8lia-libidn2-2.3.4
/nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8
/nix/store/srrs7gn04rwa4f6zhsjkdacxydwrmzhj-attr-2.5.1
/nix/store/aj6gbshd8hvifpa1d8vy0iv688sm81wp-acl-2.3.1
/nix/store/xpxln7rqi3pq4m0xpnawhxb2gs0mn1s0-gcc-12.3.0-lib
/nix/store/yqa5m326a0ynn4whm4fikyjfljfc6i3q-gmp-with-cxx-6.3.0
/nix/store/apn3p2b40xvirn7w740wv2gy330ppib5-coreutils-9.3

These ten directories can be bound into the container with --ro-bind, and the --ro-bind arguments can be generated with sed.

% function nix-bwrap-args() {
    nix-store --query --requisites $(realpath $(which $1)) \
    | sed 's/.*/--ro-bind & &/' \
    | tr '\n' ' '
}
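The function above relies on the shell splitting its output on whitespace, which is fine for store paths like the ones listed. As an alternative sketch, assuming bash, the arguments can be collected into an array so that each path stays a single word. `collect_ro_binds` and `bwrap_args` are hypothetical names, and the two paths fed in below are just a sample from the listing above; in practice the input would come from nix-store --query --requisites.

```shell
#!/usr/bin/env bash
# Sketch only: collect_ro_binds and bwrap_args are illustrative names,
# not part of bwrap or Nix.
set -euo pipefail

bwrap_args=()

# Read one store path per line from stdin and append
# "--ro-bind PATH PATH" to the global bwrap_args array.
collect_ro_binds() {
    local path
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        bwrap_args+=(--ro-bind "$path" "$path")
    done
}

# Sample input. Feed the function through a here-document or process
# substitution rather than a pipe, so the array is filled in this shell
# and not in a subshell.
collect_ro_binds <<'EOF'
/nix/store/aw2fw9ag10wr9pf0qk4nk5sxi0q0bn56-glibc-2.37-8
/nix/store/apn3p2b40xvirn7w740wv2gy330ppib5-coreutils-9.3
EOF

printf '%s\n' "${bwrap_args[@]}"
```

The array could then be passed as `bwrap "${bwrap_args[@]}" ...` in place of `$(nix-bwrap-args date)`.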

However, despite having all the files needed in the container, bwrap resolves date using $PATH to /home/quag/.nix-profile/bin/date, which isn't in the container.

% bwrap $(nix-bwrap-args date) date
bwrap: execvp date: No such file or directory

If we resolve the date command to the executable in the container (the coreutils binary), then that fails, as it doesn't know which command it is meant to be impersonating.

% bwrap $(nix-bwrap-args date) $(realpath $(which date))
Try '/nix/store/apn3p2b40xvirn7w740wv2gy330ppib5-coreutils-9.3/bin/coreutils --help' for more information.

If we add an --exec option that takes the path of the executable to run within the container, and then passes the command name through without resolving it, everything works. The result is a minimal container, with executable resolution happening before entering the container while the command name is retained.

% bwrap $(nix-bwrap-args date) --exec $(realpath $(which date)) date
Wed Sep 27 05:50:49 UTC 2023

FAQ

Q1. Doesn't the running process need a valid $PATH to locate other commands it depends on?

Surprisingly, no. To support installing multiple versions of the same package, all references to other packages (including executables) are compiled into the binaries as paths into /nix/store/${hash}-${package}-${version}/. This means that after the program starts, $PATH is not needed again.

[edited to avoid some weird Markdown formatting —smcv]

See also #223, which was abandoned by its submitter.

While reviewing #223 I left some comments that I think are maybe a useful mental model for this:

There is no point in separating "what I am going to exec" and "what I'm going to put in argv[0]" unless you are going to set them to distinct values.

The use-case is to do things like "execute bash, but make it think it's sh":

# Common case:
# argv[] = { "/usr/bin/python3", "-c", "...", NULL }
# execve(argv[0], argv, environ)
perl -e 'system("/usr/bin/python3", "-c", "import sys; print(sys.executable)")'
/usr/bin/python3
# Rare case:
# argv[] = { "/pretending/not/to/be/python", "-c", "...", NULL }
# execve("/usr/bin/python3", argv, environ)
perl -e 'system {"/usr/bin/python3"} ("/pretending/not/to/be/python", "-c", "import sys; print(sys.executable)")'
/pretending/not/to/be/python

It seems to me that there would be two reasonable ways to spell the rare case. The one you seem to have chosen here (at least from the description, I haven't looked at your implementation) is:

bwrap ... --exec-filename /usr/bin/python3 /pretending/not/to/be/python -c ...

The other possibility is an equivalent of ld.so --argv0 and bash exec -a:

bwrap ... --argv0 /pretending/not/to/be/python /usr/bin/python3 -c ...

If you go with the first one, I would definitely prefer --exec-filename (as proposed before in #223) as a clearer thing to say than --exec. Looking at it without context, I would expect an --exec option in an "adverb" tool like bubblewrap to be about the decision of whether to run the "payload" command as a child process of a bubblewrap parent, or whether to use execve() to make it replace the bubblewrap process in-place.

The second one is a bit odd because it puts argv[0] and argv[1...] in different places in the command-line, but it has the advantage of being consistent with ld.so --argv0 and bash exec -a, which are features that already exist, and I think that's a desirable property.
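For comparison, this is roughly how the existing bash exec -a feature behaves. The sketch assumes bash is on $PATH and uses bash itself as the payload, because when run with -c and no further arguments it reports its own argv[0] as $0. The name pretend-to-be-sh is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: "pretend-to-be-sh" is an arbitrary illustrative name.
set -euo pipefail

# Without exec -a, argv[0] is the command name as typed.
default_name=$(bash -c 'echo "$0"')

# exec -a NAME passes NAME as argv[0] while still executing bash.
# The command substitution is a subshell, so exec only replaces that subshell.
overridden_name=$(exec -a pretend-to-be-sh bash -c 'echo "$0"')

echo "default argv[0]:    $default_name"
echo "overridden argv[0]: $overridden_name"
```

A bwrap --argv0 option spelled this way would put the overriding name before the executable path, mirroring the order used by exec -a.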

Because bubblewrap is rewriting the filesystem namespace (and that's sort of the point) you could also do this by generating a symlink with the desired name in a tmpfs location under your control, and using that as argv[0], for example (untested, I don't use Nix):

% bwrap $(nix-bwrap-args date) \
--tmpfs /run/bwrap \
--symlink "$(realpath "$(which date)")" /run/bwrap/date \
/run/bwrap/date

But I think "ability to overwrite argv[0]" is a reasonable thing to want in an "adverb" tool like bwrap, if it isn't too much code.

Add --exec PATH to set the file to execute separate from argv[0] #597