Closed maleadt closed 1 year ago
Base: 79.71% // Head: 79.71% // No change to project coverage :thumbsup:
Coverage data is based on head (
975c302
) compared to base (da1ccc1
). Patch has no changes to coverable lines.:exclamation: Current head 975c302 differs from pull request most recent head a3d1210. Consider uploading reports for the commit a3d1210 to get more accurate results
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Fixed an issue: maps now need to be mounted before procfs or we aren't using the tmpfs meant for overlays.
The remaining CI failure is the test that read-only mounts are actually read-only (from the sandbox POV), which is of course what this PR is proposing to change.
Scratch that, I've redesigned the feature after some thinking, introducing an --overlay HOST_PATH:SANDBOX_PATH
to be used in combination with read-only maps. This keeps everything working as it currently does, while introducing (UserNSExecutor-only) overlay functionality to make read-only maps mutable in a controlled manner (i.e. without having to peek into the persistence_dir
):
using Sandbox
overlay = mktempdir()
config = SandboxConfig(
Dict("/" => Sandbox.debian_rootfs(),
"/root/.julia/packages" => "/home/tim/.julia/packages");
overlay_maps = Dict("/root/.julia/packages" => overlay),
stdin, stdout, stderr, persist=true, verbose=true,
)
with_executor(UnprivilegedUserNamespacesExecutor) do exe
run(exe, config, ignorestatus(`/bin/bash -c "touch /root/.julia/packages/foo"`))
end
run(`find $overlay`)
The only annoying part is that this requires --persist
, i.e., it doesn't work with the ephemeral one mounted at /proc
because mount_overlays
runs after procfs
has been mounted again. We also cannot reorder that code, because it needs to run after mount_maps
(since overlays are put on top of other mounts), yet mount_maps
requires procfs
to look into mtab
.
OK, made this work with persist=false
too. This should be good to go, or good to review at least.
@staticfloat Could you explain why you were mounting the overlayfs-backing tmpfs on /proc
? That breaks lots of things (and is why you quickly restore it), but I don't see why we're doing this in the first place. Many other directories are guaranteed to exist as per FSH, and this is outside of the sandbox' root we'll pivot into anyway, so should be invisible wherever we mount it. In this PR, I've changed it to mount at /root
, which seems to work fine.
FWIW, I actually started out with something simple that made read-only mounts use overlays, https://github.com/staticfloat/Sandbox.jl/commit/3e534e139f25e38f5245c4c22c574ec3eeb04c38, but because it would then behave differently from the Docker executor I went with a separate --overlays
option.
Okay, I've been thinking about this, and while I like it, I'm not happy with the API. With this, we have the following:
1) A way to create read-write mappings (--workspace
)
2) A way to create read-only mappings (--map
)
3) By default, an overlay that allows modifications to the rootfs (but not to read-only mounts), which can be persisted or not (--persist
)
4) A way to selectively apply an overlay to paths, granting similar functionality to --persist
but to the read-only mounts (--overlay
).
IMO, this is too complicated. In terms of realistic needs, I think we only actually care about the following things:
1) I need to be able to modify files in the host through read-write mappings. 2) I want to be able to do bad things in the sandbox without messing up anything in my outer system. 3) I want to be able to persist changes.
From this perspective, I think a better API would be to have the following pieces of functionality:
1) --map
and --workspace
remain. (perhaps with a unified API, --map host:guest
and --map host:guest:ro
, if we wanted to copy docker)
2) We apply overlays to all read-only mounts (including the rootfs) by default. We store them all in the persistence directory so --persist
works with all of them at once.
What we lose with this is the ability to get read-only file system errors, which I think is fine. Regarding docker compatibility, I think we can get this working in docker as well by generating an entrypoint bash script that runs the moral equivalent of:
for mount in $ro_mount_list; do
workdir=$persist_dir/$(hash $mount)
mkdir -p $workdir
mount -t overlay overlay -o lowerdir=$mount,upperdir=$workdir/upper,workdir=$workdir/work $mount
done
umount $persist_dir
Since we usually run with --privileged
, that should work.
OK, I implemented (the core part) of that API. One problem that surfaced is that overlayfs doesn't support mounting a single file, breaking e.g. the internet tests here (which mount /etc/resolv.conf
). Do we want to support that? I guess we could detect the file case, mount the parent dir in a temporary location and bind-mount the file, but that feels hacky.
EDIT: pushed that hack. Makes mount
looks funny, but it works:
Child Process PID is 1914942
--> Mapping 1000:1000 to 0:0 within container namespace
--> Creating overlay workdir at /root
...
--> Mounting overlay of /tmp at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/proc/etc/resolv.conf (modifications in /root/upper/etc/resolv.conf, workspace in /root/work/etc/resolv.conf)
--> Bind-mounting /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/proc/etc/resolv.conf/resolv.conf over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/etc/resolv.conf (read-write)
...
About to run `/bin/bash` `-c` `mount`
overlay on / type overlay (rw,relatime,lowerdir=/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f,upperdir=/root/upper/rootfs,workdir=/root/work/rootfs,index=off,xino=off,metacopy=off)
overlay on /proc/etc/resolv.conf type overlay (rw,relatime,lowerdir=/tmp,upperdir=/root/upper/etc/resolv.conf,workdir=/root/work/etc/resolv.conf,index=off,xino=off,metacopy=off)
overlay on /etc/resolv.conf type overlay (rw,relatime,lowerdir=/tmp,upperdir=/root/upper/etc/resolv.conf,workdir=/root/work/etc/resolv.conf,index=off,xino=off,metacopy=off)
Pushed a commit that does the entrypoint hack as suggested by @staticfloat. I'm not particularly happy with that, as it only works when we have appropriate privileges, but on the other hand with --overlay
we didn't support any of this using the Docker executor so I guess it's an improvement of some sort.
EDIT:
[ 387.893369] overlayfs: filesystem on '/var/overlay//read_only/upper' not supported as upperdir
Gah. What if we used a host dir, and implement persistence just like how it's done with the UserNS executor?
Well, that works, except for the rootfs. Guess Linux doesn't like double overlayfs mounts? Using two separate persistence mechanisms works around this, but is not nice.
I'm not happy with how this turned out; the Docker entrypoint solution (in order to preserve compatibility with the UserNS sandbox) too way too messy: using two persistence mechanisms, requiring privileged containers AND overlayfs support, and likely to break when using stripped container images (say, not containing a /bin/sh
).
So instead I've opted to switch PkgEval over to using OCI container engines directly, https://github.com/JuliaCI/PkgEval.jl/pull/177. That gives me the control I need (pretty easy, too; instead of generating a SandboxConfig I just need to build slightly more verbose JSON), as well as many other features I would have to add to the UserNS sandbox here (notably cgroup resource limits).
So I'll go ahead and close this. FWIW, I think it would make sense to remove the UserNS sandbox here and build Sandbox.jl on top of an OCI container runtime; the code from https://github.com/JuliaCI/PkgEval.jl/pull/177 could help with that. It supports almost all features of Sandbox.jl, with the exception of whole-root persistence (mounting an overlay at /
seems unsupported by both crun
and runc
).
Adds overlay maps to the usernamespace sandbox so that we can create overlay mounts (e.g. to make read-only mounts mutable) in a controlled manner (i.e. without having to peek into the executor's
persistence_dir
):Intended use case: PkgEval, where we need a way to mount the package store. That mount needs to be read-only, since concurrent access results in corruption, yet we need a way to catch changes and add them to the cache afterwards. We cannot use layered depots for this because compilecache files hard-code the path to the Julia sources (see https://github.com/JuliaCI/PkgEval.jl/issues/158).
It would be cleaner if the overlay dir passed to the container would only be used as the upper dir, so that we can directly use the files in there, and not have to worry about removing the
work
directory (which has bad permissions that trigger a Julia bug, and may even be root-owned in the case of the PrivilegedUserNamespaceExecutor), but sadly the workdir has to live on the same filesystem as the upper dir.