Discovery for Pressure-Vessel-like Subcontainers Re: Improving Steam/Snap compatibility

ZoopOTheGoop commented 8 months ago

Is there an existing issue for this?

[X] I have searched the existing issues

Is the feature related to a problem or existing issue?

Yes, most of the issues are related to this, it is primarily discussed here, but has come up at other points: https://github.com/snapcore/snapd/pull/12794#issuecomment-1539947236

Describe the solution/feature you'd like.

I would like the snapd environment and Snapcraft specification to be resilient to Steam and Pressure Vessel changes on the part of Valve, Collabora, et al., to minimize the maintenance burden for both parties as much as is reasonably possible.

To begin with, I would like to explore the Portal/Subcontainerization idea, and what the requirements on our side would be from Steam for Linux's end if we were able and primed to do such a thing, and a rough idea of what compromises are acceptable if an "ideal" version can't be implemented (whether due to security concerns, scope issues, or whatever else). No need to go over every single change you would need, I don't want an entire detailed spec to have to be written for us, but I would like relatively clear details, particularly since I'm at best intermediate-level in containerization.

I understand a D-Bus API (or equivalent) similar to Flatpak's is desired. I haven't had a chance to read and understand it in depth, though it's on my to-do list, and I lack much familiarity with Flatpak, but I understand that the corresponding implementation exists in this PR by @smcv and the corresponding issue. It would help if I could get a broad description of the scope of what snapd would have to do on Steam's behalf to suitably emulate/replace/act as a Pressure Vessel (if that's the goal as I understand it). If this is a viable solution, I would be willing to help contribute Steam for Linux Runtime code to enable it as well, though I'd likely need help setting the environment up. (Only the open source bits we already have access to, obviously; I'm not asking for access to anything else.)

One potential challenge I want to explicitly ask about: from my tinkering with the Runtime Tools, it's my understanding that Pressure-Vessel quite literally is in large part implemented on (a modification of) bwrap, which is also a significant part of the underlying code to Flatpak. I'm curious how this potentially affects the difficulty of implementation in snapd. How much were the changes to Flatpak simplified by them being related like this?

If (and I'm not sure if this is the case, but an educated guess) pvwrap is fundamentally basically just asking Flatpak to do what it was going to do anyway, but via Flatpak bwrap and associated overhead instead of its own/the system bwrap, then can we reasonably expect snapd to be able to provide a similar enough environment even if we undergo the effort (practical and diplomatic) to expose a similar API to Steam? What's the chance that, due to the differences in implementation, we'd just end up in exactly the same place as we are now, but with an extra Portal system on top of it to maintain?

Describe any alternatives you've considered.

https://github.com/canonical/steam-snap/issues/363#issue-2111240893

Additional context

No response

smcv commented 8 months ago

What do you mean by "discovery" here? Is this another word for requirements-capture, or do you mean that your feature request is for snapd to discover ... something? (If the latter, sorry, I don't understand what.)

I would like to explore the Portal/Subcontainerization idea, and what the requirements on our side would be from Steam for Linux's end if we were able and primed to do such a thing, and a rough idea of what compromises are acceptable if an "ideal" version can't be implemented (whether due to security concerns, scope issues, or whatever else).

The high-level goal of the Steam Linux Runtime container runtimes (pressure-vessel) is that we want to construct a hybrid environment where /usr is 90% our own runtime (for example sniper), and 10% the user's graphics drivers (Mesa or Nvidia or whatever) and their library dependencies. This lets the game make naive assumptions about /usr and the location of its dependency libraries, and have those assumptions be right. This is important because some native Linux games do things in their startup scripts that you would probably say are wrong, like ignoring and overwriting any LD_LIBRARY_PATH that they inherited from a parent process. As a result, anything that involves setting a LD_LIBRARY_PATH a mile long with components like /snap/x/lib:/snap/y/lib:..., and relying on it being used in all circumstances, is not going to work reliably.

This is not me imposing arbitrary requirements on Snap: I would much prefer it if all native Linux games were "well-behaved" and respected inherited settings like the LD_LIBRARY_PATH (even though I would also prefer not to be using LD_LIBRARY_PATH as a load-bearing component). Instead, it's something that we know from bitter experience, and it is not something that we can negotiate about, because it's a fact about pre-existing Linux games that are no longer actively maintained by their developer/publisher. We cannot go round changing all the games to stop making assumptions, even if those assumptions are wrong, because most of them are not our games. Instead, we try to meet the games where they are coming from, and make their assumptions be true.
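To make the LD_LIBRARY_PATH problem concrete, here is a minimal sketch. It uses a small Python child process as a stand-in for a game's startup script; the `./lib` value and the function name are hypothetical, and real games typically do this from a shell script instead. The point is only that whatever the parent put in LD_LIBRARY_PATH is gone by the time the game binary would start.

```python
import os
import subprocess
import sys

# Stand-in for a game's startup script that overwrites LD_LIBRARY_PATH
# instead of appending to it. The "./lib" value is hypothetical.
NAIVE_LAUNCHER = (
    "import os\n"
    "os.environ['LD_LIBRARY_PATH'] = './lib'\n"
    "print(os.environ['LD_LIBRARY_PATH'])\n"
)


def library_path_seen_by_game(inherited: str) -> str:
    """Run the stand-in launcher with `inherited` prepended to
    LD_LIBRARY_PATH and return the value the 'game' ends up with."""
    env = dict(os.environ)
    existing = env.get("LD_LIBRARY_PATH")
    # Keep any pre-existing entries so the child interpreter still starts.
    env["LD_LIBRARY_PATH"] = inherited if not existing else inherited + ":" + existing
    result = subprocess.run(
        [sys.executable, "-c", NAIVE_LAUNCHER],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The long inherited path is discarded by the launcher.
    print(library_path_seen_by_game("/snap/x/lib:/snap/y/lib"))
```

The inherited `/snap/x/lib:/snap/y/lib` never reaches the payload, which is why the container approach makes the libraries appear in the locations the game already expects.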

The way this works in Flatpak-world is:

Normally, a Flatpak app runs in a new namespace that has the app's files mounted on /app, a runtime chosen by the app maintainer mounted on /usr (for example Steam currently uses org.freedesktop.Platform/x86_64/23.08), a subset of the user's home directory mounted on $HOME, and some sockets in places like $XDG_RUNTIME_DIR.

Normally, pressure-vessel would want to use bubblewrap to create new user and mount namespaces in which it has control over the mount table, then mount the /usr of our choice in that new mount namespace, voluntarily give up all of the extra privileges that it has over that namespace, and exec the "payload" (typically a game).
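As a rough illustration of that sequence, the sketch below only builds a bubblewrap argument list and does not run it. It is not pressure-vessel's real command line, which has a very large number of bind-mounts. The function name, the set of compatibility symlinks and the example paths are assumptions; the bwrap options used (`--unshare-user`, `--ro-bind`, `--symlink`, `--proc`, `--dev-bind`, `--bind`, `--die-with-parent`) are standard ones.

```python
from typing import List

# Compatibility symlinks for a merged-/usr layout (link name -> target).
# The exact set is an assumption for illustration.
COMPAT_SYMLINKS = {
    "/bin": "usr/bin",
    "/sbin": "usr/sbin",
    "/lib": "usr/lib",
    "/lib64": "usr/lib64",
}


def build_bwrap_argv(runtime_usr: str, home: str, payload: List[str]) -> List[str]:
    """Return a bwrap command line that mounts `runtime_usr` as /usr,
    shares the home directory, and then runs `payload`."""
    argv = ["bwrap", "--unshare-user"]
    # The runtime of our choice becomes /usr, read-only.
    argv += ["--ro-bind", runtime_usr, "/usr"]
    for link, target in COMPAT_SYMLINKS.items():
        argv += ["--symlink", target, link]
    argv += ["--proc", "/proc", "--dev-bind", "/dev", "/dev"]
    # User data stays at the same path as outside the container.
    argv += ["--bind", home, home]
    argv += ["--die-with-parent", "--"]
    return argv + list(payload)


if __name__ == "__main__":
    print(" ".join(build_bwrap_argv("/path/to/sniper/usr", "/home/me", ["./game"])))
```

bwrap itself drops the extra privileges it held over the new namespace before it execs the payload, which is the "voluntarily give up" step described above.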

However, Flatpak apps are not allowed to create nested user/mount namespaces: Flatpak specifically blocks the syscalls that would allow them to do that. This is a constraint imposed by Flatpak's security model: if apps were allowed to create nested sandboxes, they would be able to trick host processes into thinking they were unconfined. This is because Flatpak intentionally does not require any specific LSM, so it cannot rely on LSM labelling (contrast with Snap, which relies on AppArmor, and if I understand correctly does not provide meaningful sandboxing on non-AppArmor-enabled kernels).

So, instead of running bubblewrap, when pressure-vessel detects that it's running under Flatpak it will send D-Bus API calls to flatpak-portal (the interface you linked) to create what Flatpak calls a "sub-sandbox". This creates new user/mount namespaces alongside the Flatpak app's current user/mount namespaces. As an implementation detail, we currently do the D-Bus calls from our own C code (in a tool called steam-runtime-launch-client), but they're the same D-Bus calls that you could make with a sufficiently new version of flatpak-spawn.

A good way to get some understanding of what is happening here would be to try it: install Steam as a native .deb and as a Flatpak app (perhaps on two different virtual machines), download and run some small free-to-play game that uses the sniper container runtime (Battle for Wesnoth, Endless Sky and RetroArch are good examples), and look at pstree and systemd-cgls to see how the process hierarchies are behaving. If you set the game's launch options to PRESSURE_VESSEL_SHELL=instead %command%, you'll get an xterm instead of the actual game, from which you can explore the container interactively.

In the native .deb version, you'll see that the container environment is a tree of processes "below" Steam.

In the Flatpak version, you'll see that instead, the container environment is a tree of processes below flatpak-portal, which Flatpak puts in a separate cgroup.
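If you would rather check the cgroup difference from a script than by reading systemd-cgls output, a small sketch along these lines might help. It parses /proc/<pid>/cgroup, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`. The function names are made up for this example, and you would need to find the PIDs of the Steam client and the game yourself (for example from pstree).

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Parse the contents of /proc/<pid>/cgroup.

    Returns a mapping from controller list to cgroup path. On a pure
    cgroup v2 system there is one line with an empty controller list.
    """
    cgroups = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        cgroups[controllers] = path
    return cgroups


def read_cgroups(pid: int) -> Dict[str, str]:
    """Read and parse the cgroup membership of a running process."""
    with open(f"/proc/{pid}/cgroup", encoding="utf-8") as f:
        return parse_proc_cgroup(f.read())


def in_same_cgroup(pid_a: int, pid_b: int) -> bool:
    """True if both processes are in the same cgroup(s)."""
    return read_cgroups(pid_a) == read_cgroups(pid_b)
```

Based on the description above, I would expect `in_same_cgroup(steam_pid, game_pid)` to differ between the two setups, with the Flatpak game sitting in the separate cgroup used for flatpak-portal's children.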

In both cases, you'll see that the /usr of the container environment is the Steam Runtime 3 'sniper' environment, which is basically Debian 11 with selected backports (Vulkan, SDL, that sort of thing). The usual compatibility symlinks /bin -> usr/bin, etc. also exist. However, you'll also see that some parts of /usr have been edited or removed, and replaced with symbolic links pointing into what we call the "graphics provider"; and you'll see that some key environment variables point into /usr/lib/pressure-vessel/overrides which, again, contains symlinks into the graphics provider.

The /etc of the container environment is a mixture: it's mostly the /usr/etc of the container environment, but with some edits done by either pressure-vessel or Flatpak (as appropriate) to substitute files like /etc/resolv.conf with the host version. In the Flatpak case, this is mostly done for us by Flatpak. Also, some of the /usr/etc has been edited by pressure-vessel to substitute symlinks to files from the graphics provider, the same as /usr.

The /app of the container environment is intentionally empty - I don't remember precisely why, but there was some subtle reason why mounting the Steam Flatpak app's normal /app there would have caused weird conflicts, so we don't.

In the native .deb version, the graphics provider is (a large subset of) the real root filesystem of the computer, which we mount on /run/host inside the container. I think this is analogous to /var/lib/snapd/hostfs in Snap.

In the Flatpak version, the graphics provider is the normal Flatpak environment that the Steam client uses (including extensions for graphics drivers), which Flatpak has mounted in /run/parent/{app,etc,usr,...}.

For more or less everything outside /app, /usr and /etc, including user data directories like /home/me and system sockets like X11 and Wayland, the rule is that if it's available in the environment where the Steam client runs, we expect Flatpak to make the same content available in the same location in the environment where the game runs.

In the Flatpak case, there are also a few things that we specifically needed to share between the two parallel container environments:

/tmp (Steam assumes that games and the Steam client can communicate via sockets or shared memory here)
/dev/shm (ditto)
$XDG_RUNTIME_DIR (ditto)
the process ID namespace (Steam assumes that games and the Steam client can tell each other process IDs and they will be in the same namespace)

So, what we would need from Snap, to do something similar to what we do in Flatpak, would go something like this:

A D-Bus API to create new sandbox environments, ideally the same shape as Flatpak's Spawn() (it can be a subset, we don't actually use 100% of it).
Flags: we need at least FLATPAK_SPAWN_FLAGS_CLEAR_ENV, FLATPAK_SPAWN_FLAGS_SHARE_PIDS and FLATPAK_SPAWN_FLAGS_EMPTY_APP.
Flags: we don't need FLATPAK_SPAWN_FLAGS_SANDBOX or FLATPAK_SPAWN_FLAGS_NO_NETWORK.
Flags: I don't remember which of the remaining flags are required.
Options: we don't need sandbox-*.
Options: we need at least unset-env and usr-fd.
We need all of the other Spawn() arguments: argv, current working directory, fd-passing, environment.
A D-Bus API to send signals to the process that was started by Spawn(), ideally the same as Flatpak's SpawnSignal().
A D-Bus API to tell us when the Spawn()'d process has terminated, ideally the same as Flatpak's SpawnExited.
Snap would need to mount the directory specified by the usr-fd as /usr, and create the usual compatibility symlinks /bin -> usr/bin, etc. in the root directory.
Snap would need to mount most of the etc subdirectory of the usr-fd as /etc. We would need to figure out what to do with files that normally come from the host, like /etc/resolv.conf - in Flatpak, this is easy because Flatpak handles it already, but in Snap, we might have to teach pressure-vessel to populate it with symlinks like /etc/resolv.conf -> /var/lib/snapd/hostfs/etc/resolv.conf or similar.
Snap would need to mount the /usr and /etc of the Steam client somewhere, so that we can have symlinks that can point into them. Flatpak uses /run/parent for this (because it has total control over /run), and that seems like as good a place as any, but it could be somewhere below /snap or /var/lib/snapd if preferred.
All other top-level directories, notably /snap, would need to be the same as they are for the Steam client itself. They can be read-only if we don't expect user code to write to them.
We would need AppArmor rules that allow reading all of the same places that the Steam client itself can read, plus /usr and /etc. Also, for each file we would normally be able to read in the Steam client's /usr and /etc, we would need to be able to read the corresponding file from whatever we use as the equivalent of Flatpak's /run/parent/usr and /run/parent/etc.
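To give a feel for the shape of the Spawn() request in the list above, here is a sketch that only assembles the arguments and does not talk to D-Bus. The flag values and argument types are written from my reading of the org.freedesktop.portal.Flatpak interface documentation and should be checked against it before relying on them. `SpawnRequest` and `build_spawn_request` are names invented for this example, and `usr_fd` is shown as a plain integer where the real call passes a file descriptor handle.

```python
import os
from typing import Dict, List, NamedTuple, Optional

# Flag values as I understand them from the Flatpak portal docs
# (assumption: verify against org.freedesktop.portal.Flatpak).
FLATPAK_SPAWN_FLAGS_CLEAR_ENV = 1 << 0
FLATPAK_SPAWN_FLAGS_SHARE_PIDS = 1 << 7
FLATPAK_SPAWN_FLAGS_EMPTY_APP = 1 << 8

REQUIRED_FLAGS = (
    FLATPAK_SPAWN_FLAGS_CLEAR_ENV
    | FLATPAK_SPAWN_FLAGS_SHARE_PIDS
    | FLATPAK_SPAWN_FLAGS_EMPTY_APP
)


class SpawnRequest(NamedTuple):
    """Arguments in the order Spawn() takes them (hypothetical container)."""
    cwd_path: bytes            # NUL-terminated bytestring
    argv: List[bytes]          # NUL-terminated bytestrings
    fds: Dict[int, int]        # target fd number -> fd to pass
    envs: Dict[str, str]
    flags: int
    options: Dict[str, object]


def _bytestring(value: str) -> bytes:
    return os.fsencode(value) + b"\0"


def build_spawn_request(cwd: str, argv: List[str], envs: Dict[str, str],
                        usr_fd: int, unset_env: List[str],
                        fds: Optional[Dict[int, int]] = None) -> SpawnRequest:
    """Assemble the subset of Spawn() that the list above says is needed."""
    return SpawnRequest(
        cwd_path=_bytestring(cwd),
        argv=[_bytestring(arg) for arg in argv],
        fds=dict(fds or {}),
        envs=dict(envs),
        flags=REQUIRED_FLAGS,
        options={"usr-fd": usr_fd, "unset-env": list(unset_env)},
    )
```

A Snap-side implementation could accept a subset like this and reject anything it does not recognise, as long as the subset covers the flags and options listed above.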

smcv commented 8 months ago

One potential challenge I want to explicitly ask about: from my tinkering with the Runtime Tools, it's my understanding that Pressure-Vessel quite literally is in large part implemented on (a modification of) bwrap

I think pressure-vessel's pv-bwrap is literally the same code as bwrap 0.8.0. If it isn't, then the differences will be very small - we try to upstream everything.

which is also a significant part of the underlying code to Flatpak. I'm curious how this potentially affects the difficulty of implementation in snapd. How much were the changes to Flatpak simplified by them being related like this?

This was mostly a matter of the design and mental model being compatible, rather than the specifics of bwrap. pressure-vessel and Flatpak do very similar things with bwrap, and in fact a lot of the code in pressure-vessel to build the bwrap command-line is directly copied from Flatpak, but Flatpak doesn't give us anywhere near that level of control over the bwrap command-line when it creates a sub-sandbox.

bwrap is quite a simple/straightforward low-level tool, because it has historically needed to be installed setuid-root on some OSs, in which case any extra convenience code would be a security risk. As a result, many of its command-line options translate relatively directly into syscalls.

One big thing that we do directly benefit from in Flatpak is that Flatpak already knows how to merge a runtime-supplied /etc with selected individual files like /etc/resolv.conf from the real host /etc, so we let it do that (and rely on the fact that it will). If Snap doesn't know how to do similarly, then either Snap or pressure-vessel will have to learn to do that. Similarly, when we're using Flatpak, we rely on Flatpak to set up services like X11, D-Bus and Wayland.
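If this merging ended up on the pressure-vessel or Snap side, one possible shape for it is sketched below: copy the runtime's etc, then replace selected entries with symlinks into the host's /etc. This is not how Flatpak implements it. The function name and the list of host-provided files are assumptions, and the `/var/lib/snapd/hostfs/etc` prefix is just the candidate location mentioned earlier in this thread.

```python
import os
import shutil
from typing import Iterable

# Files that should come from the host instead of the runtime.
# This selection is hypothetical.
HOST_PROVIDED = ("resolv.conf", "hosts")


def populate_etc(runtime_etc: str, dest_etc: str, host_etc: str,
                 host_files: Iterable[str] = HOST_PROVIDED) -> None:
    """Create `dest_etc` from `runtime_etc`, then point selected
    top-level files at their host counterparts via symlinks."""
    # dest_etc must not exist yet; symlinks in the runtime are preserved.
    shutil.copytree(runtime_etc, dest_etc, symlinks=True)
    for name in host_files:
        link = os.path.join(dest_etc, name)
        if os.path.lexists(link):
            os.remove(link)
        # The target is resolved inside the container, so it need not
        # exist at the time the link is created.
        os.symlink(os.path.join(host_etc, name), link)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        runtime = os.path.join(tmp, "usr-etc")
        os.mkdir(runtime)
        with open(os.path.join(runtime, "resolv.conf"), "w") as f:
            f.write("# runtime placeholder\n")
        dest = os.path.join(tmp, "etc")
        populate_etc(runtime, dest, "/var/lib/snapd/hostfs/etc")
        print(os.readlink(os.path.join(dest, "resolv.conf")))
```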

If I'm reading /proc/self/mounts correctly, Snap populates /usr with files from the Snap runtime squashfs (analogous to what Flatpak does), but it doesn't use the /etc from the Snap runtime squashfs, and instead uses the host /etc almost entirely as-is (but with AppArmor rules to lock down access to some of it, and a very small number of overrides). I don't think this is going to work reliably on all host OSs, but that's equally true for non-Steam Snap apps, so that seems out of scope here - if Snap wants to be as portable as pressure-vessel, then it will likely need to catch up with pressure-vessel in how it accounts for the fact that some host OSs are frankly bizarre, but that isn't my job!

If pvwrap is fundamentally basically just asking Flatpak to do what it was going to do anyway, but via Flatpak bwrap and associated overhead instead of its own/the system bwrap

There's less of that going on than you might think, because Flatpak doesn't give us fine-grained control over the bwrap command-line.

The best way to get a feel for this would be to try it: install Steam as a native .deb and as a Flatpak app, download a small free-to-play game that uses the sniper container runtime (like Battle for Wesnoth), put STEAM_LINUX_RUNTIME_LOG=1 STEAM_LINUX_RUNTIME_VERBOSE=1 %command% in its launch options, run it, look at the log file SteamLinuxRuntime_sniper/var/slr-latest.log, and compare the two systems.

If you run G_MESSAGES_DEBUG=all /usr/libexec/flatpak-portal --verbose --replace first, you can also see what Flatpak is doing at the same time.

In a non-Flatpak environment, look for bwrap options before bundling and you'll see that pressure-vessel-wrap is going to finish by execve()'ing a call to pv-bwrap with a very large number of bind-mounts. We have to do all this setup for ourselves in this case, precisely because Flatpak isn't - but in the Flatpak case, we would (correctly!) not be allowed to have this level of fine-grained control.

In a Flatpak environment, look for Final command to execute and you'll see that instead, pressure-vessel-wrap finishes with a call to steam-runtime-launch-client (which is an enhanced flatpak-spawn) which is mostly environment variable manipulation and fd-passing. The pressure-vessel-specific cleverness is all encapsulated in the --usr-path, or in the pressure-vessel-adverb command that gets run inside the new container. pv-adverb is a normal, unprivileged process: it's partly there to do some final setup that would have been inconvenient to do from outside, like building a new ld.so.cache, but mostly there to hold lock-files open so that the edited /usr can't get garbage-collected while the game is still running. In the flatpak debug output, you'll see that Flatpak converts this into a very large bwrap command line that is quite similar to the one pressure-vessel would have used (because we're using a lot of the same code behind the scenes), but the precise details are not under pressure-vessel's control this time.
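When comparing the two logs, a small helper like the following can locate the lines worth reading. It only searches for the two phrases mentioned above and makes no assumption about the rest of the log format; the function name is made up for this example.

```python
from typing import Dict, List

# The two phrases mentioned above as things to look for in slr-latest.log.
MARKERS = ("bwrap options before bundling", "Final command to execute")


def find_markers(log_text: str) -> Dict[str, List[int]]:
    """Return, for each marker, the 1-based line numbers where it occurs."""
    hits: Dict[str, List[int]] = {marker: [] for marker in MARKERS}
    for number, line in enumerate(log_text.splitlines(), start=1):
        for marker in MARKERS:
            if marker in line:
                hits[marker].append(number)
    return hits


if __name__ == "__main__":
    from pathlib import Path

    log = Path("SteamLinuxRuntime_sniper/var/slr-latest.log")
    if log.exists():
        for marker, lines in find_markers(log.read_text(errors="replace")).items():
            print(f"{marker!r}: lines {lines}")
```

Running it against the log from each installation and reading the command that follows each hit should show the difference between the long pv-bwrap invocation and the much shorter steam-runtime-launch-client one.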
