Tracking issue for Steam for Linux and Snap environment/snapd compatibility improvement pathways

ZoopOTheGoop commented 8 months ago

This is a tracking issue for two (potentially more) related tickets that aim to reduce the burden on both the maintainers of the Snap environment that encloses Steam and the Steam for Linux developers and contributors.

Primary context: https://github.com/snapcore/snapd/pull/12794#issuecomment-1539947236 (but similar ideas have been mentioned elsewhere). See also, context

Currently, this is in discovery: no solution has been chosen, and I'm merely trying to understand the limitations, benefits, and implementation difficulties of possible solutions.

NOTE: 99% of changes are very likely to ultimately be snapd changes. Currently all related issues, even ones that deal more or less exclusively with snapd, are located here on the Steam Snap issue tracker. There are two reasons for this:

1. snapd's GitHub only has Pull Requests, not Issues. All snapd feature requests and bug reports are encouraged to go either on the associated Launchpad repo or on the Discourse forum. It's easier to keep this on GitHub for visibility, particularly so the Steam for Linux maintainers are more likely to be aware of it (and this is an explicit but kind request on my end for more specific input/feedback :) ).
2. Any change we would make is essentially just for the sake of the Steam Snap. While other theoretical Snaps may utilize it (Lutris, maybe?), it's almost entirely a Steam Snap issue and this is the tracking page for said application. (Any proposal may even have to be implemented as a permission-needed plug exclusively available to the Snap, much like the current interface.)

Context:

I would like to explore both solutions mentioned in the linked issue to their fullest potential, as well as any alternatives that come up during discovery. We seem to currently (and slowly) be taking path 1, but this leads to an ever-increasing scope of broadened permissions. This in and of itself is not bad, but it is unsustainable in the sense that the Steam Runtime for Linux's maintainers cannot and should not have to memorize every detail of snapd's quirks or AppArmor permissions (see, for instance, the issue that happened in this commit having to roll back some compatibility work done for FEX-Emu).

What should be internal implementation details should be allowed to change naturally, and while it's obviously always going to be a battle to meet every quirk of every distro and containerization system, this needs to be easier to maintain. With our team size, it's an uphill battle to properly and rapidly diagnose errors and determine which are caused by the AppArmor profile, which by a deeper snapd quirk (mount order, etc.), which by Steam, and which by a changed implementation detail, particularly when the Steam Snap is not our only project or maintenance burden. Additional automated testing is helpful, but doesn't address the core issue.

I'd also like to acknowledge that it is very easy to confuse the Snap and Native versions (even as a developer, I'm sometimes not 100% sure which of the two I'm in for debugging purposes without checking). As a result, we often don't receive reports that something is wrong at all: they end up on the main Steam for Linux issue tracker, where it's often not even clear they're Snap issues until some interactive debugging is done. This is frustrating for both parties, because we don't get timely feedback on what we need to fix, and the maintainers of Steam for Linux have to field bugs meant for the Canonical gaming team on a project they are neither directly responsible for nor actively maintaining (though @smcv in particular has created issues, done debugging, and given code review despite this, for which I'm very thankful).
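As an illustration of the "checking" mentioned above, here is a minimal sketch of how a debugging helper might guess which environment it is running in. It relies on the `SNAP` and `SNAP_NAME` variables that snapd exports inside a snap, and on the `FLATPAK_ID` variable and `/.flatpak-info` file that Flatpak provides inside its sandbox. The function name and the returned labels are hypothetical, and this is not how Steam or the Steam Snap performs any such check.

```python
import os


def detect_steam_environment(environ=None, flatpak_info_path="/.flatpak-info"):
    """Best-effort guess at the packaging environment of this process.

    Hypothetical helper for illustration only; the labels it returns
    are made up for this sketch.
    """
    env = os.environ if environ is None else environ

    # snapd exports SNAP and SNAP_NAME to processes running inside a snap.
    if env.get("SNAP") and env.get("SNAP_NAME"):
        return "snap:" + env["SNAP_NAME"]

    # Flatpak sets FLATPAK_ID and places /.flatpak-info inside the sandbox.
    if env.get("FLATPAK_ID") or os.path.exists(flatpak_info_path):
        return "flatpak"

    # Neither marker was found, so assume a native install.
    return "native"


if __name__ == "__main__":
    print(detect_steam_environment())
```

A heuristic like this only reports what the current process can see; it cannot tell where a bug report should ultimately be filed.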

While there are possible solutions to better disambiguate the two environments, or to ensure reports go to the right place more easily, they're largely out of the scope of this thread (and would probably require people unlikely to be directly involved in a technical issue tracker). Like testing, they ultimately don't tackle the root issue. Instead, this issue is meant to fully explore and understand the requirements from the perspective of Valve, Collabora, and others, with feedback from us, and ideally to arrive at a workable solution we can implement.

I don't expect the Steam for Linux maintainers to shoulder the entire burden of determining the API surface, requirements, or the scope of changes, but it would be helpful to fully understand exactly what can be done to minimize further quirks/bugs/etc. on both our ends to a reasonable level - even if a change may be somewhat ambitious for us to implement. With any potential solution, we may need to request assistance in determining certain implementation details, due to the current limits imposed by our team size, a lack of certain types of expertise on our team, and not knowing many specific internal details about how Steam works.

I should be clear that nothing here is to be taken as a guarantee of any specific major change (or one at all), especially since I can't predict how resources will be allocated or how security will react to various solutions. I'm not posting this as the result of any meeting or internal decision; I'm writing this, unprompted, as an individual engineer who happens to be working on the project, while walking the line of acknowledging that any sufficient solution would likely have to involve a group effort from our whole team. But I do promise that I will at the very least write a prototype or formal internal proposal attempting to resolve any needed change, and will endeavor to revise said proposal or prototype while there's hope of a good solution (even if I have to sink personal time into it due to conflicts with assigned on-the-clock projects). I have casually advocated for some solutions before, but I've never made a detailed, concrete proposal or prototype, and I'd like to see what that may accomplish.

smcv commented 8 months ago

The high-level problem here is the same as the high-level problem with the unofficial Steam Flatpak app.

One of the things that Flatpak and Snap do is sandboxing: taking user programs out of the trusted computing base. Normally, a Flatpak or Snap app has some sort of defined scope: it might need to open certain documents, it might need to render 3D graphics, it might need to talk to specific hardware, and so on - but what it does is relatively well-understood and finite, so you can choose sandboxing parameters that allow everything it wants to do, while disallowing things that it would only do if it had become malicious or compromised.

However, Steam is not really an app, in the sense that Flatpak and Snap would normally understand the term: it's more like an app framework of its own, alongside Flatpak and Snap. Its purpose is to download and run completely arbitrary software (mostly but not exclusively games), which means that if you are running Steam in a sandbox, the permissions that it needs are the union of all of the permissions that would be required by any of the games and tools that it can install.
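To make the "union of all of the permissions" point concrete, here is a minimal sketch. The titles and permission names are entirely hypothetical and do not correspond to real snapd interfaces or Flatpak permissions; the only point is that a launcher for arbitrary software ends up needing whatever any one title needs.

```python
# Hypothetical titles and permission names, chosen only for illustration.
TITLE_REQUIREMENTS = {
    "game_a": {"gpu", "audio", "network"},
    "game_b": {"gpu", "gamepad", "nested-namespaces"},
    "tool_c": {"network", "raw-usb"},
}


def launcher_permissions(requirements):
    """Return the union of every title's permission set."""
    return set().union(*requirements.values())


if __name__ == "__main__":
    # The sandbox around the launcher must allow all of these at once,
    # even though no single title needs every one of them.
    print(sorted(launcher_permissions(TITLE_REQUIREMENTS)))
```

Every new title can only grow this set, which is why the required permissions are hard to bound in advance.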

The other thing that Flatpak and Snap do is to provide predictable compatibility: taking a program that was compiled for one environment (perhaps Ubuntu), and running it on a machine whose OS provides a different environment (perhaps Arch) without breaking its expectations, by providing a non-host-OS runtime library stack that fits its expectations. Normally, a Flatpak or Snap app can be assumed to target exactly one environment: maybe it's the freedesktop.org Platform v23.08 as used with Flatpak, maybe it's an Ubuntu subset as used with Snap, or maybe it's Debian, but you can usually assume a single app wouldn't have one component that was compiled for fdo Platform 23.08 while also having a second component that was compiled for Debian, because in practice there would be no host OS where both of those components would work reliably.

However, again, Steam isn't an app, it's an app framework. The Steam client itself has a relatively complicated multi-process architecture (for historical and bootstrapping reasons), and individual games can require different, not-fully-compatible runtime environments (which we provide by using pressure-vessel). There is no single environment that Flatpak or Snap can point to and say "this is the environment Steam runs in", because it just isn't that simple. So Steam as a sandboxed app needs to be able to launch a game or tool in a different environment on request, either by creating a nested container inside the "larger" app framework's sandbox (this is how it currently works in Snap world), or by asking the "larger" app framework to create and run a new sandbox with a different /usr alongside the one that is running the Steam client (this is how it currently works in Flatpak world).

ZoopOTheGoop commented 4 months ago

@smcv Sorry for the delay. I had lined up a meeting to discuss this at a company Sprint with all of the teams affected by these decisions, and I wanted to wait for it before promising anything. We've elected for the following solution: we are switching Steam to a default-allow model on AppArmor. For now, Steam will have effectively infinite permissions to do what it wants. We might tighten this security on a case-by-case basis, especially as snapd gets more features that may allow for more fine-grained control (such as restricting games, or prompting for certain types of access, maybe even a subportal model someday), but for now you should be able to remove any extant Snap support hacks after that PR lands and gets into stable snapd.
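As a rough illustration of the difference between the two policy styles, here is a toy model. It is not AppArmor syntax and not snapd's implementation; the function, mode names, and action strings are all hypothetical. Under default-deny, each newly needed action requires a new allow rule, whereas under default-allow only explicitly denied actions are blocked.

```python
def is_allowed(action, mode, allow=frozenset(), deny=frozenset()):
    """Toy policy check; hypothetical, not AppArmor or snapd behaviour.

    An explicit deny always wins. Otherwise the outcome depends on
    whether the policy allows or denies by default.
    """
    if action in deny:
        return False
    if mode == "default-allow":
        return True
    if mode == "default-deny":
        return action in allow
    raise ValueError("unknown mode: " + repr(mode))


if __name__ == "__main__":
    # Default-deny: an action nobody anticipated is blocked until a rule is added.
    print(is_allowed("new-syscall", "default-deny", allow={"gpu", "network"}))
    # Default-allow: the same action works; only listed denies are blocked.
    print(is_allowed("new-syscall", "default-allow", deny={"load-kernel-module"}))
```

Tightening "on a case-by-case basis" in this model would correspond to adding entries to the deny set over time.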

We're considering how to alert users to the less strict confinement, but that's our issue, not yours.

I'll ping you when the change happens.

canonical / steam-snap

Tracking issue for Steam for Linux and Snap environment/snapd compatibility improvement pathways #363
