Closed Scrumplex closed 1 month ago
Perhaps related? #42117
I have patched the relevant kernel code to allow ANY application to acquire high priority queues and added it to my NixOS configuration here: https://codeberg.org/Scrumplex/flake/commit/d6dc803d5cbb4a4dfd388489873bf446f0f56e34
Feel free to use this workaround, until we find a way to allow CAP_SYS_NICE in Steam's bwrap container.
Edit: I have switched my NixOS config to use boot.kernelPatches instead. See https://codeberg.org/Scrumplex/flake/commit/3ec4940bb61812d3f9b4341646e8042f83ae1350
Is this still a problem in NixOS 23.11?
Is this still a problem in NixOS 23.11?
Yes. This is a general issue with any binary that wants to use its capabilities but is sandboxed with bwrap
https://jvns.ca/blog/2022/06/28/some-notes-on-bubblewrap/
$ bwrap --ro-bind / / --unshare-all --uid 0 --cap-add cap_net_bind_service nc -l 80 (no output, success!!!)
it seems possible to add capabilities with bwrap, we could add a caps parameter to buildFHSUserEnv
it seems possible to add capabilities with bwrap, we could add a caps parameter to buildFHSUserEnv
While that is true, these are ambient capabilities that'll apply to all processes inside the FHS environment. In the case of Steam, this would mean that all games would have CAP_SYS_NICE, not just SteamVR.
I also remember that Steam does not like ambient capabilities at all. But that might have been bwrap itself. Not sure.
Edit: I am also pretty sure that --cap-add itself will require root privileges. Having to run wrappers for apps like Steam with root permissions is obviously not ideal.
I may have made some progress on this. I built steam-run with
steamPackages = recurseIntoAttrs (callPackage ../games/steam { buildFHSEnv = buildFHSEnvChroot; });
And then tried to start .local/share/Steam/steamapps/common/SteamVR/bin/vrstartup.sh with it.
I removed the STEAM_RUNTIME error return from vrstartup.sh, so it would run, but vrcompositor-launcher would fail to load libcap.so.2 whenever it had CAP_SYS_NICE set (presumably due to LD_LIBRARY_PATH being ignored).
So I did a patchelf --set-rpath /lib64 ~/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher
and now it seems to work without complaining about caps.
However, I still can't get the compositor to do async. I'm getting "Async support disabled by user setting".
Edit: I had to set "enableLinuxVulkanAsync" : true in vrsettings, and now I get:
Sat Feb 10 2024 02:13:51.352798 [Info] - Enabling async support!
Sat Feb 10 2024 02:13:51.353222 [Error] - Insufficient permission to create high priority queue.
Sat Feb 10 2024 02:13:51.353240 [Error] - Failed to create VkDevice with high priority queue.
Sat Feb 10 2024 02:13:51.353258 [Error] - Disabling async support and retrying.
So, possibly no better than with bwrap.
pscap shows
1 151086 [me] vrcompositor * sys_nice @ +
Good news first: I've made some progress on this.
To get access to certain (or all) caps, you can simply put the --cap-add $CAP bwrap arg into Steam's extraBwrapArgs.
To easily test whether caps work, we can use capsh:
cp "$(which capsh)" .
sudo setcap cap_sys_nice+ep ./capsh
steam-run ./capsh --print
If the cap is there, it'll print:
Current: cap_wake_alarm=i cap_sys_nice+ep
When I uncomment the SLR check in vrstartup.sh, I can get SteamVR to start in steam-run without the error message:
vrcompositor-launcher.sh[662777]: exec /Volumes/Games/SteamVR/bin/linux64/vrcompositor-launcher
Using vrcompositor capability proxy
Launching /Volumes/Games/SteamVR/bin/linux64/vrcompositor
Hooray!
Bad news: This breaks Steam.
On startup, you get the generic "your system does not support userns" popup and this error message in the log:
bwrap: Unexpected capabilities but not setuid, old file caps config?
Soooo how exactly are we supposed to add caps to the env if pressure-vessel errors out when it gets any access to caps?
Pinging @smcv because you might know how this is intended to work.
I celebrated too soon. In the vrcompositor.txt log it says:
Mon Mar 18 2024 10:43:17.834390 [Info] - Enabling async support!
Mon Mar 18 2024 10:43:17.834729 [Error] - Insufficient permission to create high priority queue.
Mon Mar 18 2024 10:43:17.834742 [Error] - Failed to create VkDevice with high priority queue.
Mon Mar 18 2024 10:43:17.834752 [Error] - Disabling async support and retrying.
So it appears while it has and recognises the cap inside bwrap, it doesn't actually have it from the kernel's perspective. Ugh.
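A quick way to tell whether a given attempt actually got a high-priority queue is to look for the error line quoted above in the compositor log. This is only a sketch: the function name async_denied is made up for illustration, and the log location is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# async_denied LOGFILE (hypothetical helper name):
# succeeds if the log contains the "insufficient permission" error
# that SteamVR prints before disabling async reprojection.
async_denied() {
    grep -q 'Insufficient permission to create high priority queue' "$1"
}

# Usage sketch: pass the path to your vrcompositor.txt as the first argument.
if [ -n "${1:-}" ] && [ -r "$1" ]; then
    if async_denied "$1"; then
        echo "high priority queue was denied"
    else
        echo "no permission error found in $1"
    fi
fi
```

This only reports what the log says; it does not tell you why the kernel refused the request.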
So it appears while it has and recognises the cap inside bwrap, it doesn't actually have it from the kernel's perspective.
Yes. Capabilities are namespaced according to a user namespace: see user_namespaces(7). High-priority queues in AMDGPU require CAP_SYS_NICE in the initial user namespace (the one where your init system ran).
bubblewrap can never give you capabilities in the initial user namespace, because each process can only ever have capabilities in the innermost user namespace that is applicable to it. The practical result is that nothing in NixOS' FHS environment will ever be able to have elevated capabilities in the initial user namespace. This is a kernel-imposed limitation, so there is nothing that user-space can do to solve it.
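To check which side of this limitation a process is on, one rough heuristic is to look at its uid_map: in the initial user namespace it is normally the full identity mapping "0 0 4294967295". The sketch below assumes that heuristic is good enough; a privileged process could create a nested namespace with the same mapping, so treat a positive result as a hint only. The function name is hypothetical.

```shell
#!/bin/sh
# is_identity_uid_map FILE (hypothetical helper name):
# succeeds if FILE holds exactly one mapping line "0 0 4294967295",
# which is what /proc/<pid>/uid_map normally shows in the initial
# user namespace.
is_identity_uid_map() {
    # Collapse the column padding so the comparison is whitespace-insensitive.
    map=$(awk '{ print $1, $2, $3 }' "$1")
    [ "$map" = "0 0 4294967295" ]
}

if [ -r /proc/self/uid_map ]; then
    if is_identity_uid_map /proc/self/uid_map; then
        echo "looks like the initial user namespace"
    else
        echo "nested user namespace: capabilities here do not count in the initial one"
    fi
fi
```

Running this once directly and once through steam-run should show the difference, assuming the FHS environment is built with bubblewrap and an unprivileged user namespace.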
SteamVR developers have attempted to avoid this kernel limitation by making AMDGPU use a more user-namespace-friendly check for whether to allow high-priority queues, but unfortunately there were concerns about this opening up new denial-of-service attacks, because of how the direct rendering manager interacts with memory management.
Soooo how exactly are we supposed to add caps to the env if pressure-vessel errors out when it gets any access to caps?
You can't. This error message is essentially bubblewrap saying: it looks as though I've been installed incorrectly, and I can't tell whether continuing would be a root security vulnerability, so I'm going to stop here.
(Because bubblewrap has historically been installed setuid root, or occasionally setcap CAP_SYS_ADMIN which is root-equivalent, it has to be extra-paranoid about whether it is about to cause a security vulnerability.)
bubblewrap can never give you capabilities in the initial user namespace, because each process can only ever have capabilities in the innermost user namespace that is applicable to it. The practical result is that nothing in NixOS' FHS environment will ever be able to have elevated capabilities in the initial user namespace. This is a kernel-imposed limitation, so there is nothing that user-space can do to solve it.
We do have access to the outside world though, so isn't there anything we could do there?
We already use elevated privileges in the "root" namespace to give the vrcompositor binary caps; is it not possible to pass this privilege through to the userns somehow?
We could trivially run a daemon that has cap_sys_nice in the root namespace too for instance.
SteamVR developers have attempted to avoid this kernel limitation by making AMDGPU use a more user-namespace-friendly check for whether to allow high-priority queues, but unfortunately there were concerns about this opening up new denial-of-service attacks, because of how the direct rendering manager interacts with memory management.
Thanks for the link!
I can understand the worry of DOS but SteamVR being able to DOS my system is not part of my threat model, so I don't see why the user shouldn't be able to declare that to be the case via a limit or some other privileged mechanism.
It's sad to need a kernel patch for an issue like this :/
This error message is essentially bubblewrap saying: it looks as though I've been installed incorrectly, and I can't tell whether continuing would be a root security vulnerability, so I'm going to stop here.
(Because bubblewrap has historically been installed setuid root, or occasionally setcap CAP_SYS_ADMIN which is root-equivalent, it has to be extra-paranoid about whether it is about to cause a security vulnerability.)
Would it not be possible to have a build variant with a --in-know-what-im-doing-this-is-not-a-vuln build-time flag that disables this check? We don't ever install bwrap with suid or cap_sys_admin and I don't think SteamRT/pressure-vessel does either.
(In fact: We couldn't if we wanted to; a user would have to explicitly configure it in their system to add a wrapper and at that point, it's their own responsibility.)
We do have access to the outside world though, so isn't there anything we could do there?
At the moment, you'll see that SteamVR uses an IPC call via steam-runtime-launch-client --alongside-steam to "escape" from the Steam Linux Runtime 3.0 (sniper) container and run vrcompositor alongside Steam. On a more normal Linux distribution, this means it ends up running in the initial user namespace, where setcap can be effective.
It would in principle be possible to patch that so that it somehow(?) detects that it's in a nested user namespace (or detects that it's on NixOS, or something), and if yes, uses steam-runtime-launch-client --host instead. As currently implemented, --host requires the flatpak-session-helper from Flatpak, although in principle it could be made to talk to a non-Flatpak-specific remote command execution service with a similar API if someone wants to provide one.
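As a starting point for anyone prototyping this, the "detects that it's on NixOS" variant could be sketched as below. This is an untested assumption-laden sketch, not how SteamVR's scripts work today: the helper names are invented, and it assumes the os-release file visible to the script still carries ID=nixos inside the FHS environment. Only the two flags themselves come from the discussion above.

```shell
#!/bin/sh
# os_release_id FILE (hypothetical helper): print the ID= value of an
# os-release style file, with any surrounding double quotes removed.
os_release_id() {
    sed -n 's/^ID=//p' "$1" | tr -d '"' | head -n 1
}

# launch_mode_for ID (hypothetical helper): pick which
# steam-runtime-launch-client mode a patched script might use.
launch_mode_for() {
    case "$1" in
        nixos) echo "--host" ;;            # escape the FHS env entirely
        *)     echo "--alongside-steam" ;; # current behaviour elsewhere
    esac
}

if [ -r /etc/os-release ]; then
    echo "would use: $(launch_mode_for "$(os_release_id /etc/os-release)")"
fi
```

Keying on the distribution ID is crude; checking for a nested user namespace would cover other bubblewrap-based setups too, at the cost of a less obvious test.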
If someone successfully prototypes this by hacking the SteamVR scripts, we could ask the SteamVR developers about making that official. I do not have access to VR hardware or a NixOS system, so someone else will have to lead that.
is it not possible to pass this privilege through to the userns somehow?
No. The design of how capabilities(7) interact with user_namespaces(7) is that each process can only have caps in at most one userns: whichever one is the most deeply-nested. You cannot simultaneously be in a deeply-nested userns, and have caps in a "larger" userns. This is a kernel design decision which we do not get to change from user-space.
This is why /proc/$$/status only needs a single field each for CapEff and so on. If you think about it, if it was possible to have more complicated capabilities, then /proc/$$/status would need one capabilities set for each level of nested userns that exists.
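For reference, that single CapEff field can be decoded by hand. The sketch below reads the hexadecimal mask and tests bit 23, which is CAP_SYS_NICE in the kernel's capability numbering; the helper names are made up for illustration. Note that a set bit only means the capability is effective in the process's own user namespace, which is exactly why it does not help here.

```shell
#!/bin/sh
# CAP_SYS_NICE is capability number 23 in <linux/capability.h>.
CAP_SYS_NICE_BIT=23

# has_cap_bit HEXMASK BIT (hypothetical helper): succeeds if bit BIT
# is set in the hexadecimal capability mask HEXMASK.
has_cap_bit() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# cap_eff_of FILE (hypothetical helper): print the CapEff mask from a
# /proc/<pid>/status style file.
cap_eff_of() {
    sed -n 's/^CapEff:[[:space:]]*//p' "$1"
}

# $$ is the shell's own PID, so this inspects the shell, not sed.
if [ -r "/proc/$$/status" ]; then
    mask=$(cap_eff_of "/proc/$$/status")
    if has_cap_bit "$mask" "$CAP_SYS_NICE_BIT"; then
        echo "CAP_SYS_NICE is effective (in this process's own user namespace)"
    else
        echo "CAP_SYS_NICE is not effective"
    fi
fi
```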
I have already had this discussion at exhaustive length with the SteamVR developers, and if there was a simple solution, we would be using it already.
I still think that the long-term answer to this has to be some version of "don't use capabilities(7)", because capabilities(7) are just not a good match for anything that needs to be able to run unprivileged.
Would it not be possible to have a build variant with a --in-know-what-im-doing-this-is-not-a-vuln build-time flag that disables this check? We don't ever install bwrap with suid or cap_sys_admin and I don't think SteamRT/pressure-vessel does either.
Even if you patched out that check, processes inside the bwrap sandbox will never have CAP_SYS_NICE in the initial user namespace, because they exist in a nested user namespace (that's how bwrap can do its job); so ambient capabilities from a higher-level user namespace do not apply.
Also, because of bubblewrap's history as being optionally-setuid and therefore being trusted by sysadmins as being safe-to-be-setuid, I would not be comfortable with providing that, even as an opt-in. I have too many responsibilities already, without opening myself up to being held responsible for new root privilege escalation CVEs. If you think I'm wrong about that, you will have to ask my bubblewrap co-maintainers to overrule me and take responsibility for any CVEs that result from it. Unfortunately my bubblewrap co-maintainers seem to have mostly disappeared (they also have too many responsibilities!) so if you go that route, you are likely to be waiting a while.
From what I can tell, it's not actually namespaces that prevent capabilities from working. I am currently working on a bare-bones bubblewrap replacement for use in Nixpkgs FHSEnv wrappers and while reading bubblewrap's source code I have noticed that it mounts its sandboxed root with the MS_NOSUID option, and it sets no_new_privs.
I first thought it might be preferable to just strip out all the security-related code from bubblewrap, but after looking at the complexity, I have opted to write my own wrapper. You can find the current draft here: https://codeberg.org/Scrumplex/ancientwrap
It can already set up a simple sandbox and mount things inside. I am currently working on implementing the options used by buildFHSEnvBubblewrap as well as providing a simple way to test it using Steam. I haven't tested SteamVR yet.
it mounts its sandboxed root with the MS_NOSUID option
This defangs setuid binaries, but even if it didn't, they wouldn't work in a user namespace, because the kernel will only allow unprivileged users to create a user namespace with one uid (your own), and all other users including root get mapped to the overflow uid (which appears inside the container as nobody, but should be read as "not me" in this case). So setuid-root would become effectively setuid-nobody.
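The mapping described here can be worked through with a little arithmetic. The sketch below models a single uid_map line (inside start, outside start, length) and shows what an outside uid looks like from inside the namespace; the function name is invented, and 65534 is the usual default of /proc/sys/kernel/overflowuid rather than something read from the running system.

```shell
#!/bin/sh
# Default overflow uid ("nobody"); the live value is in
# /proc/sys/kernel/overflowuid.
OVERFLOW_UID=65534

# map_outside_uid INSIDE_START OUTSIDE_START COUNT UID (hypothetical helper):
# print how outside uid UID appears inside a user namespace that has the
# single uid_map line "INSIDE_START OUTSIDE_START COUNT".
map_outside_uid() {
    inside=$1 outside=$2 count=$3 uid=$4
    if [ "$uid" -ge "$outside" ] && [ "$uid" -lt $(( outside + count )) ]; then
        echo $(( inside + uid - outside ))
    else
        # Unmapped uids, including real root, collapse to the overflow uid.
        echo "$OVERFLOW_UID"
    fi
}

# A user with uid 1000 mapped to itself, as an unprivileged sandbox would do:
echo "uid 1000 appears as $(map_outside_uid 1000 1000 1 1000)"
echo "uid 0 (root) appears as $(map_outside_uid 1000 1000 1 0)"
```

With that one-line mapping, a file owned by real root shows up as owned by nobody, which is why a setuid-root binary would end up effectively setuid-nobody.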
and it sets no_new_privs
I'm surprised if it works without this. Last time I looked, this was a kernel requirement, without which the kernel would not allow unprivileged users to create a user namespace. (But perhaps newer kernels relax that restriction?)
From what I can tell, it's not actually namespaces that prevent capabilities from working
I would recommend reading capabilities(7) and user_namespaces(7) before spending a lot of time on implementing something that could turn out to be a dead end.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/unable-to-activate-gamescope-capsysnice-option/37843/9
Flatpak steam runs into the same issue.
It might be possible to have a service (on dbus?) outside the FHS env that can be asked to increase the priority of the VR compositor. SteamVR would have to integrate with it, or we could LD_PRELOAD something that hijacks the renice call and goes via the external service.
I was hoping flatpak had already implemented a solution and we could use it instead of having something custom, but they haven't, and SteamVR doesn't integrate with anything that would work around this.
I was hoping flatpak had already implemented a solution [to executing arbitrary code on the host system with user privileges]
It has, flatpak-session-helper, as described in https://github.com/NixOS/nixpkgs/issues/217119#issuecomment-2004146107. The client for this is the equivalent of flatpak-spawn --host.
Obviously this is a massive sandbox hole, therefore it is not allowed for most Flatpak apps. Some especially privileged apps (mostly IDEs) do require this access, and should not be installed unless you completely trust them. Similarly, it is possible to give any Flatpak app this access, but that defeats the sandbox, so this should only be done for apps that you completely trust with access to your user account.
Steam's pressure-vessel is not designed to be a security boundary, so it technically has access to this mechanism (when not running in a Flatpak app). steam-runtime-launch-client --host is an alternative to flatpak-spawn --host which is available inside the Steam Runtime environment.
As I said in https://github.com/NixOS/nixpkgs/issues/217119#issuecomment-2004146107, at the moment SteamVR's scripts use steam-runtime-launch-client --alongside-steam to launch commands that cannot work as designed inside the Steam Linux Runtime container for various technical reasons. On typical Linux distributions like Arch, Fedora and Ubuntu that ends up having the user's full privileges, but in NixOS that command ends up running inside the "FHS env", therefore its access to the host system is constrained and in particular it cannot elevate its capabilities by any means (setuid or otherwise).
As I said in https://github.com/NixOS/nixpkgs/issues/217119#issuecomment-2004146107, in principle, someone from NixOS could try hacking SteamVR's scripts so that it uses steam-runtime-launch-client --host --alongside-steam (which means "first try --host, then fall back to --alongside-steam") in bin/vrsetup.sh, or so that it somehow (?) detects that it is already running inside a NixOS "FHS env" and in that situation switches to using --host instead of --alongside-steam.
The intersection between the minority of gamers who use VR and the minority of gamers who run NixOS is probably quite small, so I hope you can understand why this is unlikely to be a high priority for anyone within Valve! As a result, it's unlikely that this will change unless someone from NixOS can suggest a tested change that is not too intrusive.
One way to "sandbox" Steam, without actually using a sandboxing mechanism, is to create a second user account on your computer, and log in as your normal privileged user account when you want to do non-gaming things, but log in as the other user account instead (either by logging out from your normal account, or by using "fast user switching") when you want to run Steam and play games. This is what I do myself, in fact.
That way, your normal account is protected by the same Unix permissions that have been available since the 1980s, and Steam and the proprietary games that it runs cannot access your normal account.
Obviously if you do this, you can't give the games-playing user administrative privileges (sudo or similar) because that would defeat the purpose of using a separate account, but in the few places where Steam wants to exercise root privileges (for example to install dependency libraries on apt-based systems, or to give the SteamVR compositor CAP_SYS_NICE), you can do the equivalent privileged operation yourself if it's consistent with your security policy.
I was hoping flatpak had already implemented a solution [to executing arbitrary code on the host system with user privileges]
I meant requesting renicing in a more restricted way, like other requests using portals that are able to be safer than arbitrary host code execution. I don't really like the way you filled in [] with something far from what I intended as if that's the only thing I could have meant :c
I meant requesting renicing in a more restricted way, like other requests using portals that are able to be safer than arbitrary host code execution.
The issue here is not niceness per se, but the capability to (re)nice. For the mechanism in question (initialization of a high priority drm context) the kernel requires the calling process to have the CAP_SYS_NICE capability.[1] This is just something that isn't possible in user namespaces using file capabilities. A workaround would be to add it as an ambient capability to every process in the fhsenv, but that might be too permissive and requires bwrap to be a setuid binary.
A proper fix would require changes to the kernel, and several were proposed before to handle this use case without capabilities[2], though none have made it to the mainline kernel.
Yes, SteamVR does not implement any mechanism for requesting renicing, however desirable that might be. The only mechanism it has is a way to ask to execute arbitrary code "outside", because that is the only way it can possibly get one of the process parameters it wants.
My understanding is that the vrcompositor wants two separate special process parameters:
CAP_SYS_NICE in the init namespace (the one that is running your init system, pid 1) because that's what the AMD GPU driver requires in order for it to receive a high GPU priority
The first of those two can be achieved by mechanisms like the Realtime portal, and if that was sufficient, there would be no problem.
But CAP_SYS_NICE in the init namespace is literally impossible to achieve for a process that is in a different user namespace, because each process cannot hold capabilities in more than one user namespace at the same time. As soon as you enter a new user namespace like the ones created by bubblewrap, you automatically give up all capabilities in the "outer" userns. This is a kernel API limitation which neither bubblewrap nor SteamVR can bypass.
The RLIMIT_GPUPRIO proposal linked above was one of several attempts to provide a way that SteamVR could stop needing CAP_SYS_NICE in the initns, and therefore stop needing a way to execute arbitrary code in the initns. Unfortunately, it was not accepted by kernel developers, and I'm not aware of any suggested alternatives that would have been more likely to be accepted.
and I'm not aware of any suggested alternatives that would have been more likely to be accepted
There were several proposals from an Intel developer, as summarized here: https://lore.kernel.org/dri-devel/fa0360e4-b845-92ee-3c6d-a593cc4135b5@linux.intel.com/
There were several proposals from an Intel developer
From the reply, those didn't seem likely to be accepted either, without a DRI expert doing a lot of research first.
but it also wants CAP_SYS_NICE in the init namespace (the one that is running your init system, pid 1) because that's what the AMD GPU driver requires in order for it to receive a high GPU priority
It is slightly horrifying that the kernel side of this insists on this.
The intersection between the minority of gamers who use VR and the minority of gamers who run NixOS is probably quite small, so I hope you can understand why this is unlikely to be a high priority for anyone within Valve! As a result, it's unlikely that this will change unless someone from NixOS can suggest a tested change that is not too intrusive.
Please note that we are in the same situation as Flatpak here for which there are quite a few more people who'd like to use SteamVR. If there is a solution that works for Flatpak, we can likely make it work for our purposes too, so please focus on Flatpak first.
One way to "sandbox" Steam
We don't really care about sandboxing here. We only use bubblewrap to emulate the FHS "API" such that games are able to use our shared libraries. If we could do that without a NS, we would.
It is slightly horrifying that the kernel side of this insists on this.
From what I gathered, that's because you could DOS the system memory using a high-priority queue. Arguably, CAP_SYS_NICE isn't even enough of a permission to allow this.
The intersection between the minority of gamers who use VR and the minority of gamers who run NixOS is probably quite small, so I hope you can understand why this is unlikely to be a high priority for anyone within Valve! As a result, it's unlikely that this will change unless someone from NixOS can suggest a tested change that is not too intrusive.
Just chiming in here, we get at least one nix user a day who drops in and gets the flyswatter of this issue.
I invite you to join LVRA, we've got a large segment of users I believe you are unable to see from your vantage, and my hope is you can see what we're building out here.
I wonder: does this issue still happen when using buildFHSEnvChroot?
I wonder: does this issue still happen when using buildFHSEnvChroot?
This has been explored before. See https://github.com/NixOS/nixpkgs/issues/217119#issuecomment-1936811395
Landed here from the wiki. The kernel patch that's at the end of the article mentions amdgpu, but is there any equivalent for nvidia? I tried this anyway and it didn't do anything.
While I don't know of a patch, I discussed this previously and I think the following code needs to be changed to always return true, to bypass the capability check:
This property appears to be intended by the kernel. There is nothing we can do about that, so I'm going to close this as WONTFIX.
If someone comes up with a method to emulate an FHS environment without utilising namespaces, we can re-open this.
I don't quite understand why this issue should be closed. This is an issue about the way we package some applications in Nixpkgs and while the root cause is kernel behaviour it is still a bug in NixOS. No other distribution is affected by this issue, unless you treat Flatpak as its own distribution, of course.
There is a possibility of this being fixed in the future, so I think this issue should be kept open.
It's because there is nothing actionable for us to do. We're not even blocked on something, there is nothing on the horizon that could help alleviate this issue. Tracking issues that we cannot do anything about is not a good use of our time. Leaving issues open despite them not being actionable is also confusing.
If this were to be "fixed" on the kernel side in the future, it'd be introduced to our kernels as part of the regular flow of updates anyways, so there would still be nothing actionable for us even in that case.
As mentioned, if someone does come up with something that is actionable for us, we can always re-open and then focus on getting that path to work.
Do we have to use user namespaces or is there any other way to emulate FHS? Could something like building the directory structure and then setting up the PATHs work? I'm not too familiar with how Steam works and what all is hard-coded
The problem is that the binaries hard-code their ld.so and we can't modify the binaries. You must have a global ld.so and the only way to do that without having an actually global ld.so is a namespace.
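To see the hard-coded loader path for a particular binary, one option is to pull the interpreter out of the ELF program headers. The sketch below parses the output of readelf -l; the function name is invented for illustration, and the binary you point it at is up to you.

```shell
#!/bin/sh
# interp_from_readelf (hypothetical helper): read `readelf -l` output on
# stdin and print the requested program interpreter (the ld.so path),
# or nothing if the binary has no PT_INTERP header.
interp_from_readelf() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)].*/\1/p'
}

# Usage sketch: pass a binary as the first argument.
if [ -n "${1:-}" ] && command -v readelf >/dev/null 2>&1; then
    readelf -l "$1" | interp_from_readelf
fi
```

A prebuilt binary typically reports a path under /lib or /lib64, which does not exist on a plain NixOS system; that absolute path is what the FHS environment has to provide.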
I feel like this should be kept open with the https://github.com/NixOS/nixpkgs/labels/2.status%3A%20wait-for-upstream or https://github.com/NixOS/nixpkgs/labels/2.status%3A%20wontfix label
As I said, there is nothing to wait on here as there is nothing on the horizon which could fix this.
Wontfix is for closed issues and I think it's from before GH had that feature natively.
Added the needs upstream fix label anyway. I feel like this issue should remain open, even if nothing can be done on our side. Someone might stumble across it and not file a duplicate issue, or might even have some ideas to share on how to solve this situation.
It's not a good fix but a module option (under hardware.steam?) to patch out the kernel checks for this might be acceptable, specifically aimed at the steam VR usecase rather than capabilities in FHS env in general.
If I understand correctly that would be opting into any process being able to hang your graphics (local DoS), and wouldn't allow for privilege escalation or sandbox escapes.
@LunNova See option and implementation in my config as well as https://github.com/NixOS/nixpkgs/pull/321663 (or my port).
Describe the bug
SteamVR has the capability to use asynchronous reprojection to increase the perceived frame rate in VR applications. To achieve this, it needs to request a VkDevice with a high priority queue. AMDGPU requires applications to have the CAP_SYS_NICE capability [1], which is usually requested when starting SteamVR for the first time. Making sure that ~/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher has the necessary capability:
But SteamVR still fails to acquire a high-priority queue and disables asynchronous reprojection.
Steps To Reproduce
Steps to reproduce the behavior:
Make sure vrcompositor-launcher has CAP_SYS_NICE, using setcap and getcap on ~/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher
Expected behavior
SteamVR is able to acquire a high priority queue and continues to use async reprojection.
Logs
vrcompositor.log:
Additional context
My hardware supports this feature, as I have been using SteamVR with async reprojection on Arch Linux before.
Notify maintainers
@mkg20001 @jagajaga
Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.