Closed lelgenio closed 11 months ago
I think the issue here is that the activation script creates new files instead of replacing the contents of the existing ones. Each new file gets a new inode, so the file that was bind mounted inside the fhsenv no longer "exists":
$ stat /etc/group
File: /etc/group
Size: 981 Blocks: 8 IO Block: 4096 regular file
Device: 0,34 Inode: 187963323 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-08-04 12:34:38.540403715 -0300
Modify: 2023-08-04 12:34:38.540403715 -0300
Change: 2023-08-04 12:34:38.540403715 -0300
Birth: 2023-08-04 12:34:38.540403715 -0300
$ sudo /run/current-system/activate
setting up /etc...
$ stat /etc/group
File: /etc/group
Size: 981 Blocks: 8 IO Block: 4096 regular file
Device: 0,34 Inode: 187963495 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-08-04 12:45:08.785164823 -0300
Modify: 2023-08-04 12:45:08.785164823 -0300
Change: 2023-08-04 12:45:08.785164823 -0300
Birth: 2023-08-04 12:45:08.785164823 -0300
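The inode change in the stat output above can be reproduced without touching /etc. This is only a minimal sketch, assuming GNU stat (for -c %i) and a scratch directory from mktemp; the file name and contents are made up for illustration. It compares a write-to-temp-then-rename update, which is the usual way an atomic write is done, with an in-place rewrite:

```shell
#!/bin/sh
# Scratch directory so nothing under /etc is touched (hypothetical stand-in).
dir=$(mktemp -d)
printf 'users:x:100:\n' > "$dir/group"
inode_before=$(stat -c %i "$dir/group")

# Atomic-style update: write a temporary file, then rename it over the target.
# The target path now refers to the temporary file's inode.
printf 'users:x:100:alice\n' > "$dir/group.tmp"
mv "$dir/group.tmp" "$dir/group"
inode_after_rename=$(stat -c %i "$dir/group")

# In-place update: truncate and rewrite the existing file, keeping its inode.
printf 'users:x:100:alice,bob\n' > "$dir/group"
inode_after_inplace=$(stat -c %i "$dir/group")

echo "before:         $inode_before"
echo "after rename:   $inode_after_rename"
echo "after in-place: $inode_after_inplace"
rm -rf "$dir"
```

The rename step always yields a different inode, because the temporary file exists at the same time as the original. The in-place step keeps the inode, which is why an existing bind mount of the file would stay valid, at the cost of readers possibly seeing a partially written file.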
Hack 2: Disabling atomic writes during activation.
diff --git a/nixos/modules/config/update-users-groups.pl b/nixos/modules/config/update-users-groups.pl
index 54352a517a24..ead0ebbd2893 100644
--- a/nixos/modules/config/update-users-groups.pl
+++ b/nixos/modules/config/update-users-groups.pl
@@ -19,7 +19,7 @@ make_path("/var/lib/nixos", { mode => 0755 }) unless $is_dry;
sub updateFile {
my ($path, $contents, $perms) = @_;
return if $is_dry;
- write_file($path, { atomic => 1, binmode => ':utf8', perms => $perms // 0644 }, $contents) or die;
+ write_file($path, { atomic => 0, binmode => ':utf8', perms => $perms // 0644 }, $contents) or die;
}
sub nscdInvalidate {
Link that explains why this issue occurs: https://unix.stackexchange.com/questions/537095/why-does-bind-mounting-a-file-after-unlink-fail-with-enoent
Much simpler way to reproduce:
# terminal 1
$ steam-run bash
[lelgenio@monolith:~]$ grep /nixos/etc /proc/self/mountinfo
1008 909 0:32 /nixos/etc/nix /etc/nix ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1009 909 0:32 /nixos/etc/passwd /etc/passwd ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1010 909 0:32 /nixos/etc/group /etc/group ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1011 909 0:32 /nixos/etc/shadow /etc/shadow ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1012 909 0:32 /nixos/etc/resolv.conf /etc/resolv.conf ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1013 909 0:32 /nixos/etc/profiles /etc/profiles ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1014 909 0:32 /nixos/etc/sudoers /etc/sudoers ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1015 909 0:32 /nixos/etc/machine-id /etc/machine-id ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1016 909 0:32 /nixos/etc/alsa /etc/alsa ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1017 909 0:32 /nixos/etc/ssl/certs /etc/ssl/certs ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1018 909 0:32 /nixos/etc/pki /etc/pki ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
[lelgenio@monolith:~]$ steam-run bash
[lelgenio@monolith:~]$ echo two layers of bubblewrap, working fine
two layers of bubblewrap, working fine
[lelgenio@monolith:~]$ exit
exit
[lelgenio@monolith:~]$ echo one layer into bubblewrap, still fine
one layer into bubblewrap, still fine
# terminal 2
$ sudo nixos-rebuild switch --flake blah
....
setting up /etc...
reloading user units for lelgenio...
setting up tmpfiles
# back to terminal 1
[lelgenio@monolith:~]$ grep /nixos/etc /proc/self/mountinfo
1008 909 0:32 /nixos/etc/nix /etc/nix ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1009 909 0:32 /nixos/etc/passwd//deleted /etc/passwd ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1010 909 0:32 /nixos/etc/group//deleted /etc/group ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1011 909 0:32 /nixos/etc/shadow//deleted /etc/shadow ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1012 909 0:32 /nixos/etc/resolv.conf /etc/resolv.conf ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1013 909 0:32 /nixos/etc/profiles /etc/profiles ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1014 909 0:32 /nixos/etc/sudoers//deleted /etc/sudoers ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1015 909 0:32 /nixos/etc/machine-id /etc/machine-id ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1016 909 0:32 /nixos/etc/alsa /etc/alsa ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1017 909 0:32 /nixos/etc/ssl/certs /etc/ssl/certs ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
1018 909 0:32 /nixos/etc/pki /etc/pki ro,nosuid,nodev,noatime master:1 - btrfs /dev/dm-0 rw,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=1593,subvol=/nixos
[lelgenio@monolith:~]$ echo Notice how some files are marked as //deleted
Notice how some files are marked as //deleted
[lelgenio@monolith:~]$ steam-run bash
bwrap: Can't bind mount /oldroot/etc/passwd on /newroot/etc/passwd: Unable to mount source on destination: No such file or directory
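To check whether a running fhsenv is already in this state, the //deleted marker in the mountinfo output above can be searched for directly. The helper below is a hypothetical sketch, not part of bubblewrap or nixpkgs: stale_mounts is a made-up name, and the sample file only mimics two of the lines shown above with shortened mount options.

```shell
#!/bin/sh
# Print the mount point (field 5) of every mount whose root (field 4)
# ends in "//deleted", i.e. whose bind-mount source has been unlinked.
# Reads /proc/self/mountinfo unless another file is given as $1.
stale_mounts() {
    awk '$4 ~ /\/\/deleted$/ { print $5 }' "${1:-/proc/self/mountinfo}"
}

# Illustrative input modelled on the output above (options shortened).
sample=$(mktemp)
printf '%s\n' \
    '1009 909 0:32 /nixos/etc/passwd//deleted /etc/passwd ro,nosuid master:1 - btrfs /dev/dm-0 rw' \
    '1012 909 0:32 /nixos/etc/resolv.conf /etc/resolv.conf ro,nosuid master:1 - btrfs /dev/dm-0 rw' \
    > "$sample"

stale=$(stale_mounts "$sample")
echo "$stale"
rm -f "$sample"
```

Run without arguments inside the fhsenv, it lists the mount points that a nested bwrap would presumably fail to bind again.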
I've been brainstorming potentially clean solutions to this. Here are a few, some of which probably don't work because of permission issues around user namespaces, or requirements of pressure-vessel, that I don't know about:
- … /etc, and fill the upper tmpfs from the fhsenv.
- Mount the host's /etc directly as /etc, then bind in extra files and directories from the fhsenv on top of that.
- Mount the host's /etc somewhere else in the container, then fill the container's /etc with symlinks, some of which point to the mount, and some of which point to the fhsenv.
In short, testing is needed. I've got other things going on, but I'll probably get to doing some before too long.
@tejing1 Looking more into the linked stack-exchange post, I'm feeling more and more that the kernel is misbehaving by not allowing the bind mount. I'll try allowing unlinked sources to be mounted, just to check what happens.
It does seem like the kernel really shouldn't be preventing this, yes.
But it would still be preferable for changes to /etc to propagate to existing FHS environments, which they wouldn't even if the kernel allowed the bind mounts.
Just had some time to work on this, and my first 2 ideas didn't go anywhere, but mounting the host's /etc elsewhere in the container, and symlinking things to it, seems to work great. Should have a PR soon, fixing this and a few other things.
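A rough illustration of why symlinks survive where per-file bind mounts do not: a symlink is resolved by path on every access, so it follows the file to its new inode after an atomic replace. This is only a sketch under assumptions, not the code from the PR. Two scratch directories stand in for the host's /etc and the container's /etc, and no namespaces or mounts are involved.

```shell
#!/bin/sh
host_etc=$(mktemp -d)       # stands in for the host's /etc (assumption)
container_etc=$(mktemp -d)  # stands in for the container's /etc (assumption)

printf 'v1\n' > "$host_etc/group"
# The container's entry is a symlink into the host directory, not a bind mount.
ln -s "$host_etc/group" "$container_etc/group"
seen_before=$(cat "$container_etc/group")

# Simulate activation: write a new file and rename it over the old one.
printf 'v2\n' > "$host_etc/group.tmp"
mv "$host_etc/group.tmp" "$host_etc/group"
seen_after=$(cat "$container_etc/group")

echo "before activation: $seen_before"
echo "after activation:  $seen_after"
rm -rf "$host_etc" "$container_etc"
```

In the real setup the directory itself would be mounted into the container; since the directory is not replaced during activation, paths resolved through it keep working and pick up the new contents.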
Describe the bug
Running nixos-rebuild switch while Steam is running makes Steam (or its fhsenv) enter a bugged state in which it can no longer start Proton games.
One line from Steam's log stands out to me:
Steps To Reproduce
1 - Open Steam
2 - Start a proton game, works fine
3 - Run nixos-rebuild switch
4 - Start a proton game, does not work
5 - Start a non-proton game, works fine
6 - Restart Steam
7 - Start a proton game, works fine
Expected behavior
Switching generations should not affect Steam's ability to launch games.
Additional context
This issue has occurred since release 22.11; it was not fixed by 23.05, and is not fixed on unstable. Using nixos-rebuild switch --rollback does not fix the issue.
As far as I know there is no workaround other than restarting Steam every time you switch generations. I tried my hand at fixing this (throwing stuff at the wall and seeing what sticks) and came up with this hack:
stdout+stderr from running "Steps to Reproduce"
Normal log
Problematic log
Notify maintainers
As this is not a serious bug, I will not ping anyone.
Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.
"x86_64-linux"
Linux 6.1.38, NixOS, 23.05 (Stoat), 23.05.20230715.dirty
yes
yes
nix-env (Nix) 2.13.3
"nixos-22.11"