NixOS / nixpkgs

Nix Packages collection & NixOS

Cannot remount /etc while rebuilding with `etc.overlay` enabled #303262

Open oluceps opened 5 months ago

oluceps commented 5 months ago

Describe the bug

Remounting /etc fails during a rebuild when etc.overlay is enabled. The failure is intermittent.

Possibly related: https://github.com/NixOS/nixpkgs/pull/270727 https://github.com/NixOS/nixpkgs/issues/291398

Steps To Reproduce

  systemd.sysusers.enable = true;
  system.etc.overlay.enable = true;
  system.etc.overlay.mutable = true;
Then run doas nixos-rebuild switch (or colmena apply-local --sudo):
<..snip
ktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager ListUnitsByNames as 1 -- dbus-broker.service' exited with value 1 at /nix/store/0wsk0nwb6bq5f8bfxwny6bww68c44pji-nixos-system-kaambl-24.05.20240408.4cba8b5/bin/switch-to-configuration line 145.
kaambl | Activation failed: Child process exited with error code: 1
       | Failed: Child process exited with error code: 1
[ERROR] Failed to complete requested operation - Last 1 lines of logs:
[ERROR]  failure) Child process exited with error code: 1
[ERROR] Failed to deploy to kaambl - Last 20 lines of logs:
[ERROR]   stderr) Successfully installed Lanzaboote.
[ERROR]   stderr) stopping the following units: agenix-install-secrets.service, dae.service, systemd-tmpfiles-resetup.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service
[ERROR]   stderr) activating the configuration...
[ERROR]   stdout) remounting /etc...
[ERROR]   stderr) mount: /tmp/tmp.T2fXevh7N0: overlay already mounted on /etc.
[ERROR]   stderr)        dmesg(1) may have more information after failed mount system call.
[ERROR]   stderr) Moving mount
[ERROR]   stderr) Mounting beneath top mount
[ERROR]   stderr) Invalid argument | move-mount.c: 553: main: move_mount
[ERROR]   stdout) Attaching mount /tmp/tmp.T2fXevh7N0 -> /etc
[ERROR]   stdout) Moving single attached mount
[ERROR]   stdout) Activation script snippet 'etc' failed (1)
[ERROR]   stderr) Reload daemon failed: Connection reset by peer
[ERROR]   stderr) reloading user units for elen...
[ERROR]   stderr) su: Cannot determine your user name.
[ERROR]   stderr) restarting sysinit-reactivation.target
[ERROR]   stderr) Failed to restart sysinit-reactivation.target: Connection timed out
[ERROR]   stderr) See system logs and 'systemctl status sysinit-reactivation.target' for details.
[ERROR]   stderr) '/nix/store/ydkp4xlbpmvf1j5xp09rw70vy3vb5n5a-system-path/bin/busctl --json=short call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager ListUnitsByNames as 1 -- dbus-broker.service' exited with value 1 at /nix/store/0wsk0nwb6bq5f8bfxwny6bww68c44pji-nixos-system-kaambl-24.05.20240408.4cba8b5/bin/switch-to-configuration line 145.
[ERROR]  failure) Child process exited with error code: 1
[ERROR] -----
[ERROR] Operation failed with error: Child process exited with error code: 1
Hint: Backtrace available - Use `RUST_BACKTRACE=1` environment variable to display a backtrace

kernel log:

Apr 11 02:01:29 kaambl kernel: erofs: (device loop1): mounted with root inode @ nid 36.
Apr 11 02:01:29 kaambl kernel: overlayfs: upperdir is in-use as upperdir/workdir of another mount, mount with '-o index=off' to override exclusive upperdir protection.
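
The overlayfs message above says the requested upperdir is already held by another mount. A minimal sketch of how one might find that other mount by scanning lines in the `/proc/self/mountinfo` format follows. The helper names, the sample lines, and the upperdir path `/.rw-etc/upper` are illustrative assumptions, not taken from the failing machine. Options are split naively on commas, which is fine for a sketch but does not handle escaped commas in paths.

```python
def parse_mountinfo_line(line):
    """Return (mount_point, fstype, super_options) for one mountinfo line, or None."""
    # Fields before " - " are per-mount; fields after it describe the superblock.
    per_mount, sep, per_super = line.strip().partition(" - ")
    mount_fields = per_mount.split()
    super_fields = per_super.split(" ", 2)
    if not sep or len(mount_fields) < 5 or len(super_fields) < 3:
        return None
    fstype, _source, super_opts = super_fields
    return mount_fields[4], fstype, super_opts.split(",")


def mounts_using_upperdir(mountinfo_lines, upperdir):
    """List mount points of overlay mounts whose upperdir equals `upperdir`."""
    holders = []
    for line in mountinfo_lines:
        parsed = parse_mountinfo_line(line)
        if parsed is None:
            continue
        mount_point, fstype, opts = parsed
        if fstype == "overlay" and f"upperdir={upperdir}" in opts:
            holders.append(mount_point)
    return holders


# Hypothetical sample data in mountinfo format (not from the reporter's system).
SAMPLE = [
    "30 1 0:27 / /etc rw,relatime shared:2 - overlay overlay "
    "rw,lowerdir=/run/etc-lower,upperdir=/.rw-etc/upper,workdir=/.rw-etc/work",
    "31 1 0:28 / /run rw,nosuid shared:3 - tmpfs tmpfs rw,mode=755",
]

print(mounts_using_upperdir(SAMPLE, "/.rw-etc/upper"))
```

On a live system the same function could be fed `open("/proc/self/mountinfo")` just before the remount, to see which mount still holds the upperdir.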

Expected behavior

/etc is remounted successfully and the switch completes.

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

> nix-info -m
 - system: `"x86_64-linux"`
 - host os: `Linux 6.8.4-cachyos, NixOS, 24.05 (Uakari), 24.05.20240408.4cba8b5`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - channels(root): `"nixos"`
 - nixpkgs: `/nix/store/yzkrxddg9fjcjcahb197lgrsz4i9cbhh-450afzqlzzgw6wnyc3dwysf3i5yxyqkr-source`

oluceps commented 5 months ago

Worth noting: with system.etc.overlay.mutable = false; the problem does not appear.

r-vdp commented 4 months ago

@nikstur this also happens for me whenever anything is mounted on top of /etc.

This reproduces it:

diff --git a/nixos/tests/activation/etc-overlay-mutable.nix b/nixos/tests/activation/etc-overlay-mutable.nix
index 087c06408a71..5b150c61b08f 100644
--- a/nixos/tests/activation/etc-overlay-mutable.nix
+++ b/nixos/tests/activation/etc-overlay-mutable.nix
@@ -25,6 +25,9 @@
       machine.succeed("/run/current-system/bin/switch-to-configuration test")

     with subtest("switching to a new generation"):
+      machine.succeed("mkdir /etc/mountpoint")
+      machine.succeed("mount -t tmpfs tmpfs /etc/mountpoint")
+
       machine.fail("stat /etc/newgen")
       machine.succeed("echo -n 'mutable' > /etc/mutable") 
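
Since the reproduction depends on a mount nested under /etc, a pre-flight check could list such mounts before switching. The sketch below only reports them and makes no claim about why the move fails. The function name and sample lines are hypothetical, and it expects lines in the `/proc/self/mountinfo` format.

```python
def mounts_below(mountinfo_lines, prefix="/etc"):
    """Return mount points that live strictly below `prefix`."""
    prefix = prefix.rstrip("/")
    nested = []
    for line in mountinfo_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        mount_point = fields[4]  # fifth field is the mount point
        # Require a path separator so that /etcetera is not mistaken for /etc/...
        if mount_point.startswith(prefix + "/"):
            nested.append(mount_point)
    return nested


# Hypothetical sample mirroring the test change above (tmpfs on /etc/mountpoint).
SAMPLE = [
    "30 1 0:27 / /etc rw,relatime shared:2 - overlay overlay rw",
    "41 30 0:40 / /etc/mountpoint rw,relatime shared:7 - tmpfs tmpfs rw",
    "42 1 0:41 / /etcetera rw,relatime shared:8 - tmpfs tmpfs rw",
]

print(mounts_below(SAMPLE))
```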

oluceps commented 3 weeks ago

Still experiencing this:

kaambl | Evaluated kaambl
kaambl | Building kaambl
kaambl | /nix/store/ccjy49f1x5gvdgsb9qmi7crl2n05hisj-nixos-system-kaambl-24.11.20240729.9f10e67
kaambl | Built "/nix/store/ccjy49f1x5gvdgsb9qmi7crl2n05hisj-nixos-system-kaambl-24.11.20240729.9f10e67"
kaambl | Pushing system closure
kaambl | Pushed system closure
kaambl | No pre-activation keys to upload
kaambl | Activating system profile
kaambl | Installing Lanzaboote to "/efi"...
kaambl | Collecting garbage...
kaambl | Successfully installed Lanzaboote.
kaambl | stopping the following units: systemd-modules-load.service, systemd-tmpfiles-resetup.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service
kaambl | activating the configuration...
kaambl | remounting /etc...
kaambl | mount: /tmp/tmp.uvOanRgx7X: /dev/loop0 already mounted or mount point busy.
kaambl |        dmesg(1) may have more information after failed mount system call.
kaambl | Moving mount
kaambl | Mounting beneath top mount
kaambl | Attaching mount /tmp/tmp.ShpAHKUkAh -> /etc
kaambl | Moving single attached mount
kaambl | Activation script snippet 'etc' failed (32)
kaambl | Failed to run activate script
kaambl | reloading user units for elen...
kaambl | Error: Failed to restart nixos-activation.service
kaambl | 
kaambl | Caused by:
kaambl |     Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
kaambl | restarting sysinit-reactivation.target
kaambl | Failed to restart sysinit-reactivation.target: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
kaambl | Error: Failed to get unit dbus-broker.service
kaambl | 
kaambl | Caused by:
kaambl |     Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
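
If the `(32)` reported for the 'etc' snippet is the exit status of mount(8) passed through unchanged (an assumption; the log does not confirm it), it can be decoded with the bit values documented in the mount(8) man page. A small sketch with a hypothetical helper name:

```python
# Exit status bits as documented in mount(8); values can be OR-ed together.
MOUNT_EXIT_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error (out of memory, cannot fork, no more loop devices)",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}


def decode_mount_status(status):
    """Translate a mount(8) exit status into its documented meanings."""
    if status == 0:
        return ["success"]
    return [meaning for bit, meaning in MOUNT_EXIT_BITS.items() if status & bit]


print(decode_mount_status(32))
```

Under that assumption, 32 is the generic "mount failure" status, which fits the "already mounted or mount point busy" line but does not narrow the cause further.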

oluceps commented 3 weeks ago

> nix-info -m
 - system: `"x86_64-linux"`
 - host os: `Linux 6.10.2, NixOS, 24.11 (Vicuna), 24.11.20240729.9f10e67`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Lix, like Nix) 2.91.0-dev-pre20240726-6abad7c
System type: x86_64-linux
Additional system types: i686-linux, x86_64-v1-linux, x86_64-v2-linux, x86_64-v3-linux
Features: gc, signed-caches
System configuration file: /etc/nix/nix.conf
User configuration files: /home/elen/.config/nix/nix.conf:/etc/xdg/nix/nix.conf:/home/elen/.local/share/flatpak/exports/etc/xdg/nix/nix.conf:/var/lib/flatpak/exports/etc/xdg/nix/nix.conf:/home/elen/.nix-profile/etc/xdg/nix/nix.conf:/nix/profile/etc/xdg/nix/nix.conf:/home/elen/.local/state/nix/profile/etc/xdg/nix/nix.conf:/etc/profiles/per-user/elen/etc/xdg/nix/nix.conf:/nix/var/nix/profiles/default/etc/xdg/nix/nix.conf:/run/current-system/sw/etc/xdg/nix/nix.conf
Store directory: /nix/store
State directory: /nix/var/nix
Data directory: /nix/store/w1y9gd6yxf8azq4ilnk7ghcbjkcp2bbx-lix-2.91.0-dev-pre20240726-6abad7c/share`
 - nixpkgs: `/nix/store/n5yzhgbv2vrf43rjdw831xynv82by12f-rb49nm580v5dp49y1ram2byyg7pd4sj1-source`

nikstur commented 3 weeks ago

Can you provide kernel logs with dmesg for this mount call? Otherwise I cannot tell what's going on. It looks like it fails to mount the metadata image.

oluceps commented 3 weeks ago

> Can you provide kernel logs with dmesg for this mount call? Otherwise I cannot tell what's going on. It looks like it fails to mount the metadata image.

I haven't seen any kernel log related to this, and it's hard to reproduce. I'll stay on this nixpkgs revision for a few weeks to see if I can reproduce it.

journalctl -k -p 5 --since 14:00

https://pb.nyaw.xyz/on-toucan.txt

Possibly related: https://github.com/NixOS/nixpkgs/issues/333999

oluceps commented 3 weeks ago

FWIW, I have system.switch.enableNg = true; set: https://github.com/oluceps/nixos-config/blob/e0f6880a135dcc20c02f5452d384376a0141bf80/misc.nix#L13-L14

Mic92 commented 3 weeks ago

@nikstur I don't think there will be any logs if the kernel returns -EBUSY on the mount syscall. I think mount only prints this for filesystems that have custom error logs.

Mic92 commented 3 weeks ago

@oluceps can you run strace -f -s512 -e mount nixos-rebuild switch as root? And give us the output?

oluceps commented 3 weeks ago

> @oluceps can you run strace -f -s512 -e mount nixos-rebuild switch as root? And give us the output?

Here's the log (sudo strace -f -s512 -e mount nixos-rebuild switch --flake .): https://pb.nyaw.xyz/famous-squirrel.txt

Mic92 commented 3 weeks ago

Interesting. This looks different from what I expected: I expected the mount system call itself to fail, but the failure seems to have happened in a different syscall.
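
To spot which syscall actually failed, the failing calls can be filtered out of an strace log. This is a rough sketch: it assumes the trace filter was widened beyond `mount` so that other mount-related syscalls (for example `move_mount`) appear in the output. The regular expression, the helper name, and the sample lines are illustrative and are not taken from the linked paste.

```python
import re

# Matches lines such as:
#   [pid  4242] mount("a", "b", ...) = -1 EBUSY (Device or resource busy)
#   <... mount resumed>) = -1 EBUSY (Device or resource busy)
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+(?P<pid>\d+)\]\s+)?"      # optional "[pid N]" prefix from -f
    r"(?:<\.\.\.\s+)?(?P<syscall>\w+)"      # syscall name, possibly a resumed call
    r"(?:\(|\s+resumed>).*"                 # arguments or the "resumed>" marker
    r"\s=\s+-1\s+(?P<errno>E[A-Z0-9]+)"     # failing return value and errno name
    r"\s+\((?P<msg>[^)]*)\)\s*$"            # human-readable errno text
)


def failed_syscalls(strace_lines):
    """Return (pid, syscall, errno, message) for every failing call in the trace."""
    failures = []
    for line in strace_lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            pid = int(match["pid"]) if match["pid"] else None
            failures.append((pid, match["syscall"], match["errno"], match["msg"]))
    return failures


# Hypothetical strace output, for illustration only.
SAMPLE = [
    'mount("tmpfs", "/run", "tmpfs", 0, NULL) = 0',
    '[pid  4242] mount("overlay", "/tmp/tmp.example", "overlay", 0, '
    '"lowerdir=/a,upperdir=/b,workdir=/c") = -1 EBUSY (Device or resource busy)',
    '[pid  4243] <... mount resumed>) = -1 EBUSY (Device or resource busy)',
    'move_mount(4, "", AT_FDCWD, "/etc", MOVE_MOUNT_F_EMPTY_PATH) '
    "= -1 EINVAL (Invalid argument)",
]

for failure in failed_syscalls(SAMPLE):
    print(failure)
```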