NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.11k stars 14.15k forks source link

systemd device-timeout settings are not congruent #110916

Closed nh2 closed 6 months ago

nh2 commented 3 years ago

Summary

CC @flokli who I know knows systemd quite well.

Details

I made an AWS EC2 instance with an EBS volume mounted:

{
  fileSystems."/data" = {
    autoFormat = true;
    fsType = "ext4";
    device = "/dev/xvdb";
  };
}

If I detach the volume from the VM, and run nixos-rebuild switch, it generally runs through (though taking a minute waiting for the timeout of the missing device):

# time nixos-rebuild switch --fast
building the system configuration...
updating GRUB 2 menu...
activating the configuration...
setting up /etc...
reloading user units for root...
setting up tmpfiles

real  1m25.862s

If I add to the config a shorter timeout:

fileSystems."/data".options = [ "x-systemd.device-timeout=5s" ];

then ixos-rebuild fails:

# time nixos-rebuild switch --fast
updating GRUB 2 menu...
activating the configuration...
setting up /etc...
reloading user units for root...
setting up tmpfiles
reloading the following units: data.mount
data.mount is not active, cannot reload.
warning: error(s) occurred while switching to the new configuration

real  0m11.875s

If I subsequently remove the options = [ "x-systemd.device-timeout=5s" ] again, nixos-rebuild switch continues failing, in contrast to now being the same config before which did not fail.

I think this is bad: Same configs should result in same behaviour.

After rebooting the machine, the whole thing repeats: nixos-rebuild switch succeeds in presence of the timeout, and after the timeout is changed with x-systemd.device-timeout once, and then this is undone again, it continues failing until the next reboot.

Please find below the journalctl outputs of the two cases.


First situtation, where it succeeds after the timeout:

# journalctl -f
Jan 27 07:36:10 nixos[2846]: switching to system configuration /nix/store/shhlfv45q46ds606d8b1qkl6fn224iyl-nixos-system-unnamed-20.09.1632.a6a3a368dda
Jan 27 07:36:10 systemd[1]: Stopped target All Network Interfaces (deprecated).
Jan 27 07:36:10 systemd[1]: Stopped target Remote File Systems.
Jan 27 07:36:10 nscd[874]: 874 monitored file `/etc/group` was moved into place, adding watch
Jan 27 07:36:10 nscd[874]: 874 monitoring file `/etc/group` (19)
Jan 27 07:36:10 nscd[874]: 874 monitoring directory `/etc` (2)
Jan 27 07:36:10 nscd[874]: 874 monitored file `/etc/passwd` was moved into place, adding watch
Jan 27 07:36:10 nscd[874]: 874 monitoring file `/etc/passwd` (20)
Jan 27 07:36:10 nscd[874]: 874 monitoring directory `/etc` (2)
Jan 27 07:36:10 nscd[874]: 874 monitored file `/etc/hosts` was moved into place, adding watch
Jan 27 07:36:10 nscd[874]: 874 monitored file `/etc/services` was moved into place, adding watch
Jan 27 07:36:10 systemd[1]: Reloading.
Jan 27 07:36:10 lvm2-activation-generator: lvmconfig failed
Jan 27 07:36:11 su[2996]: Successful su for root by root
Jan 27 07:36:11 su[2996]: pam_unix(su:session): session opened for user root by root(uid=0)
Jan 27 07:36:11 systemd[893]: Reexecuting.
Jan 27 07:36:11 systemd[893]: Condition check resulted in Run user-specific NixOS activation being skipped.
Jan 27 07:36:11 su[2996]: pam_unix(su:session): session closed for user root
Jan 27 07:36:11 systemd[1]: Reached target Remote File Systems.
Jan 27 07:36:11 systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
Jan 27 07:36:11 systemd[1]: Reached target All Network Interfaces (deprecated).
Jan 27 07:36:11 systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.
Jan 27 07:36:11 systemd[1]: Condition check resulted in File System Check on Root Device being skipped.

[hanging here until timeout]

Jan 27 07:37:30 systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start timed out.
Jan 27 07:37:30 systemd[1]: Timed out waiting for device /dev/xvdb.
Jan 27 07:37:30 systemd[1]: Dependency failed for Initialisation of Filesystem /dev/xvdb.
Jan 27 07:37:30 systemd[1]: mkfs-dev-xvdb.service: Job mkfs-dev-xvdb.service/start failed with result 'dependency'.
Jan 27 07:37:30 systemd[1]: Dependency failed for File System Check on /dev/xvdb.
Jan 27 07:37:30 systemd[1]: Dependency failed for /data.
Jan 27 07:37:30 systemd[1]: Dependency failed for Local File Systems.
Jan 27 07:37:30 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jan 27 07:37:30 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Jan 27 07:37:30 systemd[1]: local-fs.target: Failed to enqueue OnFailure= job, ignoring: Unit emergency.target not found.
Jan 27 07:37:30 systemd[1]: data.mount: Job data.mount/start failed with result 'dependency'.
Jan 27 07:37:30 systemd[1]: systemd-fsck@dev-xvdb.service: Job systemd-fsck@dev-xvdb.service/start failed with result 'dependency'.
Jan 27 07:37:30 systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start failed with result 'timeout'.
Jan 27 07:37:30 systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Jan 27 07:37:30 systemd[1]: Condition check resulted in Update is Completed being skipped.
Jan 27 07:37:30 nixos[2846]: finished switching to system configuration /nix/store/shhlfv45q46ds606d8b1qkl6fn224iyl-nixos-system-unnamed-20.09.1632.a6a3a368dda

Later situtation, where it fails after the timeout:

# journalctl -f
Jan 27 07:49:36 nixos[5134]: switching to system configuration /nix/store/shhlfv45q46ds606d8b1qkl6fn224iyl-nixos-system-unnamed-20.09.1632.a6a3a368dda
Jan 27 07:49:36 systemd[1]: Stopped target Local File Systems.
Jan 27 07:49:36 systemd[1]: Stopped target All Network Interfaces (deprecated).
Jan 27 07:49:36 systemd[1]: Stopped target Remote File Systems.
Jan 27 07:49:36 nscd[874]: 874 monitored file `/etc/group` was moved into place, adding watch
Jan 27 07:49:36 nscd[874]: 874 ignored inotify event for `/etc/group` (file exists)
Jan 27 07:49:36 nscd[874]: 874 monitoring file `/etc/group` (43)
Jan 27 07:49:36 nscd[874]: 874 monitoring directory `/etc` (2)
Jan 27 07:49:36 nscd[874]: 874 monitored file `/etc/passwd` was moved into place, adding watch
Jan 27 07:49:36 nscd[874]: 874 monitoring file `/etc/passwd` (44)
Jan 27 07:49:36 nscd[874]: 874 monitoring directory `/etc` (2)
Jan 27 07:49:36 nscd[874]: 874 monitored file `/etc/hosts` was moved into place, adding watch
Jan 27 07:49:36 nscd[874]: 874 monitored file `/etc/services` was moved into place, adding watch
Jan 27 07:49:36 systemd[1]: Reloading.
Jan 27 07:49:37 lvm2-activation-generator: lvmconfig failed
Jan 27 07:49:37 su[5290]: Successful su for root by root
Jan 27 07:49:37 su[5290]: pam_unix(su:session): session opened for user root by root(uid=0)
Jan 27 07:49:37 systemd[893]: Reexecuting.
Jan 27 07:49:37 systemd[893]: Condition check resulted in Run user-specific NixOS activation being skipped.
Jan 27 07:49:37 su[5290]: pam_unix(su:session): session closed for user root
Jan 27 07:49:37 systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
Jan 27 07:49:37 systemd[1]: Reached target Remote File Systems.
Jan 27 07:49:37 systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
Jan 27 07:49:37 systemd[1]: Reached target All Network Interfaces (deprecated).
Jan 27 07:49:37 systemd[1]: Condition check resulted in Kernel Configuration File System being skipped.
Jan 27 07:49:37 systemd[1]: Condition check resulted in File System Check on Root Device being skipped.

[hanging here until timeout]

Jan 27 07:50:22 sshd[5301]: Connection from 139.59.128.39 port 53088 on 172.31.13.159 port 22 rdomain ""
Jan 27 07:50:22 sshd[5301]: Unable to negotiate with 139.59.128.39 port 53088: no matching key exchange method found. Their offer: diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 [preauth]
Jan 27 07:51:07 systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start timed out.
Jan 27 07:51:07 systemd[1]: Timed out waiting for device /dev/xvdb.
Jan 27 07:51:07 systemd[1]: Dependency failed for Initialisation of Filesystem /dev/xvdb.
Jan 27 07:51:07 systemd[1]: mkfs-dev-xvdb.service: Job mkfs-dev-xvdb.service/start failed with result 'dependency'.
Jan 27 07:51:07 systemd[1]: Dependency failed for File System Check on /dev/xvdb.
Jan 27 07:51:07 systemd[1]: Dependency failed for /data.
Jan 27 07:51:07 systemd[1]: Dependency failed for Local File Systems.
Jan 27 07:51:07 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jan 27 07:51:07 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Jan 27 07:51:07 systemd[1]: local-fs.target: Failed to enqueue OnFailure= job, ignoring: Unit emergency.target not found.
Jan 27 07:51:07 systemd[1]: data.mount: Job data.mount/start failed with result 'dependency'.
Jan 27 07:51:07 systemd[1]: systemd-fsck@dev-xvdb.service: Job systemd-fsck@dev-xvdb.service/start failed with result 'dependency'.
Jan 27 07:51:07 systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start failed with result 'timeout'.
Jan 27 07:51:07 systemd[1]: Condition check resulted in Rebuild Journal Catalog being skipped.
Jan 27 07:51:07 systemd[1]: Condition check resulted in Update is Completed being skipped.
Jan 27 07:51:07 nixos[5134]: switching to system configuration /nix/store/shhlfv45q46ds606d8b1qkl6fn224iyl-nixos-system-unnamed-20.09.1632.a6a3a368dda failed (status 4)
nh2 commented 3 years ago

Side observation:

Using nofail, e.g.

    options = [
      "x-systemd.device-timeout=5s"
      "nofail"
    ];

makes nixos-rebuild switch always succed, independent of the timeout setting. This may be what you want to use if the device may be attached or not at any given time, and you don't want nixos-rebuild to fail.

Note that this doesn't simply ignore the timeout error: It also makes that nixos-rebuild switch does not wait for the operation to finish, e.g. nixos-rebuild terminates, and then 5 seconds (the configured timeout) later journalctl shows the timeout as expiring.

flokli commented 3 years ago

@nh2 on non-NixOS, how do you signal systemd you updated /etc/fstab? I wonder if changing attributes of (already mounted) mountpoints is properly supported there…

nh2 commented 3 years ago

@nh2 on non-NixOS, how do you signal systemd you updated /etc/fstab?

@flokli systemd.generator suggests

After editing /etc/fstab, the user should invoke systemctl daemon-reload. This will re-run all generators and cause systemd to reload units from disk.

flokli commented 3 years ago

Our activation script apparently does more than just invoking systemctl daemon-reload - it also traverses over the list of mounts. When mount options are changed, systemctl reload $foo.mount is also invoked: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/activation/switch-to-configuration.pl#L311

I propose tracing this a bit, and seeing if the behaviour is correct or not.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity. → More info

SuperSandro2000 commented 6 months ago

I've set options = [ "x-systemd.device-timeout=5s" ] on my root disk and swap and then switched and didn't run into this. I assume this is no longer relevant, so I am going to close.