lopsided98 opened 1 year ago
Can confirm that it's broken with zfs native encryption too
So this is a tricky problem. For non-ZFS, systemd is supposed to handle this. I've simulated the `systemd-cryptsetup-generator` and the `systemd-fstab-generator`, and look at this:
```console
$ cat etc/fstab
/dev/mapper/virt /foo ext4 defaults 0 0
$ cat etc/crypttab
virt LABEL=phys
$ unshare -U -r -m bash -c 'mount --bind /nix/store ./nix/store; mount --bind /proc ./proc; chroot . /run/current-system/systemd/lib/systemd/system-generators/systemd-fstab-generator /run/systemd/generator /run/systemd/generator.early /run/systemd/generator.late'
$ unshare -U -r -m bash -c 'mount --bind /nix/store ./nix/store; mount --bind /proc ./proc; chroot . /run/current-system/systemd/lib/systemd/system-generators/systemd-cryptsetup-generator /run/systemd/generator /run/systemd/generator.early /run/systemd/generator.late'
$ find run/systemd/
run/systemd
run/systemd/generator
run/systemd/generator/cryptsetup.target.requires
run/systemd/generator/cryptsetup.target.requires/systemd-cryptsetup@virt.service
run/systemd/generator/dev-mapper-virt.device.d
run/systemd/generator/dev-mapper-virt.device.d/40-device-timeout.conf
run/systemd/generator/systemd-cryptsetup@virt.service
run/systemd/generator/local-fs.target.requires
run/systemd/generator/local-fs.target.requires/foo.mount
run/systemd/generator/foo.mount
run/systemd/generator/local-fs.target.wants
run/systemd/generator/local-fs.target.wants/systemd-remount-fs.service
run/systemd/generator/dev-mapper-virt.device.requires
run/systemd/generator/dev-mapper-virt.device.requires/systemd-cryptsetup@virt.service
run/systemd/generator.late
run/systemd/generator.early
$ cat run/systemd/generator/dev-mapper-virt.device.d/40-device-timeout.conf
# Automatically generated by systemd-cryptsetup-generator
[Unit]
JobTimeoutSec=infinity
```
So what's happening here is that the file system `foo.mount` won't be started until `dev-mapper-virt.device` appears, but because of `dev-mapper-virt.device.d/40-device-timeout.conf`, that device will never time out. The device timeout works correctly because `systemd-cryptsetup@virt.service` requires and is ordered after `dev-disk-by\x2dlabel-phys.device`, so the timeout on `dev-disk-by\x2dlabel-phys.device` will cause cascading failures in the event that the physical device fails to show up.
Now, for those having this problem in the non-ZFS case: this means you should set your file system device to `/dev/mapper/foo`, not `/dev/disk/by-whatever/whatever`. This will ensure that it only times out when the actual physical device fails to appear, not when you fail to enter the passphrase in time. By depending on anything other than `/dev/mapper/foo`, you're failing to get the timeout override that makes this all work.
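In NixOS terms, this amounts to letting the LUKS layer locate the backing disk while the filesystem refers only to the decrypted mapper path. A minimal sketch, with hypothetical device names and UUID:

```nix
{
  # The LUKS layer locates the backing disk (UUID is hypothetical).
  boot.initrd.luks.devices."root".device =
    "/dev/disk/by-uuid/00000000-0000-0000-0000-000000000000";

  fileSystems."/" = {
    # Depend on the mapper path, not /dev/disk/by-*, so the mount
    # inherits the JobTimeoutSec=infinity drop-in generated for
    # dev-mapper-root.device and waits for the passphrase.
    device = "/dev/mapper/root";
    fsType = "ext4";
  };
}
```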
As for ZFS, the problem is analogous. The import service does not depend on the device that it needs to import, so it starts too early and assumes the device has failed to appear. In an ideal world, ZFS would have udev rules that make a device representing the pool appear only once all of the pool's drives are available, so that we could order the import service after said device. For now, the best alternative is probably to order the import service after `cryptsetup.target`, but this is not without caveats. For instance, what about users who have `crypttab` devices stored on ZFS zvols? The import service actually needs to come before `cryptsetup.target` in that case. Not sure what exactly to do here.
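For reference, that caveated `cryptsetup.target` ordering could be expressed roughly like this (the pool name `rpool` is hypothetical; as noted, this would break setups with crypttab devices on zvols):

```nix
{
  boot.initrd.systemd.services."zfs-import-rpool" = {
    # Don't attempt the import until all crypttab devices are unlocked,
    # so the import's own timeout only starts counting after unlock.
    after = [ "cryptsetup.target" ];
  };
}
```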
Thank you for looking into this; I can confirm that this fixes the non-ZFS case. Quickly looking at the code, it seems like it would be relatively simple for systemd to apply the drop-in to the `by-uuid` path as well, but then I guess you could still break it by using one of the `by-id` paths or `/dev/dm-0`.
I've also implemented https://github.com/lopsided98/nixos-config/blob/master/machines/HP-Z420/default.nix#L121-L134, which I think we should be able to auto-generate. If it isn't solvable generically, then at least put it behind an option, which could perhaps be on by default.
Also, we should treat this issue with some priority: I ended up in emergency mode several times when unlocking failed, even after having already booted into the normal system.
@SuperSandro2000 Don't create custom `.device` units like that. Just order the zfs service against the mapper names, as I described above, instead of `/dev/disk/by-uuid` names. The LUKS logic is already taking care of finding disks by UUID if that's what you care about.
There isn't an option that could be turned on by default, because the disks required differ from system to system. The only common thing that could be done by default is ordering after `cryptsetup.target`, but as I said before, this breaks other setups that have LUKS devices on the zpool.
You mean like?
```nix
boot.initrd.systemd.services."zfs-import-root" = let
  zfsPools = [
    "dev-mapper-machine\\x2deins.device"
    "dev-mapper-machine\\x2dzwei.device"
  ];
in {
  wants = zfsPools;
  after = zfsPools;
};
```
Yep. That way you don't need the custom timeout in the `units = ...` stuff.
> Now, for those having this problem in the non-ZFS case: This means you should set your file system device to `/dev/mapper/foo`, not `/dev/disk/by-whatever/whatever`. This will ensure that it only times out when the actual physical device fails to appear, not when you fail to enter the passphrase in time. By depending on anything other than `/dev/mapper/foo`, you're failing to get the timeout override that makes this all work.
Thank you very much for looking into this issue. However, even with your recommended fix, the password prompt times out after 90 seconds (the result being emergency mode). It seems that mounting the `/boot` partition is the problem, as it is the only remaining file system device still being supplied as `/dev/disk/by-uuid/...`. (Also, it fails to mount when NixOS tries to continue booting after decryption following the timeout.)
Then again, `/boot` is not encrypted in my specific case; my setup is rather simple, with an unencrypted boot partition (vfat) plus LVM-over-LUKS for the remainder (which is supplied via `/dev/mapper/...`).
Do you have any pointers/ideas on how to further debug this, or even a solution at hand?
@preisi Sorry for taking a while to get back to you.
> It seems that mounting the /boot partition is the problem

The `/boot` partition is not something stage 1 handles. If that's causing delays, it's during stage 2, and it's a separate issue.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/unlocking-multiple-luks-devices-with-same-passphrase/45856/4
Describe the bug
With systemd initrd and an encrypted rootfs, the system enters emergency mode if the decryption password is not entered within 90 seconds. This occurs because systemd device units time out by default after 90 seconds. Additionally, ZFS on LUKS (I haven't tested native ZFS encryption) times out after 60 seconds because of a hardcoded timeout in `zfs-import-${pool}.service`.
To work around the first issue, I added `x-systemd.device-timeout=0` to the root filesystem options. To fix ZFS, I added the decrypted rootfs device unit as a dependency of `zfs-import-root.service`, so the 60-second timeout doesn't start until the device is decrypted. I also set `JobTimeoutSec=infinity` on the device unit (see here). I don't see an obvious way to turn these workarounds into something that can be automatically configured in nixpkgs, but I think we should find some solution, because the current behavior is unexpected and annoying.
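A rough sketch of the two workarounds in NixOS configuration, with hypothetical device and pool names (the `JobTimeoutSec=infinity` drop-in on the device unit itself is omitted here):

```nix
{
  # Workaround 1: disable the 90-second device-unit timeout for the
  # root filesystem, so the passphrase prompt can wait indefinitely.
  fileSystems."/" = {
    device = "/dev/mapper/root";
    fsType = "ext4";
    options = [ "x-systemd.device-timeout=0" ];
  };

  # Workaround 2: make the ZFS import wait for the decrypted device,
  # so its hardcoded 60-second timeout starts only after unlocking.
  boot.initrd.systemd.services."zfs-import-root" = {
    wants = [ "dev-mapper-root.device" ];
    after = [ "dev-mapper-root.device" ];
  };
}
```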
Notify maintainers
@ElvishJerricco