dasJ / sd-zfs

Compatibility between systemd and ZFS roots
MIT License

systemd udev doesn't wait for HDDs, system can't boot (not even recovery) #20

Open alaricljs opened 7 years ago

alaricljs commented 7 years ago

With systemd udev, sd-zfs fails to find my boot devices. The system then tries to enter rescue mode and fails there too: it cannot mount /sysroot, and systemd whines about not being able to do something with the password file. With the udev and zfs hooks, udev spins for a while waiting on my storage and then everything comes up clean. With systemd and sd-zfs, the udev startup appears to be parallelized with other items, and this breaks the process. The boot devices are SSDs, but primary storage is all HDDs set to not spin up unless told to. Unfortunately I don't have the resources to determine whether this also happens without sd-zfs.

Do you know of some way to force systemd to wait on udev before proceeding? The initramfs is a very truncated version and I don't know where to start looking.

dasJ commented 7 years ago

That should not happen, as sd-zfs goes After udev:

https://github.com/dasJ/sd-zfs/blob/master/src/zfs-generator.c#L173

You can check the dependency tree by putting systemd-analyze into your initrd and generating a dependency graph.

alaricljs commented 7 years ago

Any thoughts on how to get that to work and get data out of it? The ZFS pool never imports or mounts, and no login access is available. I don't see a way to run this against a non-running systemd setup.

dasJ commented 7 years ago

Have you tried both the emergency and recovery targets, and did you prepend rd. to the kernel option?

wallzero commented 7 years ago

I appear to be having the same issue only it's also not waiting for cryptsetup.target.

dasJ commented 7 years ago

Do you use hibernation? You may have the same problem I had: https://github.com/systemd/systemd/issues/4577

wallzero commented 7 years ago

No I don't use hibernation at the moment; I only have one SSD and I didn't want to partition it. My swap sits in a VDEV and won't work with hibernation as far as I know.

wallzero commented 6 years ago

I just tried again on a fresh install and sd-zfs still seems to run immediately. Before I am even prompted for the LUKS password, a "[FAILED] Failed to mount /sysroot" error is logged. systemctl status sysroot.mount shows the following:

Where: /sysroot
What: zfs:rpool/root
Docs: ...
Process: 116 ExecMount=/usr/bin/mount zfs:rpool/root /sysroot -o rw ...

dasJ commented 6 years ago

How did you configure LUKS?

wallzero commented 6 years ago

I encrypted the drive with the following:

cryptsetup luksFormat -c aes-xts-plain64 -s 512 -h sha512 /dev/sda2

Then I modified /etc/default/grub and tried several things:

GRUB_CMDLINE_LINUX_DEFAULT="rd.luks.uuid=id rd.luks.name=id=luks rd.luks.crypttab=no rd.luks.options=tries=0,timeout=120s rootflags=x-systemd.mount-timeout=infinity,retry=10000,x-systemd.device-timeout=120s root=zfs:rpool/root zfs_force=1 quiet splash"

/etc/fstab doesn't have anything about the root, only the following:

# EFI
UUID=efiId /boot/efi vfat discard,umask=0077 0 0

# Boot
UUID=bootId /boot ext4 defaults,discard,nofail 0 0

# Swap
/dev/zvol/rpool/swap none swap defaults,discard 0 0

The zpool bootfs is configured:

zpool get bootfs
NAME   PROPERTY  VALUE       SOURCE
rpool  bootfs    rpool/root  local

dasJ commented 6 years ago

Correct... almost.

You need to add sd-encrypt to the HOOKS in /etc/mkinitcpio.conf. The cmdline LUKS options are only for the non-systemd initrd. My cmdline just looks like: root=zfs:zroot/root rw.

Now create a /etc/crypttab.initramfs. You can find the syntax in crypttab(5). Yours should probably look like this:

luks     UUID=id - tries=0,timeout=120s

Also, you can add discard to the options in the crypttab entry when you have an SSD.

Then, rebuild your initcpio. Hope this helps.

dasJ commented 6 years ago

Btw, my full HOOKS are:

HOOKS="base systemd autodetect modconf block keyboard sd-vconsole sd-encrypt sd-zfs"

I don't really need base, but it helps when I have to troubleshoot stuff in the initrd.
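mkinitcpio.conf files use both the quoted form HOOKS="..." and the array form HOOKS=(...). Below is a minimal sketch, assuming a POSIX shell, of a check that one hook is listed before another. hook_precedes is a hypothetical helper, not part of mkinitcpio or sd-zfs, and nothing in this thread confirms that the listing order alone decides when sd-zfs runs.

```shell
# Hypothetical helper: succeed if hook $2 is listed before hook $3
# in the HOOKS line passed as $1.
hook_precedes() {
    # Strip the 'HOOKS=' prefix, quotes, and parentheses so that both
    # HOOKS="..." and HOOKS=(...) reduce to a plain word list.
    hooks=$(printf '%s\n' "$1" | sed -e 's/^HOOKS=//' -e 's/[()"]//g')
    seen_first=no
    for hook in $hooks; do
        if [ "$hook" = "$2" ]; then
            seen_first=yes
        elif [ "$hook" = "$3" ]; then
            # The second hook was reached: the order is right only if
            # the first hook has already been seen.
            if [ "$seen_first" = yes ]; then
                return 0
            else
                return 1
            fi
        fi
    done
    # The second hook is not listed at all.
    return 1
}
```

For example, hook_precedes "$(grep '^HOOKS=' /etc/mkinitcpio.conf)" sd-encrypt sd-zfs returns success when sd-encrypt is listed before sd-zfs.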

wallzero commented 6 years ago

Sorry I had forgotten /etc/mkinitcpio.conf:

HOOKS="base systemd autodetect modconf block keyboard keymap sd-encrypt sd-zfs filesystems fsck"

The only difference I see is sd-vconsole is missing and filesystems and fsck are added.

dasJ commented 6 years ago

Do you have a proper crypttab.initramfs?

wallzero commented 6 years ago

I tried adding your crypttab.initramfs example with my UUID but after rebuilding the initcpio and updating grub it still appears to not wait for the password prompt. Same issue as above.

Also, why is it /etc/crypttab.initramfs and not /etc/crypttab?

dasJ commented 6 years ago

Because the crypttab.initramfs is put into your initramfs, while crypttab isn't. In your fstab, can you use /dev/mapper/whatever instead of the UUIDs? systemd is probably unable to generate the proper dependencies.

wallzero commented 6 years ago

I'm sorry, do you mean for the ZFS partition? I do not have the ZFS partition UUID under a / entry in /etc/fstab. I am using the zpool bootfs option. I didn't mention above that I also set the mountpoint on the root partition:

zfs get mountpoint rpool/root
NAME PROPERTY VALUE SOURCE
rpool/root mountpoint / local

I could try zfs set mountpoint=legacy rpool/root?

/dev/mapper/ only contains control and luks links. I could also try /dev/mapper/luks in /etc/crypttab.initramfs?

dschaper commented 6 years ago

I had the same problem. grub-mkconfig throws in additional root directives; see /etc/grub.d/10_linux, around line 66. My /etc/default/grub was set with root= per the documentation, but the generated /boot/grub/grub.cfg had two root= parameters: one with root=zfs: as required and one with root=ZFS=, which is what systemd picked up and tried to run with. Booting up and removing the first entry from the kernel line let me boot without issues.
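One way to spot such duplicates is to count the root= parameters on a kernel command line. This is a minimal sketch, assuming a POSIX shell; count_root_params is a hypothetical helper, not part of GRUB or sd-zfs.

```shell
# Hypothetical helper: print how many root= parameters appear in the
# kernel command line passed as $1. Parameters such as rootflags= do
# not match the root=* pattern and are not counted.
count_root_params() {
    count=0
    for param in $1; do
        case "$param" in
            root=*) count=$((count + 1)) ;;
        esac
    done
    printf '%s\n' "$count"
}
```

On a running system, count_root_params "$(cat /proc/cmdline)" prints the count for the current boot. Anything above 1 means the kernel line carries more than one root= parameter.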

For the record, my setup is ZFS on LUKS with full-disk encryption. Boot is encrypted on a separate drive, and I use no keyfiles: I enter passwords manually until I get things debugged correctly.

dasJ commented 6 years ago

@Schlesiger Sorry, I was mistaken. Have you tried the hint of @dschaper ?

wallzero commented 6 years ago

@dschaper Thank you for your input! @dasJ I will give @dschaper solution a try! I already see two root= definitions in my /boot/grub/grub.cfg.

maksim-pinguin commented 3 years ago

I have the same issue with systemd-boot. I checked my kernel parameters in the entries screen. I don't have any duplicate root=/: values.

[Image: IMG_20210623_201847]

That's the error during bootstrap. I also can't get an emergency shell. And journalctl is empty when I chroot in.

[Image: IMG_20210623_201507]

n-st commented 2 years ago

I'm seeing the same behaviour — sd-zfs tries to import the pool before sd-encrypt has decrypted it — with this hook order:

HOOKS=(base systemd autodetect keyboard sd-vconsole modconf block sd-encrypt lvm2 sd-zfs filesystems fsck)

I'll try different device specifications (currently PARTLABEL=…) in /etc/crypttab.initramfs when I have some more time.

@maksim-pinguin For what it's worth, you can create an unlocked root account in your initramfs (which is created separately from your regular root account) to at least get an emergency shell: https://bbs.archlinux.org/viewtopic.php?pid=1927757#p1927757 From there, you can probably just zpool import -R /sysroot rpool; exit to continue booting normally.

misaka18931 commented 2 years ago

I have the same error as @maksim-pinguin on my setup.

maksim-pinguin commented 2 years ago

I managed to get it running with the old syntax for the kernel parameter regarding the zfs partition. Check this thread: https://bbs.archlinux.org/viewtopic.php?pid=1979863#p1979863