Open hadrienk opened 6 years ago
Are there any relevant systemd messages around it? You should be able to see them from your running system with journalctl -b
I'm seeing the same issue on one system. zpool complains about "no such pool or dataset", but it does succeed in importing the pool when the zfs-import-cache service is run from the shell after Ctrl+D. I suspect a timing problem, probably related to #25: perhaps the devices are not yet properly initialized when the import cache service is run for the first time. It's an important system, so unfortunately I can't experiment at will, but if I have new information, I'll report it.
The logs seem to confirm my suspicions:
Sep 06 15:17:09 archlinux systemd[1]: Started udev Wait for Complete Device Initialization.
Sep 06 15:17:09 archlinux systemd[1]: Reached target System Initialization.
Sep 06 15:17:09 archlinux systemd[1]: Reached target Basic System.
Sep 06 15:17:09 archlinux systemd[1]: System is tainted: var-run-bad
Sep 06 15:17:09 archlinux systemd[1]: Starting Import ZFS pools by cache file...
Sep 06 15:17:09 archlinux kernel: spl: loading out-of-tree module taints kernel.
Sep 06 15:17:09 archlinux kernel: icp: module license 'CDDL' taints kernel.
Sep 06 15:17:09 archlinux kernel: Disabling lock debugging due to kernel taint
Sep 06 15:17:09 archlinux kernel: usb 2-12: new high-speed USB device number 2 using xhci_hcd
Sep 06 15:17:09 archlinux kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 06 15:17:09 archlinux kernel: usb 1-1: new high-speed USB device number 2 using ehci-pci
Sep 06 15:17:09 archlinux kernel: usb 4-1: new high-speed USB device number 2 using ehci-pci
Sep 06 15:17:09 archlinux kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata10: SATA link down (SStatus 0 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata9: SATA link down (SStatus 0 SControl 300)
Sep 06 15:17:09 archlinux kernel: ata2.00: NCQ Send/Recv Log not supported
(...snip...)
Sep 06 15:17:11 archlinux kernel: ZFS: Loaded module v0.7.0-1551_gcc99f275a, ZFS pool version 5000, ZFS filesystem version 5
Sep 06 15:17:11 archlinux kernel: random: crng init done
Sep 06 15:17:11 archlinux kernel: random: 7 urandom warning(s) missed due to ratelimiting
Sep 06 15:17:11 archlinux zpool[281]: cannot import '<redacted>': no such pool or dataset
Sep 06 15:17:11 archlinux zpool[281]: Destroy and re-create the pool from
Sep 06 15:17:11 archlinux zpool[281]: a backup source.
Sep 06 15:17:11 archlinux systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Sep 06 15:17:11 archlinux systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Sep 06 15:17:11 archlinux systemd[1]: Failed to start Import ZFS pools by cache file.
Apparently a lot of device initialization happens after udevadm settle on this particular system.
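One way to cope with devices that show up after udevadm settle has returned is to poll for the device node before attempting the import. The following is only a sketch under that assumption: wait_for_path is a hypothetical helper, not part of ZFS, udev, or systemd, and the one-second polling interval is arbitrary.

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT]: poll once per second until PATH exists or
# TIMEOUT seconds (default 10) have elapsed.
# Hypothetical helper for illustration; not part of ZFS, udev, or systemd.
wait_for_path() {
    path=$1
    remaining=${2:-10}
    while [ ! -e "$path" ]; do
        # Give up once the timeout budget is used up.
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

It would be called with the device node of a pool member and a timeout before running the import; which path to wait for depends on the system and is not something the logs above reveal.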
@dasJ I can confirm being able to avoid the issue by inserting an appropriate delay before pool import. My test solution was rather crude: if the first import failed, it slept 2 seconds and tried again, and if that failed too, it slept another 4 seconds before trying one last time. I'm afraid I don't know right now what would be the most elegant and efficient approach. In any case, the ability to configure a delay before the pool import, possibly via a kernel parameter, may at least be a reasonable interim solution.
I've also encountered the issue on another system, but can't yet tell what is different about these problematic systems. Inserting a delay worked there as well.
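To illustrate the kernel parameter idea, here is a minimal sketch of how a boot script might read a delay from the kernel command line. The parameter name zfs_import_delay and the function name are made up for this example; no such parameter exists in ZFS or in the hooks discussed here as far as this thread shows.

```shell
#!/bin/sh
# import_delay_from_cmdline CMDLINE: print the value of the HYPOTHETICAL
# zfs_import_delay=N parameter found in CMDLINE, or 0 if it is absent or
# not a plain non-negative integer.
import_delay_from_cmdline() {
    delay=0
    # Intentionally unquoted: split the command line into words.
    for word in $1; do
        case "$word" in
            zfs_import_delay=*) delay="${word#zfs_import_delay=}" ;;
        esac
    done
    # Reject empty or non-numeric values instead of passing them to sleep.
    case "$delay" in
        ''|*[!0-9]*) delay=0 ;;
    esac
    echo "$delay"
}

# Possible use before the import (left commented out on purpose):
# sleep "$(import_delay_from_cmdline "$(cat /proc/cmdline)")"
```

Falling back to 0 keeps the boot path unchanged on systems that do not set the parameter.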
@kerberizer I realize it's been a year, but would you be willing to share the modifications you made to introduce the delay? I'm having a heck of a time booting a system with a zpool on a USB device and it appears to be entirely a timing issue.
@Klowner No problem sharing at all, but I need to recall what those changes were; it appears that at some point I removed them. Off the top of my head I'd suggest editing zfs-import-cache.service (or -scan if not using zpool.cache), replacing the /usr/bin/zpool import in ExecStart with something like /usr/bin/sh -c "zpool import ... || sleep N && zpool import ... || sleep N ..."
The point is to retry the pool import after some time if it fails, hoping that the devices would have time to settle in the meantime.
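A caveat on the one-liner: in sh, || and && have equal precedence and are evaluated left to right, so a chain written exactly like that would still run the later imports after an earlier one succeeded. A small loop avoids that. The sketch below is not the original modification (which was not preserved); the function name retry and the variable RETRY_DELAYS are hypothetical, and the 2 and 4 second defaults simply echo the delays mentioned earlier in the thread.

```shell
#!/bin/sh
# Delays (in seconds) to wait before each retry. RETRY_DELAYS is a
# hypothetical name chosen for this sketch.
RETRY_DELAYS="${RETRY_DELAYS:-2 4}"

# retry CMD [ARGS...]: run CMD once; if it fails, sleep for each delay in
# RETRY_DELAYS in turn and run it again. Stops at the first success.
retry() {
    # First attempt, without any delay.
    "$@" && return 0
    for delay in $RETRY_DELAYS; do
        echo "retry: '$*' failed, sleeping ${delay}s before trying again" >&2
        sleep "$delay"
        "$@" && return 0
    done
    # Every attempt failed.
    return 1
}
```

A wrapper script containing this function could end with retry /usr/bin/zpool import ... (with whatever arguments the unit already passes) and be referenced from ExecStart in place of the plain zpool call.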
A more robust solution may be unnecessary, as Arch Linux may at some point ditch initcpio and replace it with dracut; at least that was my impression from some emails on the arch-dev-public mailing list.
Hi, thanks for sharing your work. I am trying to create a minimal initrd. I configured the hooks as follows:
I am using rEFInd.
When booting, the zpool import fails at first. When I press Ctrl+D, it seems to try again and the system starts normally. Any idea what I did wrong?