Open tigerblue77 opened 1 year ago
Oooh that does look awful, I'm sorry, I didn't realize before replying that it was a github issue and not an email thread. Important parts of it below:
It looks to me like lvm might be starting the second VDO only after the first, not in parallel, and the first takes long enough to time out the second. If you look at the log messages, the first VDO took 1:20, and systemd decided the service had failed at 1:25.
You might consider upping the timeout that systemd is using to mark it as failed: create /etc/systemd/system/lvm2-activation.service.d/10-timeout.conf and make it contain
[Service]
TimeoutSec=4m
in order to bump the timeout to 4m. (I think this is the right unit, but you might need to also do this for lvm2-activation-early.service and lvm2-activation-net.service.)
Hopefully this works for you!
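For anyone wanting to try the drop-in suggested above, here is a minimal sketch of how it could be staged and inspected before touching the real system. `write_timeout_dropin` is a hypothetical helper name, not something shipped by systemd or LVM, and the three unit names are simply the ones mentioned in the reply; they may not all exist on a given distribution.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that raises a unit's start timeout.
# write_timeout_dropin is a hypothetical helper, not part of systemd or LVM.
#   $1 = root directory to write under ("/" for the real system, as root)
#   $2 = unit name, $3 = timeout value (e.g. 4m)
write_timeout_dropin() {
    root=$1 unit=$2 timeout=$3
    dir="${root%/}/etc/systemd/system/${unit}.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nTimeoutSec=%s\n' "$timeout" > "$dir/10-timeout.conf"
}

# Dry run into a scratch directory so the result can be checked first.
stage=$(mktemp -d)
for unit in lvm2-activation.service lvm2-activation-early.service lvm2-activation-net.service; do
    write_timeout_dropin "$stage" "$unit" 4m
done
cat "$stage/etc/systemd/system/lvm2-activation.service.d/10-timeout.conf"
```

After writing the file under the real `/etc/systemd/system`, `systemctl daemon-reload` is needed for systemd to pick up the drop-in.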
Hello @sweettea, Thanks for reformatting your message ! 😁
Hello @sweettea, I understood your suggestion to increase the LVM timeout, but I'm not a big fan of it, as I want my boot to be as fast as possible and therefore to parallelise everything that can be. Following your answer, here are the questions that came to my mind:
* Why is my LVM not working in parallel? Is it possible to change this setting?
* Why did it stop working from one day to the next? (I know you have no answer, of course.)
* Can the size of the volumes be the problem? My `ZEROED-VDO-LV` is 15.3 TB and my `COMPRESSED-DEDUPLICATED-VDO-LV` is 1.2 TB. Maybe I should reorder the fstab entries so that `/mnt/COMPRESSED-DEDUPLICATED-VDO-LV-1`, which has bind mount dependencies, is mounted first? (if LVM parallelisation is not possible)
* Maybe I have some mount dependency errors or inconsistencies?
  * Is my `fstab` **OK** for you? You shouldn't see anything exotic in my configuration:
    * 3 mounts (`/`, `/mnt/ZEROED-VDO-LV-1` and `/mnt/COMPRESSED-DEDUPLICATED-VDO-LV-1`)
    * 3 bind mounts on the `/mnt/COMPRESSED-DEDUPLICATED-VDO-LV-1` volume (as the creation of multiple VDOdataLVs on the same VDO pool is not possible at this time)
  * Is my `systemd` dependency tree **OK** to you? (I didn't create entries manually)

Here is the partial output of the `systemctl list-dependencies` command (I added `<===` to the lines that should matter, IMO):
[...]
● ├─smbd.service
● ├─ssh.service
● ├─systemd-ask-password-wall.path
● ├─systemd-logind.service
● ├─systemd-update-utmp-runlevel.service
● ├─systemd-user-sessions.service
● ├─basic.target
● │ ├─-.mount <===
● │ ├─tmp.mount <===
● │ ├─var.mount <===
● │ ├─paths.target
● │ ├─slices.target
● │ │ ├─-.slice
● │ │ └─system.slice
● │ ├─sockets.target
● │ │ ├─dbus.socket
● │ │ ├─dm-event.socket
● │ │ ├─docker.socket
● │ │ ├─systemd-initctl.socket
● │ │ ├─systemd-journald-audit.socket
● │ │ ├─systemd-journald-dev-log.socket
● │ │ ├─systemd-journald.socket
● │ │ ├─systemd-udevd-control.socket
● │ │ └─systemd-udevd-kernel.socket
● │ ├─sysinit.target
● │ │ ├─apparmor.service
● │ │ ├─blk-availability.service
● │ │ ├─dev-hugepages.mount
● │ │ ├─dev-mqueue.mount
● │ │ ├─keyboard-setup.service
● │ │ ├─kmod-static-nodes.service
● │ │ ├─lvm2-lvmpolld.socket
● │ │ ├─lvm2-monitor.service
● │ │ ├─proc-sys-fs-binfmt_misc.automount
● │ │ ├─sys-fs-fuse-connections.mount
● │ │ ├─sys-kernel-config.mount
● │ │ ├─sys-kernel-debug.mount
● │ │ ├─sys-kernel-tracing.mount
● │ │ ├─systemd-ask-password-console.path
● │ │ ├─systemd-binfmt.service
● │ │ ├─systemd-boot-system-token.service
● │ │ ├─systemd-hwdb-update.service
● │ │ ├─systemd-journal-flush.service
● │ │ ├─systemd-journald.service
● │ │ ├─systemd-machine-id-commit.service
● │ │ ├─systemd-modules-load.service
● │ │ ├─systemd-pstore.service
● │ │ ├─systemd-random-seed.service
● │ │ ├─systemd-sysctl.service
● │ │ ├─systemd-sysusers.service
● │ │ ├─systemd-timesyncd.service
● │ │ ├─systemd-tmpfiles-setup-dev.service
● │ │ ├─systemd-tmpfiles-setup.service
● │ │ ├─systemd-udev-trigger.service
● │ │ ├─systemd-udevd.service
● │ │ ├─systemd-update-utmp.service
● │ │ ├─cryptsetup.target
● │ │ ├─local-fs.target <===
● │ │ │ ├─-.mount <===
● │ │ │ ├─boot-efi.mount
● │ │ │ ├─boot.mount
● │ │ │ ├─home.automount <===
● │ │ │ ├─mnt-COMPRESSED\x2dDEDUPLICATED\x2dVDO\x2dLV\x2d1.mount <===
● │ │ │ ├─mnt-ZEROED\x2dVDO\x2dLV\x2d1.mount <===
● │ │ │ ├─systemd-fsck-root.service
● │ │ │ ├─systemd-remount-fs.service
● │ │ │ ├─tmp.automount <===
● │ │ │ └─var.automount <===
● │ │ └─swap.target
● │ │ └─dev-mapper-Ultron\x2d\x2dvg\x2dswap_1.swap
[...]
I suppose that the 3 `.automount` entries correspond to the 3 `fstab` bind mounts. But why do I have `tmp.mount` and `var.mount` entries, and no `home.mount` entry? That seems strange to me. Maybe they are left over from the initial installation (before the VDO volumes were created and added)?
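A side note on reading the listing: the `\x2d` sequences in the unit names are systemd's escaping of literal hyphens in the mount path, because a plain `-` in a unit name stands for a `/`. The sketch below reproduces that mapping for simple paths. `path_to_mount_unit` is a hypothetical helper that only handles slashes and hyphens; the real tool for this is `systemd-escape -p --suffix=mount`, which also covers other special characters.

```shell
#!/bin/sh
# Sketch: derive the systemd mount unit name for a simple mount path.
# path_to_mount_unit is a hypothetical helper; it handles only '/' and '-'.
path_to_mount_unit() {
    case $1 in
        /) printf '%s\n' '-.mount'; return ;;   # the root mount is a special case
    esac
    # Strip outer slashes, escape hyphens as \x2d, then turn '/' into '-'.
    printf '%s\n' "$1" | sed -e 's|^/*||' -e 's|/*$||' \
        -e 's|-|\\x2d|g' -e 's|/|-|g' -e 's|$|.mount|'
}

path_to_mount_unit /mnt/COMPRESSED-DEDUPLICATED-VDO-LV-1
# prints: mnt-COMPRESSED\x2dDEDUPLICATED\x2dVDO\x2dLV\x2d1.mount
```

The resulting name is what `systemctl status` expects when checking one of these mounts individually.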
I understood your suggestion to increase the LVM timeout but I'm not a big fan as I want my boot to be as fast as possible and so, to parallelise all that is possible. Following your answer, here are the questions that came to my mind :
* Why is my LVM not working in parallel ? Is it possible to change this setting ?
I believe there's an lvm.conf setting, global/event_activation, which if set to 1 will use event-based startup and might do so in parallel. I'm not sure how this works with VDO, having not worked on VDO for a year, but it's possibly something to try. Also, I don't use Debian and don't know for sure if their version of lvm supports event_activation.
* Why did it stop working from one day to the next ? (I know you have no answer of course)
Different recovery time -- could have been caused in any number of ways, perhaps there was more disk traffic or memory pressure this time, or maybe this one had to do more recovery, or ...
* Can the size of volumes be the problem ? My `ZEROED-VDO-LV` is 15.3 TB and my `COMPRESSED-DEDUPLICATED-VDO-LV` is 1.2 TB. Maybe I should reorder the fstab entries so that `/mnt/COMPRESSED-DEDUPLICATED-VDO-LV-1`, which has bind mount dependencies, is mounted first ? (if LVM parallelisation is not possible)
LVM startup order is independent of fstab order.
* Maybe I do have some mount dependency errors or inconsistencies ?
  * Is my `fstab` **OK** for you ? You shouldn't see anything exotic in my configuration :
    * 3 mounts (`/`, `/mnt/ZEROED-VDO-LV-1` and `/mnt/COMPRESSED-DEDUPLICATED-VDO-LV-1`)
    * 3 bind mounts on the `/mnt/COMPRESSED-DEDUPLICATED-VDO-LV-1` volume (as the creation of multiple VDOdataLVs on the same VDO pool is not possible at this time)
  * Is my `systemd` dependency tree **OK** to you ? (I didn't create entries manually) Here is the partial output of `systemctl list-dependencies` command (added `<===` to lines that should matter IMO) :

[...]

I suppose that the 3 `.automount` entries correspond to the 3 `fstab` bind mounts. But why do I have `tmp.mount` and `var.mount` entries ? And no `home.mount` entry ? Seems strange to me. Maybe it's left from initial installation ? (before creation/addition of VDO volumes)
Note that for tmp/var/root there's both a mount unit and an automount unit -- systemd magically activates the mount unit the first time something accesses the path of an automount (and can unmount it if it is inactive for a while). (Or maybe somehow it's a tmpfs for the initrd? Not sure about how exactly systemd does that).
And for /home, you might be seeing that auto-unmounting, since there's an automount unit and not a mount unit... you could look at 'systemctl status home.mount' and see how it looks.
Hope this helps a little -- this is definitely bumping at the edges of my systemd knowledge, sorry :)
I believe there's an lvm.conf setting, global/event_activation, which if set to 1 will use event-based startup and might do so in parallel. I'm not sure how this works with VDO, having not worked on VDO for a year, but it's possibly something to try. Also, I don't use Debian and don't know for sure if their version of lvm supports event_activation.
`event_activation = 1` doesn't work for me, as it doesn't activate VDO volumes, so I set it to `0`. Maybe it's not the right way of doing it, but I opened this issue about this.
cat /etc/lvm/lvm.conf | grep event_activation
# Configuration option global/event_activation.
# When event_activation is disabled, the system will generally run
# event_activation = 1
event_activation = 0
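The grep above also matches the comment lines, which makes it easy to misread which value is active. As a small sketch, the uncommented assignment alone can be pulled out like this; `active_lvm_setting` is a hypothetical helper, and the sample file only mirrors the four lines shown above, not the full `lvm.conf`.

```shell
#!/bin/sh
# Sketch: print only the uncommented assignment of an lvm.conf setting.
# active_lvm_setting is a hypothetical helper, not an LVM command.
#   $1 = setting name, $2 = config file (defaults to /etc/lvm/lvm.conf)
active_lvm_setting() {
    grep -E "^[[:space:]]*$1[[:space:]]*=" "${2:-/etc/lvm/lvm.conf}"
}

# Demo on a scratch copy of the lines quoted in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Configuration option global/event_activation.
# When event_activation is disabled, the system will generally run
# event_activation = 1
event_activation = 0
EOF
active_lvm_setting event_activation "$sample"
# prints: event_activation = 0
```

Where `lvmconfig` is available, `lvmconfig global/event_activation` should report the value as LVM itself reads it, which is the more reliable check.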
Different recovery time -- could have been caused in any number of ways, perhaps there was more disk traffic or memory pressure this time, or maybe this one had to do more recovery, or...
Very strange, because now, after upgrading to Debian 11.7, it's fixed without any change... :( Sorry to anyone who runs into the same problem and doesn't find a clear solution here.
LVM startup order is independent of fstab order.
Okay thanks
Note that for tmp/var/root there's both a mount unit and an automount unit -- systemd magically activates the mount unit the first time something accesses the path of an automount (and can unmount it if it is inactive for a while). (Or maybe somehow it's a tmpfs for the initrd? Not sure about how exactly systemd does that). And for /home, you might be seeing that auto-unmounting, since there's an automount unit and not a mount unit... you could look at 'systemctl status home.mount' and see how it looks. Hope this helps a little -- this is definitely bumping at the edges of my systemd knowledge, sorry :)
Thanks for your help! I think it's not a VDO problem then.
Hello @sweettea, I'm reopening this issue because it has "recurred" (I think it never really disappeared). So I applied your suggestion about increasing the LVM timeout, although I am not a big fan of it. I think the problem is elsewhere and that the LVM VDO volumes should be activated in parallel (if possible). Also, a mount time that reaches the LVM timeout on professional hardware indicates, I think, an underlying problem with VDO/KVDO.
On my Debian setup, there was no `/etc/systemd/system/lvm2-activation.service.d/10-timeout.conf` file, so I did some searching, used this SUSE topic, and added `x-systemd.device-timeout=600` to each of my VDO mounts in `/etc/fstab`. It seems to be a working fix (but not a "good" fix, IMHO).
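For readers applying the same workaround, here is a rough sketch of adding the option to selected fstab entries. `add_device_timeout` is a hypothetical helper, and the device names and filesystem type in the sample are invented for illustration, since the actual fstab is not reproduced here. It prints the modified content to stdout instead of editing the file in place, so the result can be reviewed first.

```shell
#!/bin/sh
# Sketch: append x-systemd.device-timeout to matching fstab entries.
# add_device_timeout is a hypothetical helper.
#   $1 = fstab-style file, $2 = timeout in seconds, $3 = regex for mount points
# Note: rewritten lines come out with single spaces between fields.
add_device_timeout() {
    awk -v t="$2" -v pat="$3" '
        /^[ \t]*#/ || NF < 4 { print; next }    # keep comments and short lines
        $2 ~ pat && $4 !~ /x-systemd\.device-timeout=/ {
            $4 = $4 ",x-systemd.device-timeout=" t
        }
        { print }
    ' "$1"
}

# Demo on a made-up fstab (device names and fs type are assumptions).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/mapper/vg-root / ext4 errors=remount-ro 0 1
/dev/mapper/vg-vdo1 /mnt/COMPRESSED-DEDUPLICATED-VDO-LV-1 ext4 defaults 0 2
EOF
add_device_timeout "$sample" 600 '^/mnt/'
```

A bare number for `x-systemd.device-timeout=` is interpreted in seconds, so `600` here means ten minutes of waiting for the device before the mount is considered failed.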
Hello, I run Debian 11(.6) on bare metal with 2 LVM VDO volumes. Everything was working fine, but now my operating system fails to boot because it reaches a timeout waiting for one of the VDO volumes (not the biggest one, by the way). So it goes into emergency mode, I type the root password, then run `mount -a && exit`, and it mounts the volume like a charm, then starts without any problem. I haven't made any big change on this side, so I don't know what could be wrong or how to investigate.
Here are my boot logs from `journalctl`. I only cut the start and the end, nothing in the middle, and I separated the error to make it clearer:
And here is my `fstab` file: