LINBIT / linstor-server

High Performance Software-Defined Block Storage for container, cloud and virtualisation. Fully integrated with Docker, Kubernetes, Openstack, Proxmox etc.
https://docs.linbit.com/docs/linstor-guide/
GNU General Public License v3.0

DRBD not ready on Proxmox reboot #162

Open ggzengel opened 4 years ago

ggzengel commented 4 years ago

While booting Proxmox, the ZFS command takes too long and VMs with DRBD devices are not started.

Perhaps there should be a second service which waits for the devices to be populated, and Proxmox should depend on it.

I get this on 2 hosts in a cluster:

ERROR REPORT 5F26E34B-7E463-000000

============================================================

Application:                        LINBIT® LINSTOR
Module:                             Satellite
Version:                            1.7.3
Build ID:                           246c81885be6cd343667aff3c54e026f52ad0258
Build time:                         2020-07-22T13:22:31+00:00
Error time:                         2020-08-02 16:02:06
Node:                               px3.cc.private

============================================================

Reported error:
===============

Description:
    Failed to query 'zfs' info
Cause:
    External command timed out
Additional information:
    External command: zfs list -H -p -o name,used,volsize,type -t volume,snapshot

Category:                           LinStorException
Class name:                         StorageException
Class canonical name:               com.linbit.linstor.storage.StorageException
Generated at:                       Method 'genericExecutor', Source file 'Commands.java', Line #121

Error message:                      Failed to query 'zfs' info

Call backtrace:

    Method                                   Native Class:Line number
    genericExecutor                          N      com.linbit.linstor.storage.layer.provider.utils.Commands:121
    genericExecutor                          N      com.linbit.linstor.storage.layer.provider.utils.Commands:64
    genericExecutor                          N      com.linbit.linstor.storage.layer.provider.utils.Commands:52
    list                                     N      com.linbit.linstor.storage.utils.ZfsCommands:20
    getZfsList                               N      com.linbit.linstor.storage.utils.ZfsUtils:127
    getInfoListImpl                          N      com.linbit.linstor.storage.layer.provider.zfs.ZfsProvider:131
    updateVolumeAndSnapshotStates            N      com.linbit.linstor.storage.layer.provider.AbsStorageProvider:166
    prepare                                  N      com.linbit.linstor.storage.layer.provider.AbsStorageProvider:158
    prepare                                  N      com.linbit.linstor.storage.layer.provider.StorageLayer:161
    prepare                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:628
    prepareLayers                            N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:259
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:127
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
    run                                      N      java.lang.Thread:834

Caused by:
==========

Category:                           Exception
Class name:                         ChildProcessTimeoutException
Class canonical name:               com.linbit.ChildProcessTimeoutException
Generated at:                       Method 'waitFor', Source file 'ChildProcessHandler.java', Line #133

Call backtrace:

    Method                                   Native Class:Line number
    waitFor                                  N      com.linbit.extproc.ChildProcessHandler:133
    syncProcess                              N      com.linbit.extproc.ExtCmd:92
    exec                                     N      com.linbit.extproc.ExtCmd:56
    genericExecutor                          N      com.linbit.linstor.storage.layer.provider.utils.Commands:80
    genericExecutor                          N      com.linbit.linstor.storage.layer.provider.utils.Commands:64
    genericExecutor                          N      com.linbit.linstor.storage.layer.provider.utils.Commands:52
    list                                     N      com.linbit.linstor.storage.utils.ZfsCommands:20
    getZfsList                               N      com.linbit.linstor.storage.utils.ZfsUtils:127
    getInfoListImpl                          N      com.linbit.linstor.storage.layer.provider.zfs.ZfsProvider:131
    updateVolumeAndSnapshotStates            N      com.linbit.linstor.storage.layer.provider.AbsStorageProvider:166
    prepare                                  N      com.linbit.linstor.storage.layer.provider.AbsStorageProvider:158
    prepare                                  N      com.linbit.linstor.storage.layer.provider.StorageLayer:161
    prepare                                  N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:628
    prepareLayers                            N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:259
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceHandlerImpl:127
    dispatchResources                        N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:258
    phaseDispatchDeviceHandlers              N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:896
    devMgrLoop                               N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:618
    run                                      N      com.linbit.linstor.core.devmgr.DeviceManagerImpl:535
    run                                      N      java.lang.Thread:834

END OF ERROR REPORT.
rck commented 4 years ago

ZFS is completely optional in LINSTOR, so if you use it, I suggest that you add a dependency on the zfs services for LINSTOR. That would be the satellite service (i.e., systemctl edit --system --full linstor-satellite.service, depending on your actual preferences). This is something the admin has to define. We should probably document that better, patches to linbit-documentation are welcome.
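
A minimal sketch of what such an override could add (shown as a drop-in rather than --full; zfs-import.target and zfs.target are the standard OpenZFS units, which also come up later in this thread):

# systemctl edit --system linstor-satellite.service
[Unit]
After=zfs-import.target zfs.target
Wants=zfs.target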

ggzengel commented 4 years ago

Even when linstor-satellite.service gets ready and Proxmox starts, the devices are not populated. Is there a command which will return true when all devices are ready?

But the better approach would be for Proxmox to wait before starting VMs if your interface to Proxmox signals that the devices are not ready.

rck commented 4 years ago

Did you add the service dependency so that LINSTOR depends on the zfs services? If not, then no surprise at all. Obviously it gets "ready" because it is started in parallel to the zfs units, but then queries zfs information before zfs is actually finished and ready. This has nothing to do with Proxmox or signaling whatsoever. You need to define the correct startup order between the zfs services and LINSTOR.

ggzengel commented 4 years ago

First I used After=zfs-import.target. Then After=zfs.target. But nothing worked.

The problem is that I have more than 1800 snapshots and zfs list needs more than 2 min at boot time. Wouldn't it be possible to wait some time in LINSTORPlugin.pm for the devices to get ready, or to return something like 503 Service Unavailable so that Proxmox retries a few times?

I created a service which warms the ZFS cache and delays linstor-satellite.service and pve-guests.service. But PVE starts the VMs before linstor-satellite has the devices ready, which takes roughly 30 more seconds.

blockdev: cannot open /dev/drbd/by-res/vm-152-disk-1/0: No such file or directory

Something like zfs-volume-wait.service is needed here.

The final workaround was to start a non-DRBD VM with delay=180 while the devices get ready. But this is not smart.

# systemctl cat zfs-warm-cache.service 
# /etc/systemd/system/zfs-warm-cache.service
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null

[Install]
WantedBy=zfs.target

Here is an SVG file from systemd-analyze plot: zfs systemd-plot.svg.zip
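
For the zfs-volume-wait.service idea: OpenZFS already ships zfs-volume-wait.service, which (as far as I can tell) runs zvol_wait until the /dev/zvol links show up. So a drop-in along these lines (just a sketch) might cover the "devices populated" part on the ZFS side:

# systemctl edit --system linstor-satellite.service
[Unit]
After=zfs-volume-wait.service
Wants=zfs-volume-wait.service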

ggzengel commented 4 years ago

@rck Have you seen my last comment?

rck commented 4 years ago

I think these are 2 problems:

ggzengel commented 4 years ago

So it would come up, but too late.

Yes, it comes up for sure.

Perhaps some help is needed here from Proxmox (@Fabian-Gruenbichler) to extend their API?

ggzengel commented 4 years ago

For Proxmox I prefer the API solution.

But in general, what about a little health-check tool?

  1. It can check/wait for controller connections
  2. It can wait for the devices to be populated
  3. It can wait for resync to finish

If you read the defaults (timeouts, pools, resources) from the environment, it would be possible to create a service with a defaults file at /etc/default/linstor-wait.

[Service]
EnvironmentFile=/etc/default/linstor-wait
ExecStart=linstor-wait

# cat /etc/default/linstor-wait
linstor_controller_timeout=60s
linstor_device_timeout=5m
linstor_resync_timeout=10m
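
linstor-wait doesn't exist; just to illustrate the idea, a rough bash sketch could look like the following (the drbdadm sh-resources / drbdadm status checks and the timeout handling are only one possible way to do it):

#!/bin/bash
# Sketch of a hypothetical 'linstor-wait' helper -- not an official LINBIT tool.
# Timeout values (60s, 5m, ...) come from /etc/default/linstor-wait and are
# passed straight to coreutils' timeout(1), which understands these suffixes.
set -u

: "${linstor_controller_timeout:=60s}"
: "${linstor_device_timeout:=5m}"
: "${linstor_resync_timeout:=10m}"

controller_ready() {
    # satellite can reach the controller (same idea as 'linstor r l')
    linstor node list > /dev/null 2>&1
}

devices_ready() {
    # every locally configured DRBD resource has its udev symlink directory
    local res
    for res in $(drbdadm sh-resources); do
        [ -e "/dev/drbd/by-res/$res" ] || return 1
    done
}

resync_done() {
    # no resource is currently acting as SyncSource/SyncTarget
    ! drbdadm status | grep -qE 'SyncSource|SyncTarget'
}

export -f controller_ready devices_ready resync_done

wait_for() {  # wait_for <timeout> <check function> <description>
    echo "Waiting up to $1 for $3"
    timeout "$1" bash -c "until $2; do sleep 2; done"
}

wait_for "$linstor_controller_timeout" controller_ready "controller connection" &&
wait_for "$linstor_device_timeout"     devices_ready    "DRBD device nodes"     &&
wait_for "$linstor_resync_timeout"     resync_done      "resync to finish"
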
ghernadi commented 4 years ago

I'd assume linstor tries again and again anyways, right @ghernadi ?

I was not sure about this, as LINSTOR's fullSync has some special cases, so I did a quick test, with the unfortunate result that it might depend on how zfs fails.
When I tested with

the controller did not retry. That was due to a bug: this failure was not forwarded to the controller as a failure but as a simple message. This bug is fixed now and will be included in the next release (so thanks for making me recheck it :) ).

rck commented 4 years ago

Hi @ggzengel ,

today I looked into it a bit closer, and we came up with the following: we don't want to fix this in a Proxmox-specific (or plugin-specific) way; boot-up should be handled by the service file. The new semantics will be that linstor-satellite.service only flags "ready" once all block devices on that node are usable. Given that, the rest can then be a simple/usual systemd dependency.

This will take some time, reopening this issue.

In a second step, information about the readiness of given resources on given nodes can also be exposed via the REST API. This then basically also replaces your standalone tool. The API can then be used in all plugins, e.g. to check whether a freshly created resource is actually ready to use (it is a bit more complicated than just stat-ing the device node).

Fabian-Gruenbichler commented 3 years ago

Chiming in a bit late, since this got lost in my generic GitHub notifications queue. PVE uses pve-storage.target as a boot-up synchronization point, so if you hook the LINSTOR services into that and they only complete their startup once everything is accessible, this should be enough to order on-boot guest and PVE API startup properly.

theoratkin commented 2 years ago

Sorry for bumping this old thread, but I'm just wondering if there's any progress on this? Or are there any possible workarounds? This is quite a serious issue, since it basically means that not a single VM using DRBD will be started after a reboot.

For now, I wrote a script that checks if the Linstor controller is available and a service file that calls this script. Then I specified the service in pve-storage.target, like @Fabian-Gruenbichler suggested. But it's not a great solution: what if some resources become available earlier than others? Here are the script and service file anyway, in case they help anybody:

#!/bin/bash

tries=100
interval=3

is_ready() {
    linstor r l
}

for (( i=0; i < $tries; ++i )); do
    echo "Trying"
    is_ready && exit 0
    sleep $interval
done

exit 1

# linstor-is-ready.service
[Unit]
Description=Periodically check if Linstor is ready

[Service]
Type=oneshot
ExecStart=/bin/bash /usr/bin/linstor-is-ready.sh

[Install]
WantedBy=multi-user.target

And systemctl edit pve-storage.target:

[Unit]
After=linstor-satellite.service
After=linstor-is-ready.service

Any suggestions on how this could be done better are very welcome!
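
One direction I've been considering (only a sketch; /dev/drbd/by-res/<resource> is the layout from the blockdev error earlier in this thread, and drbdadm sh-resources lists the locally configured resources) is an is_ready that also waits for the DRBD device nodes, not just the controller:

is_ready() {
    # controller reachable ...
    linstor r l > /dev/null || return 1
    # ... and every locally configured DRBD resource has its device directory
    local res
    for res in $(drbdadm sh-resources 2>/dev/null); do
        [ -e "/dev/drbd/by-res/$res" ] || return 1
    done
}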

ggzengel commented 2 years ago

Meanwhile I use:

# systemctl cat zfs-warm-cache.service 
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null

[Install]
WantedBy=zfs.target

If you want to set it up as a script:

SYSTEMD_EDITOR=tee systemctl edit --full --force zfs-warm-cache.service <<EOF
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null

[Install]
WantedBy=zfs.target
EOF

systemctl enable zfs-warm-cache.service