ggzengel opened this issue 4 years ago
ZFS is completely optional in LINSTOR, so if you use it, I suggest adding a dependency on the ZFS services for LINSTOR. That would be the satellite service (i.e., `systemctl edit --system --full linstor-satellite.service`, depending on your actual preferences). This is something the admin has to define. We should probably document that better; patches to linbit-documentation are welcome.
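For illustration, the ordering described here can be added with a systemd drop-in rather than a full unit copy. This is a minimal sketch assuming the target names shipped by the zfs-linux packaging (`zfs-import.target`, `zfs.target`); verify them on your system:

```ini
# /etc/systemd/system/linstor-satellite.service.d/zfs-deps.conf
# (created via: systemctl edit linstor-satellite.service)
[Unit]
# Do not start the satellite before ZFS pools are imported
# and mounts/volumes are available.
After=zfs-import.target zfs.target
```

Run `systemctl daemon-reload` afterwards for the drop-in to take effect.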
Even when linstor-satellite.service reports ready and Proxmox starts, the devices are not populated. Is there a command that returns true once all devices are ready?
But the better approach would be for Proxmox to wait with starting VMs if your interface to Proxmox signals that the devices are not ready.
Did you add the service dependency so that LINSTOR depends on the ZFS services? If not, then no surprise at all: obviously it gets "ready" because it is started in parallel to the ZFS services, but then queries ZFS information before ZFS is actually finished and ready. This has nothing to do with Proxmox and signaling whatsoever. You need to define the correct startup order between the ZFS services and LINSTOR.
First I used `After=zfs-import.target`, then `After=zfs.target`, but neither worked.
The problem is that I have more than 1800 snapshots, and `zfs list` needs more than 2 minutes at boot time.
Isn't it possible to wait some time in LINSTORPlugin.pm for the devices to get ready, or to return something like 503 Service Unavailable so that Proxmox retries a few times?
I created a service which warms the ZFS cache and delays linstor-satellite.service and pve-guests.service. But PVE starts the VMs before linstor-satellite has the devices ready, which takes roughly 30 more seconds:
blockdev: cannot open /dev/drbd/by-res/vm-152-disk-1/0: No such file or directory
Something like zfs-volume-wait.service is needed here.
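To illustrate what such a wait helper could do, here is a minimal sketch that polls for the DRBD device nodes under /dev/drbd/by-res/ (the path from the error above). The function name, the base-directory parameter, and the timeouts are assumptions, not an existing tool:

```shell
#!/bin/bash
# Hypothetical wait helper: block until the device node for every listed
# resource exists, or fail after a timeout. The layout <base>/<resource>/0
# (volume 0) matches the error message above; resource names and timeouts
# are placeholders the admin would adapt.
wait_for_devices() {
    local base="$1" timeout="$2"
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    local res
    for res in "$@"; do
        # -e keeps this sketch testable; on a real system -b (block
        # device) would be the stricter check.
        while [ ! -e "${base}/${res}/0" ]; do
            if [ "$(date +%s)" -ge "$deadline" ]; then
                echo "timeout waiting for ${res}" >&2
                return 1
            fi
            sleep 1
        done
    done
    return 0
}
```

A oneshot unit ordered Before=pve-guests.service could then call, e.g., `wait_for_devices /dev/drbd/by-res 180 vm-152-disk-1`.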
The final workaround was to start a non-DRBD VM with `delay=180` while the devices get ready. But this is not smart.
# systemctl cat zfs-warm-cache.service
# /etc/systemd/system/zfs-warm-cache.service
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null
[Install]
WantedBy=zfs.target
Here is an SVG file from `systemd-analyze plot`: zfs systemd-plot.svg.zip
@rck Have you seen my last comment?
I think these are 2 problems:
`zfs` (`list` or whatever) not ready when linstor starts: this has to be solved via inter-service dependencies. I'd assume linstor tries again and again anyway, right @ghernadi? So it would come up, but too late.
Yes, it comes up for sure.
Perhaps some help is needed here from Proxmox (@Fabian-Gruenbichler) to extend their API?
For Proxmox I prefer the API solution.
But in general, what about a little health-check tool?
If it reads the defaults (timeouts, pools, resources) from the environment, it's possible to create a service with a defaults file at /etc/default/linstor-wait:
[Service]
EnvironmentFile=/etc/default/linstor-wait
ExecStart=linstor-wait
# cat /etc/default/linstor-wait
linstor_controller_timeout=60s
linstor_device_timeout=5m
linstor_resync_timeout=10m
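A hypothetical `linstor-wait` consuming those values might be little more than a generic retry loop. This sketch shows the shape; the helper name and the linstor subcommands in the usage line below are assumptions, not an existing LINSTOR tool:

```shell
#!/bin/bash
# Hypothetical linstor-wait helper: retry a command until it succeeds or
# a deadline passes. Meant to be driven by the proposed
# /etc/default/linstor-wait file loaded via EnvironmentFile=.
wait_until() {    # wait_until <timeout-seconds> <command> [args...]
    local deadline=$(( $(date +%s) + $1 ))
    shift
    until "$@"; do
        [ "$(date +%s)" -ge "$deadline" ] && return 1
        sleep 2
    done
    return 0
}
```

It could then be called as, e.g., `wait_until "${linstor_controller_timeout%s}" linstor node list` (stripping the trailing `s`, since shell arithmetic needs plain numbers; values like `5m` would need converting to seconds first).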
I'd assume linstor tries again and again anyways, right @ghernadi ?
I was not sure about this, as LINSTOR's fullSync has some special cases, so I did a quick test, with the unfortunate result that it might depend on how ZFS fails.
When I tested with `mv /sbin/zfs /sbin/zfs_broken`, the controller did not retry. That was due to a bug where this failure was not forwarded to the controller as a failure but as a simple message. This bug is fixed now and will be included in the next release (so thanks for making me recheck it :) ).
Hi @ggzengel,
today I looked into it a bit closer, and we came up with the following: we don't want to fix this in a Proxmox- (or plugin-)specific way; boot-up should be handled by the service file. The new semantic will be that linstor-satellite.service will only flag "ready" if all block devices on that node are usable. Given that, the rest can then be a simple/usual systemd dependency.
This will take some time, reopening this issue.
In a second step, information about the readiness of given resources on given nodes can also be exposed via the REST API. This then basically replaces your standalone tool as well. The API can then be used in all plugins, e.g., to check if a freshly created resource is actually ready to use (it is a bit more complicated than just stat'ing the device node).
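Once such an endpoint exists, a plugin-side probe could look roughly like the sketch below. The controller port (3370), the `/v1/view/resources` path, and the `disk_state` field are assumptions about the LINSTOR REST API; check them against your controller version:

```shell
#!/bin/bash
# Hypothetical readiness probe: succeed only if no volume in the JSON
# reply reports a disk_state other than UpToDate. Parsing is deliberately
# crude (grep on compact JSON, no spaces after colons) to stay
# dependency-free; jq would be cleaner.
check_ready() {
    local json="$1"
    if printf '%s' "$json" | grep -o '"disk_state":"[^"]*"' \
            | grep -qv '"disk_state":"UpToDate"'; then
        return 1    # at least one volume is not UpToDate
    fi
    return 0
}
```

It would be called inside a retry loop, e.g. `check_ready "$(curl -s http://localhost:3370/v1/view/resources)"`.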
Chiming in a bit late, since this got lost in my generic GitHub notifications queue. PVE uses pve-storage.target as a boot-up synchronization point, so if you hook the LINSTOR services into that and they only complete their startup once everything is accessible, this should be enough to order on-boot guest and PVE API startup properly.
Sorry for bumping this old thread, but I'm just wondering if there has been any progress on this? Or are there any possible workarounds? This is quite a serious issue, since it basically means that not a single VM using DRBD will be started after a reboot.
For now, I wrote a script that checks if the LINSTOR controller is available, and a service file that calls this script. Then I specified the service in pve-storage.target, like @Fabian-Gruenbichler suggested. But it's not a great solution: what if some resources become available earlier than others? Here are the script and service file anyway, in case they help anybody:
#!/bin/bash
tries=100
interval=3

# `linstor r l` (resource list) succeeds once the controller answers.
is_ready() {
    linstor r l
}

for (( i = 0; i < tries; ++i )); do
    echo "Trying"
    is_ready && exit 0
    sleep "$interval"
done
exit 1
[Unit]
Description=Periodically check if Linstor is ready
[Service]
Type=oneshot
ExecStart=/bin/bash /usr/bin/linstor-is-ready.sh
[Install]
WantedBy=multi-user.target
And `systemctl edit pve-storage.target`:
[Unit]
After=linstor-satellite.service
After=linstor-is-ready.service
Any suggestions on how this could be done better are very welcome!
Meanwhile I use:
# systemctl cat zfs-warm-cache.service
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null
[Install]
WantedBy=zfs.target
If you want to set it up as a script:
SYSTEMD_EDITOR=tee systemctl edit --full --force zfs-warm-cache.service <<EOF
[Unit]
Description=ZFS warm cache
DefaultDependencies=no
After=zfs-import.target
Before=linstor-satellite.service
Before=pve-guests.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs list -t all
StandardOutput=null
[Install]
WantedBy=zfs.target
EOF
systemctl enable zfs-warm-cache.service
While booting Proxmox, the ZFS command takes too long and VMs with DRBD devices are not started.
Perhaps there should be a second service which waits for the devices to be populated, and Proxmox should depend on it.
I get this on 2 hosts in a cluster: