LINBIT / linstor-proxmox

Integration plugin bridging LINSTOR to Proxmox VE

Unable to perform multiple storage actions at the same time #55

Closed SnelsSM closed 1 year ago

SnelsSM commented 1 year ago

Hi! When I try to perform more than one action concurrently (moving a disk, cloning a VM...), only one action succeeds. The other actions fail with "error: got lock request timeout".

For example, clone a template twice at the same time. [Screenshots: first VM, second VM]

If the second virtual machine is cloned after the "Attempting to create disk resource" state of the first virtual machine completes, the second virtual machine will be cloned successfully.

How can I fix it?

PVE Components

proxmox-ve: 7.4-1 (running kernel: 6.2.6-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-6.2: 7.3-8
pve-kernel-5.15: 7.3-3
pve-kernel-6.2.6-1-pve: 6.2.6-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Linstor Components

drbd-dkms 9.2.4-1
drbd-reactor 1.2.0-1
linstor-common 1.23.0-1
linstor-controller 1.23.0-1
linstor-satellite 1.23.0-1
linstor-proxmox 7.0.1-1

rck commented 1 year ago

Plugins don't do any locking; that is handled outside of the plugins by the "PVE core", and the error message shown (trying to acquire cfs lock) comes from outside the plugin. As the first action had not finished yet, the second timed out (while waiting for some lock) before it even started. The key is: it all happens outside of the plugin, and I am not aware of any way we, as a plugin, could tell "the core" that we can handle concurrency and/or want to do our own locking. I'd assume that concurrency is not really part of how they do things; they serialize actions (per plugin type? per storage.cfg definition?). And if one thing takes "too long" (for whatever definition of "too long" they have), then that is what you get.

I really don't see any chance to improve things on our side, so closing this, sorry.
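To illustrate the behavior described above, here is a minimal Python simulation of two actions competing for one lock that has an acquisition timeout. It is a sketch only and not PVE's actual locking code: the names `run_action` and `simulate`, the use of `threading.Lock`, and the timing values are all assumptions made for illustration. The error string is the one quoted in the report.

```python
import threading
import time


def run_action(name, lock, lock_timeout, work_seconds, results):
    """Try to take the shared lock; give up if it is not free within lock_timeout."""
    if not lock.acquire(timeout=lock_timeout):
        # The action never starts: it failed while waiting for the lock.
        results[name] = "error: got lock request timeout"
        return
    try:
        time.sleep(work_seconds)  # stands in for the long-running storage operation
        results[name] = "ok"
    finally:
        lock.release()


def simulate(lock_timeout, work_seconds):
    """Start two actions almost simultaneously and return their outcomes."""
    lock = threading.Lock()  # hypothetical stand-in for the lock taken outside the plugin
    results = {}
    first = threading.Thread(
        target=run_action, args=("clone-1", lock, lock_timeout, work_seconds, results)
    )
    second = threading.Thread(
        target=run_action, args=("clone-2", lock, lock_timeout, work_seconds, results)
    )
    first.start()
    time.sleep(0.05)  # let clone-1 take the lock before clone-2 starts
    second.start()
    first.join()
    second.join()
    return results
```

When the work takes longer than the lock timeout, for example `simulate(lock_timeout=0.2, work_seconds=0.6)`, the second action fails without ever starting. When the work finishes within the timeout, both actions succeed one after the other.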

SnelsSM commented 1 year ago

Yes, I understand that the locks come from PVE. But this doesn't happen with any other storage type (lvm/lvm-thin, ceph rbd, nfs, cifs... I can't check others).

This behavior makes Linstor unsuitable for PVE clusters with a lot of concurrent activity (batch VM creation for example).
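For batch jobs, one possible client-side mitigation is to retry an action when it fails with the lock timeout, since a later attempt can succeed once the earlier action has released the lock. The Python helper below is a hedged sketch, not something provided by PVE or the plugin: `retry_on_lock_timeout` is a hypothetical name, and it assumes the caller supplies an `action` callable that raises `RuntimeError` containing the reported message. It does not remove the underlying serialization.

```python
import time

# Error text as reported in this thread.
LOCK_TIMEOUT_MARKER = "got lock request timeout"


def retry_on_lock_timeout(action, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call action(); retry with linear backoff if it fails with the lock timeout.

    `action` is a caller-supplied callable (hypothetical) that performs one
    storage operation and raises RuntimeError on failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except RuntimeError as exc:
            # Re-raise unrelated errors, and give up after the last attempt.
            if LOCK_TIMEOUT_MARKER not in str(exc) or attempt == attempts:
                raise
            sleep(delay_seconds * attempt)  # wait longer after each failure
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.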