Thanks for the very nice bug report. Any chance you have a RAID controller with caching in your setup, or any other type of volatile cache in your backend storage?
This issue was produced with a single backing disk on each node, attached to a SATA controller in AHCI mode - no RAID. I reproduced the issue with both enterprise and consumer SSD backing disks (Intel S4620 and Crucial MX500). The issue also occurs if I run `sync ; sync` on each node before power-cycling, which ensures any pages pending a write to disk are written out before the power-cycle.
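For reference, one way to check whether the backing disks themselves have a volatile write cache enabled (device names here are examples; `hdparm` applies to SATA/AHCI disks like these):

```
# Kernel's view of the cache policy; "write back" means a volatile cache is in play:
cat /sys/block/sdb/queue/write_cache
# Query the drive's own write-cache setting (add 0 or 1 to disable/enable it):
hdparm -W /dev/sdb
```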
I know the log of stuff I tried is very long, so let me call out this part here:
~The issue does not occur if the volume is resized outside of Linstor - `lvextend` or `zfs set volsize`, then `drbdadm resize`, with or without `--assume-clean`.~
~It appears there is something different being done when Linstor resizes the DRBD resource vs. the DRBD documented resize process. I think that is the closest I got to the root of the issue.~
EDIT: The above appears incorrect - I tried again and was not able to manually perform a resize that didn't exhibit this issue after a power-cycle of a host where the volume was in Secondary state.
(Also, I see now that I had `check-resize` in the drbdadm command in my original post. This is incorrect; I was using the `resize` command. I verified that in my bash_history and have edited the original post to be correct.)
My first guess was also some caching issue... I tried but failed to reproduce this, although I am only testing with DRBD and LINSTOR running in VMs.
> It appears there is something different being done when Linstor resizes the DRBD resource vs. the DRBD documented resize process. I think that is the closest I got to the root of the issue.
You might want to enable trace logging on the satellites, using `linstor c set-log-level TRACE --global`. With this, LINSTOR will log every external command it executes, including LVM-, ZFS- and DRBD-related commands. Right now it is quite difficult for us to investigate, since we can only guess without being able to reproduce or test anything.
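As an illustration, once TRACE is enabled, the executed storage commands could be pulled out of the satellite log like this (the `linstor-satellite` unit name and plain journalctl filtering are assumptions about the setup, not LINSTOR-specific tooling):

```
# Show external commands around the resize; adjust the time window as needed.
journalctl -u linstor-satellite --since "10:25" --until "10:30" \
    | grep -Ei 'lvresize|lvextend|drbdadm|drbdsetup'
```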
Hi, is there a way to do a decreasing resize (shrink) of my ZFS storage with Linstor? Cheers!
@ghernadi - I have repeated the process with trace logging enabled.
Notably, when I attempted to re-create the resize sequence that did not fail earlier, both tests resulted in the problem on the Secondary. I said in my initial report that manually using `lvextend` and `drbdadm resize` does not cause the problem, but that seems to be incorrect.
For the normal process - resizing with Linstor - here are all the syslog entries from around the time of the resize: linstor-resize-bug-trace-log-pve1-20241017.txt linstor-resize-bug-trace-log-pve2-20241017.txt linstor-resize-bug-trace-log-pve3-20241017.txt
And here are the same logs with all 3 nodes folded together and sorted by time, in case the order of events between nodes is important: linstor-resize-bug-trace-log-allnodes-20241017.txt
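(Side note on producing the merged file: since each per-node log is already in chronological order, a plain merge sort works, assuming the three nodes share a timestamp format and reasonably synchronized clocks; file names below are shortened placeholders.)

```
# Interleave pre-sorted per-node syslogs by their leading timestamps.
sort -m -s trace-log-pve1.txt trace-log-pve2.txt trace-log-pve3.txt \
    > trace-log-allnodes.txt
```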
Here are my notes from the various scenarios I tried today:
Scenario 1: resize via `linstor vd set-size`, then hard reboot of Secondary pve1:

```
root@pve3:~# date ; linstor vd set-size pm-1f408657 0 22G ; date
Thu Oct 17 10:27:23 AM CDT 2024
SUCCESS:
Description:
Volume definition with number '0' of resource definition 'pm-1f408657' modified.
Details:
Volume definition with number '0' of resource definition 'pm-1f408657' UUID is: 57322b05-295d-44a8-b224-87e4e951a119
SUCCESS:
Description:
(pve2) Resource 'pm-1f408657' [DRBD] not adjusted
Cause:
This happened most likely because the layer below did not provide a device to work with.
SUCCESS:
Updated volume 0 of 'pm-1f408657' on 'pve2'
SUCCESS:
(pve1) Volume number 0 of resource 'pm-1f408657' [LVM-Thin] resized
SUCCESS:
Description:
(pve1) Resource 'pm-1f408657' [DRBD] not adjusted
Cause:
This happened most likely because the layer below did not provide a device to work with.
SUCCESS:
Updated volume 0 of 'pm-1f408657' on 'pve1'
SUCCESS:
(pve3) Volume number 0 of resource 'pm-1f408657' [LVM-Thin] resized
SUCCESS:
Description:
(pve3) Resource 'pm-1f408657' [DRBD] not adjusted
Cause:
This happened most likely because the layer below did not provide a device to work with.
SUCCESS:
Updated volume 0 of 'pm-1f408657' on 'pve3'
SUCCESS:
Description:
(pve2) Resource 'pm-1f408657' [DRBD] not adjusted
Cause:
This happened most likely because the layer below did not provide a device to work with.
SUCCESS:
Resized resource 'pm-1f408657' on 'pve2'
SUCCESS:
Description:
(pve1) Resource 'pm-1f408657' [DRBD] not adjusted
Cause:
This happened most likely because the layer below did not provide a device to work with.
SUCCESS:
Resized resource 'pm-1f408657' on 'pve1'
SUCCESS:
Description:
(pve3) Resource 'pm-1f408657' [DRBD] not adjusted
Cause:
This happened most likely because the layer below did not provide a device to work with.
SUCCESS:
Resized resource 'pm-1f408657' on 'pve3'
SUCCESS:
Description:
(pve2) Resource 'pm-1f408657' [DRBD] not adjusted
Cause:
This happened most likely because the layer below did not provide a device to work with.
SUCCESS:
Description:
(pve3) Resource 'pm-1f408657' [DRBD] not adjusted
Cause:
This happened most likely because the layer below did not provide a device to work with.
SUCCESS:
(pve1) Resource 'pm-1f408657' [DRBD] adjusted.
SUCCESS:
Resized DRBD resource 'pm-1f408657' on 'pve1'
SUCCESS:
(pve2) Resource 'pm-1f408657' [DRBD] adjusted.
SUCCESS:
Updated volume 0 of 'pm-1f408657' on 'pve2'
SUCCESS:
(pve1) Resource 'pm-1f408657' [DRBD] adjusted.
SUCCESS:
Updated volume 0 of 'pm-1f408657' on 'pve1'
SUCCESS:
(pve3) Resource 'pm-1f408657' [DRBD] adjusted.
SUCCESS:
Updated volume 0 of 'pm-1f408657' on 'pve3'
Thu Oct 17 10:27:24 AM CDT 2024
root@pve1:~# drbdadm status pm-1f408657
pm-1f408657 role:Secondary
disk:UpToDate
pve2 role:Secondary
peer-disk:Diskless
pve3 role:Primary
peer-disk:UpToDate
root@pve1:~# sync ; sync
root@pve1:~# echo b > /proc/sysrq-trigger
--- HARD REBOOT pve1 ---
root@pve1:~# drbdadm status pm-1f408657
pm-1f408657 role:Secondary
disk:Diskless quorum:no
pve2 connection:StandAlone
pve3 connection:StandAlone
root@pve1:~# drbdadm create-md pm-1f408657
You want me to create a v09 style flexible-size internal meta data block.
There appears to be a v09 flexible-size internal meta data block
already in place on /dev/cssd1/pm-1f408657_00000 at byte offset 23630704640
Do you really want to overwrite the existing meta-data?
[need to type 'yes' to confirm] yes
md_offset 23630704640
al_offset 23630671872
bm_offset 23629946880
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (708 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
root@pve1:~# drbdadm adjust pm-1f408657
root@pve1:~# drbdadm status pm-1f408657
pm-1f408657 role:Secondary
disk:Inconsistent
pve2 role:Secondary
peer-disk:Diskless resync-suspended:dependency
pve3 role:Primary
replication:SyncTarget peer-disk:UpToDate done:0.04
root@pve1:~# drbdadm status pm-1f408657
pm-1f408657 role:Secondary
disk:Inconsistent
pve2 role:Secondary
peer-disk:Diskless resync-suspended:dependency
pve3 role:Primary
replication:SyncTarget peer-disk:UpToDate done:0.57
^^^^ PROBLEM REPRODUCED ^^^^
```
Scenario 2: manual `lvresize` on both diskful nodes, then `drbdadm -- --assume-clean resize` run from pve1, then hard reboot of pve1:

```
root@pve3:~# lvresize --config 'devices { filter=['"'"'a|/dev/sdb|'"'"','"'"'r|.*|'"'"'] }' --size 23076864k cssd1/pm-0726223e_00000 -f
Size of logical volume cssd1/pm-0726223e_00000 changed from <20.01 GiB (5122 extents) to <22.01 GiB (5634 extents).
Logical volume cssd1/pm-0726223e_00000 successfully resized.
root@pve1:~# lvresize --config 'devices { filter=['"'"'a|/dev/sdc|'"'"','"'"'r|.*|'"'"'] }' --size 23076864k cssd1/pm-0726223e_00000 -f
Size of logical volume cssd1/pm-0726223e_00000 changed from <20.01 GiB (5122 extents) to <22.01 GiB (5634 extents).
Logical volume cssd1/pm-0726223e_00000 successfully resized.
root@pve1:~# drbdsetup wait-connect-resource --wait-after-sb=yes --wfc-timeout=10 pm-0726223e
degr-wfc-timeout has to be shorter than wfc-timeout
degr-wfc-timeout implicitly set to wfc-timeout (10s)
outdated-wfc-timeout has to be shorter than degr-wfc-timeout
outdated-wfc-timeout implicitly set to degr-wfc-timeout (10s)
root@pve1:~# drbdadm -vvv -- --assume-clean resize pm-0726223e/0
drbdsetup resize 1018 --assume-clean
drbdsetup check-resize 1018
root@pve1:~# drbdadm -vvv adjust pm-0726223e
root@pve2:~# drbdadm -vvv adjust pm-0726223e
root@pve3:~# drbdadm -vvv adjust pm-0726223e
root@pve1:~# drbdadm -vvv adjust pm-0726223e
root@pve1:~# sync ; sync
root@pve1:~# echo b > /proc/sysrq-trigger
---- HARD REBOOT OF pve1 ---
root@pve1:~# drbdadm status pm-0726223e
pm-0726223e role:Secondary
disk:Diskless quorum:no
pve2 connection:StandAlone
pve3 connection:StandAlone
root@pve1:~# drbdadm create-md pm-0726223e
You want me to create a v09 style flexible-size internal meta data block.
There appears to be a v09 flexible-size internal meta data block
already in place on /dev/cssd1/pm-0726223e_00000 at byte offset 23630704640
Do you really want to overwrite the existing meta-data?
[need to type 'yes' to confirm] yes
md_offset 23630704640
al_offset 23630671872
bm_offset 23629946880
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (708 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
root@pve1:~# drbdadm adjust pm-0726223e
root@pve1:~# drbdadm status pm-0726223e
pm-0726223e role:Secondary
disk:Inconsistent
pve2 role:Secondary
peer-disk:Diskless resync-suspended:dependency
pve3 role:Primary
replication:SyncTarget peer-disk:UpToDate done:2.85
^^^^ PROBLEM REPRODUCED ^^^^
```
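(The `'"'"'` runs in the logged `lvresize` invocations above are only shell quoting from the trace log; unescaped, the command is equivalent to the following sketch - the device in the filter differs per node, e.g. /dev/sdc on pve1:)

```
# Readable form of the logged lvresize call (as run on pve3):
lvresize -f --size 23076864k \
    --config 'devices { filter=["a|/dev/sdb|","r|.*|"] }' \
    cssd1/pm-0726223e_00000
```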
Scenario 3: the same manual `--assume-clean` resize, but run from pve3, then hard reboot of pve1:

```
root@pve3:~# lvresize --config 'devices { filter=['"'"'a|/dev/sdb|'"'"','"'"'r|.*|'"'"'] }' --size 23076864k cssd1/pm-68f10490_00000 -f
Size of logical volume cssd1/pm-68f10490_00000 changed from <20.01 GiB (5122 extents) to <22.01 GiB (5634 extents).
Logical volume cssd1/pm-68f10490_00000 successfully resized.
root@pve1:~# lvresize --config 'devices { filter=['"'"'a|/dev/sdc|'"'"','"'"'r|.*|'"'"'] }' --size 23076864k cssd1/pm-68f10490_00000 -f
Size of logical volume cssd1/pm-68f10490_00000 changed from <20.01 GiB (5122 extents) to <22.01 GiB (5634 extents).
Logical volume cssd1/pm-68f10490_00000 successfully resized.
root@pve3:~# drbdsetup wait-connect-resource --wait-after-sb=yes --wfc-timeout=10 pm-68f10490
degr-wfc-timeout has to be shorter than wfc-timeout
degr-wfc-timeout implicitly set to wfc-timeout (10s)
outdated-wfc-timeout has to be shorter than degr-wfc-timeout
outdated-wfc-timeout implicitly set to degr-wfc-timeout (10s)
root@pve3:~# drbdadm -vvv -- --assume-clean resize pm-68f10490/0
drbdsetup resize 1018 --assume-clean
drbdsetup check-resize 1018
root@pve3:~# drbdadm -vvv adjust pm-68f10490/0
root@pve2:~# drbdadm -vvv adjust pm-68f10490/0
root@pve1:~# drbdadm -vvv adjust pm-68f10490/0
root@pve3:~# drbdadm -vvv adjust pm-68f10490/0
root@pve1:~# sync ; sync
root@pve1:~# echo b > /proc/sysrq-trigger
root@pve1:~# uptime
13:50:14 up 2 min, 2 users, load average: 0.80, 0.41, 0.16
root@pve1:~# drbdadm status pm-68f10490
pm-68f10490 role:Secondary
disk:Diskless quorum:no
pve2 connection:StandAlone
pve3 connection:StandAlone
^^^^ PROBLEM REPRODUCED ^^^^
```
Scenario 4: manual `lvresize` on both diskful nodes, then a plain `drbdadm resize` on Primary pve3 (with resync), then hard reboot of pve1:

```
root@pve3:~# drbdadm status pm-b6bf9394
pm-b6bf9394 role:Primary
disk:UpToDate
pve1 role:Secondary
peer-disk:UpToDate
pve2 role:Secondary
peer-disk:Diskless
root@pve3:~# linstor vd list -r pm-b6bf9394
╭───────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ VolumeNr ┊ VolumeMinor ┊ Size ┊ Gross ┊ State ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ pm-b6bf9394 ┊ 0 ┊ 1018 ┊ 20.00 GiB ┊ ┊ ok ┊
╰───────────────────────────────────────────────────────────────────╯
root@pve3:~# linstor v list -r pm-b6bf9394
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pve1 ┊ pm-b6bf9394 ┊ cssd1 ┊ 0 ┊ 1018 ┊ /dev/drbd1018 ┊ 20.01 GiB ┊ Unused ┊ UpToDate ┊
┊ pve2 ┊ pm-b6bf9394 ┊ DfltDisklessStorPool ┊ 0 ┊ 1018 ┊ /dev/drbd1018 ┊ ┊ Unused ┊ TieBreaker ┊
┊ pve3 ┊ pm-b6bf9394 ┊ cssd1 ┊ 0 ┊ 1018 ┊ /dev/drbd1018 ┊ 20.01 GiB ┊ InUse ┊ UpToDate ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@pve3:~# lvresize --config 'devices { filter=['"'"'a|/dev/sdb|'"'"','"'"'r|.*|'"'"'] }' --size 23076864k cssd1/pm-b6bf9394_00000 -f
Size of logical volume cssd1/pm-b6bf9394_00000 changed from <20.01 GiB (5122 extents) to <22.01 GiB (5634 extents).
Logical volume cssd1/pm-b6bf9394_00000 successfully resized.
root@pve1:~# lvresize --config 'devices { filter=['"'"'a|/dev/sdc|'"'"','"'"'r|.*|'"'"'] }' --size 23076864k cssd1/pm-b6bf9394_00000 -f
Size of logical volume cssd1/pm-b6bf9394_00000 changed from <20.01 GiB (5122 extents) to <22.01 GiB (5634 extents).
Logical volume cssd1/pm-b6bf9394_00000 successfully resized.
root@pve3:~# drbdadm resize pm-b6bf9394
root@pve3:~# drbdadm status pm-b6bf9394
pm-b6bf9394 role:Primary
disk:UpToDate
pve1 role:Secondary
replication:SyncSource peer-disk:Inconsistent done:12.33
pve2 role:Secondary
peer-disk:Diskless
root@pve3:~# drbdadm status pm-b6bf9394
pm-b6bf9394 role:Primary
disk:UpToDate
pve1 role:Secondary
peer-disk:UpToDate
pve2 role:Secondary
peer-disk:Diskless
root@pve1:~# sync ; sync
root@pve1:~# echo b > /proc/sysrq-trigger
---- HARD REBOOT OF pve1 ----
root@pve1:~# uptime
14:31:23 up 5 min, 2 users, load average: 0.29, 0.44, 0.25
root@pve1:~# drbdadm status pm-68f10490
'pm-68f10490' not defined in your config (for this host).
root@pve1:~# drbdadm status pm-b6bf9394
pm-b6bf9394 role:Secondary
disk:Diskless quorum:no
pve2 connection:StandAlone
pve3 connection:StandAlone
^^^^ PROBLEM REPRODUCED ^^^^
```
Scenario 5: `lvextend` and plain `drbdadm resize` on Primary pve3, then hard reboot of pve1:

```
root@pve3:~# linstor vd list -r pm-bbc87056
╭───────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ VolumeNr ┊ VolumeMinor ┊ Size ┊ Gross ┊ State ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ pm-bbc87056 ┊ 0 ┊ 1018 ┊ 20.00 GiB ┊ ┊ ok ┊
╰───────────────────────────────────────────────────────────────────╯
root@pve3:~# linstor v list -r pm-bbc87056
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pve1 ┊ pm-bbc87056 ┊ cssd1 ┊ 0 ┊ 1018 ┊ /dev/drbd1018 ┊ 20.01 GiB ┊ Unused ┊ UpToDate ┊
┊ pve2 ┊ pm-bbc87056 ┊ DfltDisklessStorPool ┊ 0 ┊ 1018 ┊ /dev/drbd1018 ┊ ┊ Unused ┊ TieBreaker ┊
┊ pve3 ┊ pm-bbc87056 ┊ cssd1 ┊ 0 ┊ 1018 ┊ /dev/drbd1018 ┊ 20.01 GiB ┊ InUse ┊ UpToDate ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@pve3:~# lvextend -L22G /dev/cssd1/pm-bbc87056_00000
Size of logical volume cssd1/pm-bbc87056_00000 changed from <20.01 GiB (5122 extents) to 22.00 GiB (5632 extents).
Logical volume cssd1/pm-bbc87056_00000 successfully resized.
root@pve3:~# drbdadm resize pm-bbc87056
root@pve1:~# drbdadm status pm-bbc87056
pm-bbc87056 role:Secondary
disk:Inconsistent
pve2 role:Secondary
peer-disk:Diskless resync-suspended:dependency
pve3 role:Primary
replication:SyncTarget peer-disk:UpToDate done:18.65
root@pve1:~# drbdadm status pm-bbc87056
pm-bbc87056 role:Secondary
disk:UpToDate
pve2 role:Secondary
peer-disk:Diskless
pve3 role:Primary
peer-disk:UpToDate
root@pve1:~# sync ; sync
root@pve1:~# echo b > /proc/sysrq-trigger
---- HARD REBOOT OF pve1 ----
root@pve1:~# drbdadm status pm-bbc87056
pm-bbc87056 role:Secondary
disk:Diskless quorum:no
pve2 connection:StandAlone
pve3 connection:StandAlone
```
Same problem, so this might be at the DRBD layer?
I was able to re-create this with just DRBD and with VMs - so this is not actually a Linstor bug. This can probably be closed, and I will open a bug report on the drbd repo.
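For anyone else trying to reproduce this without LINSTOR, a condensed sketch of a DRBD-only setup follows. The resource name, VG name, addresses and node-ids are placeholders; the steps simply condense the transcripts above:

```
# Hypothetical minimal two-node resource definition (drbd9 syntax), on both nodes:
cat > /etc/drbd.d/r0.res <<'EOF'
resource r0 {
    device    /dev/drbd1018;
    disk      /dev/vg0/r0;
    meta-disk internal;
    on pve1 { node-id 0; address 192.168.1.1:7789; }
    on pve3 { node-id 1; address 192.168.1.3:7789; }
}
EOF

# On both nodes: backing LV, metadata, bring the resource up.
lvcreate -L 20G -n r0 vg0
drbdadm create-md r0
drbdadm up r0
# On pve3 only, once, to start the initial sync:
drbdadm primary --force r0

# The resize under test: grow the backing LV on both nodes, then grow DRBD.
lvextend -L +2G /dev/vg0/r0     # on both nodes
drbdadm resize r0               # on one node; let any resync finish

# Hard-reset the Secondary (pve1) without a clean shutdown.
sync
echo b > /proc/sysrq-trigger

# After pve1 boots, the bug shows as a Diskless resource that 'drbdadm adjust'
# cannot reattach (no usable activity log):
drbdadm status r0
```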
Original report

I encountered an issue with a Proxmox cluster running Linstor where, after a power outage and full cluster reboot, VMs whose disks had been resized failed to start. I narrowed it down to a problem with DRBD replicas that are in Secondary state while another replica is Primary when Linstor is used to resize the volume.
If a Linstor volume is resized while Primary on one host, and a host holding a DRBD Secondary for that volume is then powered off, the Secondary's DRBD resource will not be able to come online once the host is back up. If the Primary is power-cycled, the Primary's DRBD resource will sometimes not come back online either. If multiple nodes are power-cycled, all Secondaries, and sometimes the Primary, will have this issue.
The issue can be identified by running `drbdadm adjust` on power-cycled replicas once they are back up, which fails with an error saying there is no activity log. This results in an inability to re-establish quorum when multiple hosts power-cycle at once, such as in a power outage.

This can be fixed by running `drbdadm create-md pm-nnnnnnnn` on the nodes that have backing volumes but are showing as diskless, then, if necessary, forcing one node to be Primary to start a sync.

The issue occurs no matter how far in the past the disk was resized. The first time I encountered it was during a power outage over two months after a VM disk had been resized, and that DRBD resource did not come back online.
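Condensed, the recovery described above looks like this (the resource name is a placeholder, and `primary --force` is only needed if no UpToDate Primary survived, e.g. after a full-cluster outage):

```
# On each node that has a backing volume but shows as Diskless after the reboot:
drbdadm create-md pm-nnnnnnnn    # recreate the metadata; confirm 'yes' twice
drbdadm adjust pm-nnnnnnnn       # reattach; the node comes back Inconsistent and resyncs
# Only if no remaining node is UpToDate and Primary:
drbdadm primary --force pm-nnnnnnnn
```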
The issue does not occur if the host is shut down gracefully (`shutdown -h now`); if the host is power-cycled after a graceful shutdown or reboot, the issue will not occur. The issue does not occur if the volume is resized outside of Linstor - `lvextend` or `zfs set volsize`, then `drbdadm resize`, with or without `--assume-clean`. I am not sure what Linstor is doing differently in the resize process.

My full notes on troubleshooting this issue are attached. Reproduction attempts 3 through 7 are the most interesting; attempts 1 and 2 contain a bunch of actions and thoughts that turned out to be unrelated. Linstor Single Replica after Resize then Cluster Restart.md
Debugging Summary
- The issue can be fixed with `drbdadm create-md` on the affected nodes for the affected volume.
- Checked the `/var/lib/drbd/drbd-minor-nnnn.lkbd` files after resize, and found that they were not updated with the new size on nodes where the volume was Secondary. The file did contain the correct volume size on the node where the volume was Primary (a comparison sketch follows this list).
- Ran `drbdadm check-resize` on the resized volume on each node. This caused the `lkbd` file to get updated, but the same metadata problem occurred after power-cycle.
- Ran `drbdadm down` and `up` to bring the volume offline and back online. The first run of `drbdadm up` complained with `No usable activity log found.`, but succeeded anyway. I was not able to provoke the same issue without the reboot.
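(A quick way to compare DRBD's cached last-known size against the real backing device, using example names from the transcripts above; `blockdev --getsize64` reports bytes:)

```
# What DRBD last recorded for this minor vs. what the LV actually is now:
cat /var/lib/drbd/drbd-minor-1018.lkbd
blockdev --getsize64 /dev/cssd1/pm-bbc87056_00000
```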
Version Info

Linstor controller/satellite are 1.29.1.
DRBD version info: