If a DRBD device is Primary on one host and Secondary on the other, is resized, and the Secondary host is then hard-reset, the DRBD device no longer comes up on the Secondary after reboot until create-md is run again.
This looks like an issue with metadata relocation during an online resize. The problem does not occur if both nodes are Secondary during the resize. I originally ran into this in a Proxmox/Linstor cluster and have distilled it down to a much simpler reproduction. The original Linstor issue has a lot more information: https://github.com/LINBIT/linstor-server/issues/423
I have reproduced this bug both on KVM Q35 VMs with virtio disks and on real hardware: an Intel 200-series chipset AHCI controller in AHCI mode with directly attached SATA SSDs (tested with both Crucial MX500 and Intel S4620 drives).
In my Proxmox setup this occurs with both ZFS and LVM-thin backing devices, so it does not appear to be specific to the LVM thin pool used in the reproduction below.
Error Reproduction
Test setup
Two Debian Bookworm KVM Q35 VMs: drbdt1 and drbdt3
Each has a 60 GB virtio disk at /dev/vdb, used as the only LVM PV in VG dvg, which contains the thin pool dvgtp
Each has an LV vol2
LV dvg/vol2 on each is the backing device for DRBD device minor 2 (vol2)
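A minimal sketch of how the test setup above could be recreated on each node. The thin pool size (50G), the initial vol2 size (1G) and the mkfs.ext3 step on the Primary are assumptions, not stated in the report; the create-md output further down (an ext3 filesystem with a 1048508 kB data area) appears consistent with a 1 GiB volume. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch of the test setup, run once per node.
# Assumptions (not from the report): thin pool size 50G, initial vol2 size 1G,
# and an ext3 filesystem created on the Primary.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
set -u
DRY_RUN=${DRY_RUN:-1}

# Print or execute a single command.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_node() {
    local role=$1   # "primary" or "secondary"
    run pvcreate /dev/vdb
    run vgcreate dvg /dev/vdb
    run lvcreate --type thin-pool -L 50G -n dvgtp dvg
    run lvcreate -V 1G -T dvg/dvgtp -n vol2
    # /etc/drbd.d/vol2.res must already be in place on both nodes.
    run drbdadm create-md vol2
    run drbdadm up vol2
    if [ "$role" = primary ]; then
        # Only on one node: force it Primary (starts the initial sync), then mkfs.
        run drbdadm primary --force vol2
        run mkfs.ext3 /dev/drbd2
    fi
}

setup_node "${1:-primary}"
```

Run it with `secondary` as the argument on drbdt3 and with `primary` on drbdt1, after the Secondary is up.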
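Test Process
vol2 is Primary on drbdt1, Secondary on drbdt3.
Run drbdadm resize vol2 on drbdt1, then display status on both nodes.
Hard-reset drbdt3 and wait for it to come back:
root@drbdt3:~# echo b > /proc/sysrq-trigger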
Connection to drbdt3.local closed.
thogan@cinder:~$ ssh drbdt3.local
Linux drbdt3 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64
Last login: Sat Oct 19 15:13:13 2024 from 192.168.97.96
root@drbdt3:~# uptime
15:32:19 up 0 min, 1 user, load average: 0.00, 0.00, 0.00
Attempt to bring up the resource on drbdt3; it comes up Diskless:
root@drbdt3:~# drbdadm status
# No currently configured DRBD found.
root@drbdt3:~# drbdadm up vol2
No usable activity log found. Do you need to create-md?
root@drbdt3:~# drbdadm -v status vol2
drbdsetup status vol2 --verbose
vol2 node-id:1 role:Secondary suspended:no force-io-failures:no
    volume:0 minor:2 disk:Diskless client:no backing_dev:none quorum:yes blocked:no
  drbdt1 node-id:0 connection:StandAlone role:Unknown tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no
root@drbdt1:~# drbdadm status vol2
vol2 role:Primary
  disk:UpToDate
  drbdt3 connection:Connecting
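The reproduction can be driven from a third machine with a small script. This is a sketch, not the procedure used for the report: it assumes root SSH access to both nodes, and it grows the backing LV on both nodes before drbdadm resize (here with a hypothetical lvextend to 4G), since DRBD expects the backing devices to be enlarged on all nodes before an online grow. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical driver for the resize + hard-reset reproduction.
# Assumptions (not from the report): root SSH to both nodes, hostnames
# resolvable, and a target LV size of 4G.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
set -u

PRIMARY=${PRIMARY:-drbdt1}
SECONDARY=${SECONDARY:-drbdt3}
RES=${RES:-vol2}
LV=${LV:-dvg/vol2}
NEW_SIZE=${NEW_SIZE:-4G}
DRY_RUN=${DRY_RUN:-1}

# Run a command on a host over SSH, or just print it in dry-run mode.
on() {
    local host=$1; shift
    if [ "$DRY_RUN" = 1 ]; then
        printf '[%s] %s\n' "$host" "$*"
    else
        ssh "root@$host" "$@"
    fi
}

reproduce() {
    # 1. Grow the backing LV on both nodes first.
    on "$PRIMARY" lvextend -L "$NEW_SIZE" "$LV"
    on "$SECONDARY" lvextend -L "$NEW_SIZE" "$LV"
    # 2. Online resize from the Primary, then show status on both nodes.
    on "$PRIMARY" drbdadm resize "$RES"
    on "$PRIMARY" drbdadm status "$RES"
    on "$SECONDARY" drbdadm status "$RES"
    # 3. Hard-reset the Secondary via magic SysRq (no sync, no clean shutdown).
    #    In real mode this SSH session simply drops when the host resets.
    on "$SECONDARY" 'echo b > /proc/sysrq-trigger'
    # 4. Wait for the Secondary to come back, then try to attach again.
    if [ "$DRY_RUN" != 1 ]; then
        sleep 10
        until ssh -o ConnectTimeout=5 "root@$SECONDARY" true 2>/dev/null; do
            sleep 5
        done
    fi
    on "$SECONDARY" drbdadm up "$RES"
    on "$SECONDARY" drbdadm status "$RES"
}

reproduce
```

In dry-run mode it prints eight lines such as `[drbdt1] drbdadm resize vol2`; the final status on drbdt3 should show disk:Diskless when the bug triggers.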
Restoring Secondary
This situation can be recovered manually by running create-md on the broken Secondary and then drbdadm adjust, after which the Secondary resyncs from the Primary:
root@drbdt3:~# drbdadm create-md vol2
You want me to create a v09 style flexible-size internal meta data block.
There appears to be a v09 flexible-size internal meta data block
already in place on /dev/dvg/vol2 at byte offset 4294963200
Do you really want to overwrite the existing meta-data?
[need to type 'yes' to confirm] yes
md_offset 4294963200
al_offset 4294930432
bm_offset 4294799360
Found ext3 filesystem
1048508 kB data area apparently used
4194140 kB left usable by current configuration
Even though it looks like this would place the new meta data into
unused space, you still need to confirm, as this is only a guess.
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (128 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
root@drbdt3:~# drbdadm status vol2
vol2 role:Secondary
  disk:Diskless
  drbdt1 connection:StandAlone
root@drbdt3:~# drbdadm adjust vol2
root@drbdt3:~# drbdadm status vol2
vol2 role:Secondary
  disk:Inconsistent
  drbdt1 role:Primary
    replication:SyncTarget peer-disk:UpToDate done:1.44
root@drbdt3:~# drbdadm status vol2
vol2 role:Secondary
  disk:UpToDate
  drbdt1 role:Primary
    peer-disk:UpToDate
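With meta-disk internal, DRBD keeps its meta data at the end of the backing device, which is why an online grow has to relocate it. The offsets printed by create-md above fit the layout [data | bitmap | 32 KiB activity log | 4 KiB superblock], and the shell arithmetic below recomputes them from the device size. The bitmap sizing rule used here (one bit per 4 KiB of device per peer, rounded up to 4 KiB) is an assumption that happens to reproduce these numbers; it is not taken from drbdmeta's source.

```shell
#!/usr/bin/env bash
# Sketch: recompute internal meta-data offsets for a backing device size.
# Assumed layout, inferred from the create-md output in this report:
#   [ data | bitmap | activity log (32 KiB) | superblock (4 KiB) ]
md_layout() {
    local dev_bytes=$1 peers=${2:-1}
    local md_offset=$(( dev_bytes - 4096 ))    # 4 KiB superblock at the very end
    local al_offset=$(( md_offset - 32768 ))   # 32 KiB activity log just before it
    # Assumption: 1 bitmap bit per 4 KiB of device per peer, rounded up to 4 KiB.
    local bm_bytes=$(( (dev_bytes / 4096 * peers + 7) / 8 ))
    bm_bytes=$(( (bm_bytes + 4095) / 4096 * 4096 ))
    local bm_offset=$(( al_offset - bm_bytes ))
    echo "md_offset $md_offset"
    echo "al_offset $al_offset"
    echo "bm_offset $bm_offset"
    echo "usable_kB $(( bm_offset / 1024 ))"   # data area ends where the bitmap starts
}

md_layout $(( 4 * 1024 * 1024 * 1024 ))
```

For a 4 GiB device this prints md_offset 4294963200, al_offset 4294930432, bm_offset 4294799360 and usable_kB 4194140, matching the create-md output. Running `md_layout 1073741824` gives usable_kB 1048508, the same figure as the ext3 "data area apparently used", which suggests (but does not prove) that vol2 was 1 GiB before the resize and that its meta data previously sat near the 1 GiB mark.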
Resource Config Files
The same resource config file is used on each node:
root@drbdt1:~# cat /etc/drbd.d/vol2.res
resource "vol2" {
    device minor 2;
    disk /dev/dvg/vol2;
    meta-disk internal;
    net {
        protocol C;
    }
    on "drbdt1" {
        node-id 0;
        address ipv6 [fdbc:6a5c:a49a:6:5054:ff:fef6:f40e]:7002;
    }
    on "drbdt3" {
        node-id 1;
        address ipv6 [fdbc:6a5c:a49a:6:5054:ff:feab:e5dd]:7002;
    }
}
root@drbdt3:~# cat /etc/drbd.d/vol2.res
resource "vol2" {
    device minor 2;
    disk /dev/dvg/vol2;
    meta-disk internal;
    net {
        protocol C;
    }
    on "drbdt1" {
        node-id 0;
        address ipv6 [fdbc:6a5c:a49a:6:5054:ff:fef6:f40e]:7002;
    }
    on "drbdt3" {
        node-id 1;
        address ipv6 [fdbc:6a5c:a49a:6:5054:ff:feab:e5dd]:7002;
    }
}
Debugging Notes
I reproduced this situation repeatedly under different conditions and discovered the following:
The error does not occur if drbdadm down vol2 is run on the Secondary before the hard-reset, or if the host is rebooted with a clean shutdown.
The error does not occur if the Primary is hard-reset.
The error does not occur if no node is Primary.
The error also occurs with three nodes, with either one diskless node or all three diskful.
If the Secondary is downed before the hard reset, the next attempt to bring the resource up prints an error (or possibly just a warning), but the resource does come up:
root@drbdt3:~# drbdadm up vol2
No usable activity log found. Do you need to create-md?
Error ignored, no need to apply the AL
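When repeating the experiment under different conditions, the two outcomes of drbdadm up can be told apart by the messages quoted in this report. Both start with the same "No usable activity log found" line, so a hypothetical helper (the function name and labels are made up) has to key on the "Error ignored" line. Confirming with drbdadm status that the disk is Diskless is still advisable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify `drbdadm up` output using the two messages
# quoted in this report. The returned labels are illustrative only.
classify_up_output() {
    local out=$1
    if printf '%s\n' "$out" | grep -q 'Error ignored, no need to apply the AL'; then
        # Seen when the Secondary was downed cleanly before the reset.
        echo attached-with-warning
    elif printf '%s\n' "$out" | grep -q 'No usable activity log found'; then
        # Seen after resize + hard reset: resource stays Diskless.
        echo diskless-needs-create-md
    else
        echo no-al-message
    fi
}

# Typical use on the Secondary (needs a real DRBD node, so left commented out):
# classify_up_output "$(drbdadm up vol2 2>&1)"
classify_up_output 'No usable activity log found. Do you need to create-md?'
```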
dmesg output on Secondary during resize