
DRBD Device does not come up if Secondary host is hard-reset after an online resize #101


modcritical commented 1 month ago

If a DRBD device is Primary on one host and Secondary on the other and is resized online, then a hard reset of the Secondary host leaves the device unable to come up on that host after reboot until create-md is run again.

This seems like an issue with metadata relocation during an online resize. The problem does not occur if both nodes are Secondary when resized. I originally ran into this in a Proxmox/Linstor cluster and have distilled it down to a much simpler reproduction scenario. The original report has a lot more detail: https://github.com/LINBIT/linstor-server/issues/423
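
In short (full transcript under Test Process below):

# vol2 is Primary on drbdt1, Secondary on drbdt3, Connected and UpToDate
lvresize -L+1G /dev/dvg/vol2      # on both nodes
drbdadm resize vol2               # on the Primary (drbdt1)
echo b > /proc/sysrq-trigger      # hard-reset the Secondary (drbdt3)
# after drbdt3 reboots:
drbdadm up vol2                   # fails: "No usable activity log found. Do you need to create-md?"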

I have reproduced this bug both on KVM Q35 VMs with Virtio disks and on real hardware with an Intel 200 series chipset AHCI controller in AHCI mode and directly attached SATA SSDs (tested with both Crucial MX500 and Intel S4620 drives).

In my Proxmox setup this occurs with both ZFS and LVM-thin backing devices, so it should not be specific to the LVM thin pool used in the reproduction below.

Error Reproduction

Test Setup

root@drbdt1:~# pvs
  PV         VG  Fmt  Attr PSize   PFree
  /dev/vdb   dvg lvm2 a--  <60.00g    0 
root@drbdt1:~# vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  dvg   1   3   0 wz--n- <60.00g    0 
root@drbdt1:~# lvs
  LV    VG  Attr       LSize   Pool  Origin Data%  Meta%  Move Log Cpy%Sync Convert
  dvgtp dvg twi-aot--- <59.88g              0.67   10.66                           
  vol1  dvg Vwi-a-t---   1.00g dvgtp        0.01                                   
  vol2  dvg Vwi-aot---   3.00g dvgtp        13.27
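
For reference, a layout like the above can be recreated along these lines (the exact lvcreate options are an assumption; only the resulting names and sizes matter):

vgcreate dvg /dev/vdb
lvcreate -l 100%FREE --thinpool dvgtp dvg
lvcreate -V 1G --thin -n vol1 dvg/dvgtp
lvcreate -V 3G --thin -n vol2 dvg/dvgtp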

Versions

Linux drbdt1 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux
drbd dkms module 9.2.11, built from git commit d7212a2eaeda23f8cb71be36ba52a5163f4dc694

Test Process

root@drbdt1:~# drbdadm -v status vol2
drbdsetup status vol2 --verbose
vol2 node-id:0 role:Primary suspended:no force-io-failures:no
  volume:0 minor:2 disk:UpToDate backing_dev:/dev/dvg/vol2 quorum:yes blocked:no
  drbdt3 node-id:1 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no

root@drbdt3:~# drbdadm -v status
drbdsetup status --verbose
vol2 node-id:1 role:Secondary suspended:no force-io-failures:no
  volume:0 minor:2 disk:UpToDate backing_dev:/dev/dvg/vol2 quorum:yes blocked:no
  drbdt1 node-id:0 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
root@drbdt1:~# lvresize -L+1G /dev/dvg/vol2
  Size of logical volume dvg/vol2 changed from 3.00 GiB (768 extents) to 4.00 GiB (1024 extents).
  Logical volume dvg/vol2 successfully resized.

root@drbdt3:~# lvresize -L+1G /dev/dvg/vol2
  Size of logical volume dvg/vol2 changed from 3.00 GiB (768 extents) to 4.00 GiB (1024 extents).
  Logical volume dvg/vol2 successfully resized.
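
At this point, before running drbdadm resize, both backing devices have grown; this can be double-checked on each node (a 4.00 GiB LV is exactly 4294967296 bytes):

root@drbdt1:~# blockdev --getsize64 /dev/dvg/vol2
4294967296
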
root@drbdt1:~# drbdadm resize vol2

root@drbdt1:~# drbdadm -v status vol2
drbdsetup status vol2 --verbose
vol2 node-id:0 role:Primary suspended:no force-io-failures:no
  volume:0 minor:2 disk:UpToDate backing_dev:/dev/dvg/vol2 quorum:yes blocked:no
  drbdt3 node-id:1 connection:Connected role:Secondary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no

root@drbdt3:~# drbdadm -v status vol2
drbdsetup status vol2 --verbose
vol2 node-id:1 role:Secondary suspended:no force-io-failures:no
  volume:0 minor:2 disk:UpToDate backing_dev:/dev/dvg/vol2 quorum:yes blocked:no
  drbdt1 node-id:0 connection:Connected role:Primary tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
root@drbdt3:~# echo b > /proc/sysrq-trigger
Connection to drbdt3.local closed.
thogan@cinder:~$ ssh drbdt3.local
Linux drbdt3 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat Oct 19 15:13:13 2024 from 192.168.97.96
root@drbdt3:~# uptime
 15:32:19 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
root@drbdt3:~# drbdadm status
# No currently configured DRBD found.
root@drbdt3:~# drbdadm up vol2
No usable activity log found. Do you need to create-md?
root@drbdt3:~# drbdadm -v status vol2
drbdsetup status vol2 --verbose
vol2 node-id:1 role:Secondary suspended:no force-io-failures:no
  volume:0 minor:2 disk:Diskless client:no backing_dev:none quorum:yes blocked:no
  drbdt1 node-id:0 connection:StandAlone role:Unknown tls:no congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no

root@drbdt1:~# drbdadm status vol2
vol2 role:Primary
  disk:UpToDate
  drbdt3 connection:Connecting
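
For debugging, the on-disk metadata on the broken Secondary can be inspected directly with drbdmeta while the resource is down (a sketch; minor and backing device as in this setup, see drbdmeta(8)):

root@drbdt3:~# drbdadm down vol2
root@drbdt3:~# drbdmeta /dev/drbd2 v09 /dev/dvg/vol2 internal dump-md

If the metadata at the new post-resize offset were intact, dump-md would print it; in this broken state it presumably fails the same way drbdadm up does.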

Restoring Secondary

This situation can be recovered manually by running create-md on the broken Secondary:

root@drbdt3:~# drbdadm create-md vol2
You want me to create a v09 style flexible-size internal meta data block.
There appears to be a v09 flexible-size internal meta data block
already in place on /dev/dvg/vol2 at byte offset 4294963200

Do you really want to overwrite the existing meta-data?
[need to type 'yes' to confirm] yes

md_offset 4294963200
al_offset 4294930432
bm_offset 4294799360

Found ext3 filesystem
     1048508 kB data area apparently used
     4194140 kB left usable by current configuration

Even though it looks like this would place the new meta data into
unused space, you still need to confirm, as this is only a guess.

Do you want to proceed?
[need to type 'yes' to confirm] yes

initializing activity log
initializing bitmap (128 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
root@drbdt3:~# drbdadm status vol2
vol2 role:Secondary
  disk:Diskless
  drbdt1 connection:StandAlone

root@drbdt3:~# drbdadm adjust vol2
root@drbdt3:~# drbdadm status vol2
vol2 role:Secondary
  disk:Inconsistent
  drbdt1 role:Primary
    replication:SyncTarget peer-disk:UpToDate done:1.44

root@drbdt3:~# drbdadm status vol2
vol2 role:Secondary
  disk:UpToDate
  drbdt1 role:Primary
    peer-disk:UpToDate
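
The offsets printed by create-md line up with v09 internal metadata at the end of the now 4 GiB backing device:

# 4 GiB LV                         = 4294967296 bytes
# md_offset = 4294967296 -   4096  = 4294963200   (superblock, matches above)
# al_offset = 4294963200 -  32768  = 4294930432   (32 KiB activity log)
# bm_offset = 4294930432 - 131072  = 4294799360   (128 KiB bitmap, cf. "initializing bitmap (128 KB)")

Before the resize, the same layout on the 3 GiB LV would put md_offset at 3221225472 - 4096 = 3221221376, i.e. the online resize has to move the internal metadata about 1 GiB towards the new end of the device.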

Resource Config Files

The same resource config file is used on each node:

root@drbdt1:~# cat /etc/drbd.d/vol2.res 
resource "vol2" {
        device minor 2;
        disk /dev/dvg/vol2;
        meta-disk internal;

        net {
                protocol C;
        }

        on "drbdt1" {
                node-id 0;
                address ipv6 [fdbc:6a5c:a49a:6:5054:ff:fef6:f40e]:7002;
        }

        on "drbdt3" {
                node-id 1;
                address ipv6 [fdbc:6a5c:a49a:6:5054:ff:feab:e5dd]:7002;
        }
}

root@drbdt3:~# cat /etc/drbd.d/vol2.res
(byte-for-byte identical to the file on drbdt1 above)
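
Worth noting for anyone triaging this: meta-disk internal is what forces the metadata to move on every resize; with external metadata the offsets are independent of the data device size, so a comparison run with external metadata could help confirm that the relocation step is at fault (not covered by this reproduction). That variant would only change the disk section, e.g. with a hypothetical dedicated LV:

        disk /dev/dvg/vol2;
        meta-disk /dev/dvg/vol2md;   # hypothetical external metadata LV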

Debugging Notes

I reproduced this situation repeatedly under different conditions and discovered the following:

root@drbdt3:~# drbdadm up vol2
No usable activity log found. Do you need to create-md?
Error ignored, no need to apply the AL
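
If I am reading the tooling right, the failing check is drbdmeta's activity-log step, which drbdadm runs as part of up; it can be invoked on its own to reproduce the error in isolation (sketch, resource down):

root@drbdt3:~# drbdmeta /dev/drbd2 v09 /dev/dvg/vol2 internal apply-al
# expected to print the same "No usable activity log found" error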

dmesg output on Secondary during resize

root@drbdt3:~# dmesg -W
[  238.198991] drbd vol2 drbdt1: Preparing remote state change 499518905 (local_max_size = 7339772 KiB)
[  238.199089] drbd vol2 drbdt1: Committing remote state change 499518905 (primary_nodes=1)
[  238.199094] drbd vol2/0 drbd2: drbd_bm_resize called with capacity == 14679544
[  238.199111] drbd vol2/0 drbd2: resync bitmap: bits=1834943 words=28671 pages=56
[  238.199114] drbd2: detected capacity change from 12582456 to 14679544
[  238.199130] drbd vol2/0 drbd2: size = 7168 MB (7339772 KB)
[  238.199133] drbd vol2/0 drbd2: persisting effective size = 7168 MB (7339772 KB)
[  238.214827] drbd vol2/0 drbd2: Writing the whole bitmap, size changed and md moved
[  238.218538] drbd vol2/0 drbd2 drbdt1: helper command: /sbin/drbdadm before-resync-target
[  238.219338] drbd vol2/0 drbd2 drbdt1: helper command: /sbin/drbdadm before-resync-target exit code 0
[  238.219347] drbd vol2/0 drbd2: disk( UpToDate -> Inconsistent ) [resize]
[  238.219349] drbd vol2/0 drbd2 drbdt1: repl( Established -> SyncTarget ) [resize]
[  238.219382] drbd vol2/0 drbd2 drbdt1: Began resync as SyncTarget (will sync 1048544 KB [262136 bits set]).
[  238.222736] drbd vol2/0 drbd2 drbdt1: received new current UUID: 103FC2FF0ABB2057 weak_nodes=FFFFFFFFFFFFFFFC
[  248.913380] drbd vol2/0 drbd2 drbdt1: Resync done (total 10 sec; paused 0 sec; 104852 K/sec)
[  248.913388] drbd vol2/0 drbd2 drbdt1: updated UUIDs 103FC2FF0ABB2056:0000000000000000:01ED7EDA24C3FB24:0000000000000000
[  248.913395] drbd vol2/0 drbd2: disk( Inconsistent -> UpToDate ) [resync-finished]
[  248.913398] drbd vol2/0 drbd2 drbdt1: repl( SyncTarget -> Established ) [resync-finished]
[  248.952854] drbd vol2/0 drbd2 drbdt1: helper command: /sbin/drbdadm after-resync-target
[  248.953750] drbd vol2/0 drbd2 drbdt1: helper command: /sbin/drbdadm after-resync-target exit code 0
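
The "Writing the whole bitmap, size changed and md moved" line above is the metadata relocation suspected at the top of this report: during the online resize the kernel moves the internal metadata and bitmap to the new end of the backing device. For reference, the relevant line can be pulled out of the log directly:

root@drbdt3:~# dmesg | grep 'md moved'
[  238.214827] drbd vol2/0 drbd2: Writing the whole bitmap, size changed and md moved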