FYI, I just got the same panic on the other secondary node "ti126".
```
[Wed Oct 18 12:58:09 2023][1548628.601583] list_add corruption. prev->next should be next (ffff9969389d8968), but was ffff996841b261a0. (prev=ffff996841b261a0).
```
I'm attaching another console log file.
Regards, Andy

Attachment: ti126.log
Should be fixed with this commit: https://github.com/LINBIT/drbd/commit/f72b60c5ff8af5ee8cd8f1d87257afa86a6e0eb3
For reference: this looks more like a case that is fixed by bc9e239cb4c3e7d898fcdd403555e78ee8d76378. Note the call trace involving `w_resync_timer` and the presence of `csums-alg` in the config.
Ah, thanks. Good to know. So would disabling `csums-alg` eliminate this issue? The issue has thankfully not recurred...
> So would disabling `csums-alg` eliminate this issue?
Yes, it should prevent the issue.
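
For anyone who hits this before upgrading to a DRBD build containing the fix, here is a minimal sketch of the workaround, assuming `csums-alg` is set in the `net` section of the resource file as described in the drbd.conf documentation (the algorithm name below is just a placeholder):

```
# Sketch only: disable checksum-based resync by removing/commenting the option.
resource test {
    net {
        # csums-alg md5;   # commented out to avoid the checksum-based resync path
    }
    # ... volumes, hosts, and connection definitions unchanged ...
}
```

After editing the file on all nodes, something like `drbdadm adjust test` should apply the changed net options to the running resource without taking it down.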
I just set up a 3-way mesh test config on CentOS Stream 9 running kernel 5.14.0-325.el9.x86_64. At some point, the secondary host "ti140" crashed with this error:

```
[Wed Oct 18 12:28:40 2023][2399614.284617] list_add corruption. prev->next should be next (ffff8c02bc209168), but was ffff8c01e769b760. (prev=ffff8c01e769b760).
```
Here's what I did:
On all 3 hosts:

```
lvcreate -n pool0 -L 30GiB vg_sys
lvconvert -y --type thin-pool vg_sys/pool0
lvcreate -n drbd_main -V 10GiB --thinpool pool0 vg_sys
lvcreate -n drbd_archive -V 10GiB --thinpool pool0 vg_sys
drbdadm create-md test
drbdadm up test
```
And on the primary host "ti128":

```
drbdadm new-current-uuid --clear-bitmap test/0
drbdadm new-current-uuid --clear-bitmap test/1
drbdadm primary test
```
I'm attaching the test.res config file from /etc/drbd.d and a log of console messages captured by conserver.
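
For readers who can't see the attachment, here is a hypothetical reconstruction of what a 3-way mesh test.res along these lines might look like (node IDs, IP addresses, port, device minors, and the checksum algorithm are invented for illustration; only the host and volume names come from this thread):

```
resource test {
    net {
        csums-alg md5;            # algorithm assumed; the thread only confirms csums-alg is set
    }
    volume 0 {
        device    /dev/drbd0;     # minor assumed
        disk      /dev/vg_sys/drbd_main;
        meta-disk internal;
    }
    volume 1 {
        device    /dev/drbd1;     # minor assumed
        disk      /dev/vg_sys/drbd_archive;
        meta-disk internal;
    }
    on ti128 { node-id 0; address 192.0.2.128:7789; }   # addresses are examples
    on ti126 { node-id 1; address 192.0.2.126:7789; }
    on ti140 { node-id 2; address 192.0.2.140:7789; }
    connection-mesh {
        hosts ti128 ti126 ti140;
    }
}
```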
After reboot, "ti140" resynced and seems to be working OK.
Regards, Andy

Attachments: ti140.log, test.res.txt