LINBIT / drbd

LINBIT DRBD kernel module
https://docs.linbit.com/docs/users-guide-9.0/
GNU General Public License v2.0

Multiple DRBD processes hang, causing the load to increase and eventually the server cannot execute commands. #100

Open xiahao007 opened 1 month ago

xiahao007 commented 1 month ago

drbd

Kernel log:

[Thu Sep 26 20:15:29 2024] R10: 00007ffe340b788c R11: 0000000000000246 R12: 0000000000000020
[Thu Sep 26 20:15:29 2024] R13: 0000000000000004 R14: 000055ae4f0042a0 R15: 000055ae4e0827a8
[Thu Sep 26 20:17:32 2024] INFO: task drbdsetup:3440344 blocked for more than 491 seconds.
[Thu Sep 26 20:17:32 2024] Tainted: G OE --------- --- 5.14.0-162.6.1.el9_1.x86_64 #1
[Thu Sep 26 20:17:32 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Sep 26 20:17:32 2024] task:drbdsetup state:D stack: 0 pid:3440344 ppid: 1 flags:0x00004006
[Thu Sep 26 20:17:32 2024] Call Trace:
[Thu Sep 26 20:17:32 2024] schedule+0x206/0x580
[Thu Sep 26 20:17:32 2024] schedule+0x43/0xa0
[Thu Sep 26 20:17:32 2024] schedule_timeout+0x11d/0x160
[Thu Sep 26 20:17:32 2024] ? _raw_spin_unlock_irqrestore+0xa/0x30
[Thu Sep 26 20:17:32 2024] ? __wake_up_common_lock+0x8a/0xc0
[Thu Sep 26 20:17:32 2024] wait_for_common+0x93/0x1d0
[Thu Sep 26 20:17:32 2024] ? usleep_range_state+0x90/0x90
[Thu Sep 26 20:17:32 2024] state_change_unlock+0x4e/0x90 [drbd]
[Thu Sep 26 20:17:32 2024] ? may_be_up_to_date+0xe0/0xe0 [drbd]
[Thu Sep 26 20:17:32 2024] end_state_change+0x62/0xb0 [drbd]
[Thu Sep 26 20:17:32 2024] change_cluster_wide_state+0xb9/0x520 [drbd]
[Thu Sep 26 20:17:32 2024] ? kvm_sched_clock_read+0x14/0x40
[Thu Sep 26 20:17:32 2024] ? raw_spin_rq_lock_nested+0x19/0x80
[Thu Sep 26 20:17:32 2024] ? idr_get_next_ul+0xb6/0xf0
[Thu Sep 26 20:17:32 2024] change_role+0x1da/0x210 [drbd]
[Thu Sep 26 20:17:32 2024] drbd_set_role+0xc4/0x7b0 [drbd]
[Thu Sep 26 20:17:32 2024] ? drbd_find_resource+0x74/0xb0 [drbd]
[Thu Sep 26 20:17:32 2024] drbd_adm_down+0x81/0x330 [drbd]
[Thu Sep 26 20:17:32 2024] ? __nla_validate_parse+0x141/0x190
[Thu Sep 26 20:17:32 2024] genl_family_rcv_msg_doit+0xea/0x150
[Thu Sep 26 20:17:32 2024] genl_rcv_msg+0xdc/0x1e0
[Thu Sep 26 20:17:32 2024] ? drbd_adm_set_role+0x200/0x200 [drbd]
[Thu Sep 26 20:17:32 2024] ? genl_get_cmd+0xe0/0xe0
[Thu Sep 26 20:17:32 2024] netlink_rcv_skb+0x51/0x100
[Thu Sep 26 20:17:32 2024] genl_rcv+0x24/0x40
[Thu Sep 26 20:17:32 2024] netlink_unicast+0x23b/0x350
[Thu Sep 26 20:17:32 2024] netlink_sendmsg+0x23b/0x480
[Thu Sep 26 20:17:32 2024] sock_sendmsg+0x62/0x70
[Thu Sep 26 20:17:32 2024] sock_write_iter+0x97/0x100
[Thu Sep 26 20:17:32 2024] new_sync_write+0x19d/0x1b0
[Thu Sep 26 20:17:32 2024] vfs_write+0x1ef/0x280
[Thu Sep 26 20:17:32 2024] ksys_write+0xab/0xe0
[Thu Sep 26 20:17:32 2024] ? syscall_trace_enter.constprop.0+0x145/0x1d0
[Thu Sep 26 20:17:32 2024] do_syscall_64+0x5c/0x90
[Thu Sep 26 20:17:32 2024] ? exc_page_fault+0x62/0x150
[Thu Sep 26 20:17:32 2024] entry_SYSCALL_64_after_hwframe+0x63/0xcd
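For triage, a trace like the one above can be reduced to the blocked task and the frames that belong to the drbd module. The following is only a minimal sketch: the regular expressions and the helper name `summarize` are assumptions made for illustration, not part of DRBD or any kernel tooling, and the sample is a shortened excerpt of the log in this issue. Frames prefixed with `?` are ones the kernel unwinder marks as unreliable, so the sketch skips them.

```python
import re

# Header printed by the kernel's hung task detector.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# One stack frame, e.g. "change_role+0x1da/0x210 [drbd]".
# An optional leading "? " marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize(log_text, module="drbd"):
    """Return (hung tasks, reliable frames that belong to `module`)."""
    tasks = [
        (name, int(pid), int(secs))
        for name, pid, secs in HUNG_RE.findall(log_text)
    ]
    frames = [
        func
        for unreliable, func, mod in FRAME_RE.findall(log_text)
        if mod == module and not unreliable
    ]
    return tasks, frames


# Shortened excerpt of the log from this issue.
SAMPLE = """\
[Thu Sep 26 20:17:32 2024] INFO: task drbdsetup:3440344 blocked for more than 491 seconds.
[Thu Sep 26 20:17:32 2024] wait_for_common+0x93/0x1d0
[Thu Sep 26 20:17:32 2024] ? usleep_range_state+0x90/0x90
[Thu Sep 26 20:17:32 2024] state_change_unlock+0x4e/0x90 [drbd]
[Thu Sep 26 20:17:32 2024] ? may_be_up_to_date+0xe0/0xe0 [drbd]
[Thu Sep 26 20:17:32 2024] end_state_change+0x62/0xb0 [drbd]
[Thu Sep 26 20:17:32 2024] change_cluster_wide_state+0xb9/0x520 [drbd]
[Thu Sep 26 20:17:32 2024] change_role+0x1da/0x210 [drbd]
[Thu Sep 26 20:17:32 2024] drbd_set_role+0xc4/0x7b0 [drbd]
[Thu Sep 26 20:17:32 2024] drbd_adm_down+0x81/0x330 [drbd]
"""

if __name__ == "__main__":
    tasks, frames = summarize(SAMPLE)
    print(tasks)
    print(frames)
```

On this excerpt the first drbd frame listed is `state_change_unlock`, entered from `drbd_adm_down` via `drbd_set_role`, and it sits directly below `wait_for_common`, which suggests the `drbdsetup` task is sleeping while it waits for an event during the state change.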

xiahao007 commented 1 month ago

DRBD version: 9.1.14

drbd_code

According to the log, is this function waiting for something to notify it?

rp- commented 1 month ago

Could you try upgrading DRBD first? 9.1.22 is the latest 9.1.x version.
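A quick way to check whether a node is behind the release mentioned above is to compare version tuples rather than raw strings. This is only a sketch: `parse_version` and `needs_upgrade` are hypothetical helper names, and the sample line is an assumed shape for a version string, not output captured from the reporter's system.

```python
import re

# Latest 9.1.x release according to the comment above (may be outdated later).
LATEST_9_1 = (9, 1, 22)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(installed_text, latest=LATEST_9_1):
    """True if the installed version is older than `latest`."""
    return parse_version(installed_text) < latest


if __name__ == "__main__":
    # Assumed example line; substitute whatever your node reports.
    print(needs_upgrade("version: 9.1.14"))
```

Comparing integer tuples avoids the pitfall of string comparison, where for example "9.1.9" would sort after "9.1.22".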