LINBIT / drbd

LINBIT DRBD kernel module
https://docs.linbit.com/docs/users-guide-9.0/
GNU General Public License v2.0

PRIMARY timed out after secondary node rebooted and joined cluster (minutes after) - RDMA #89

Open Lathanderjk opened 6 months ago

Lathanderjk commented 6 months ago

We are updating to 9.2.9. One of the nodes was updated a few days ago. Today, after updating the second node and rebooting it, the PRIMARY (which was still on 9.2.8) timed out for more than 60 s (the FS monitor timeout). It happened after the rebooted secondary had joined...

During the timeout, the system load on the PRIMARY decreased, and I/O busy/throughput on the underlying drives dropped to zero.

I marked the time when the timeout started in the log. Attachments: log.txt, filesystem.res.txt