Closed 0Jvang closed 3 years ago
Automatic failover requires at least 3 masters, because a failover election needs more than half of the master votes.
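The majority rule above can be checked with a little arithmetic. This is only an illustrative sketch: the function names are made up here and are not part of Tendis, and it assumes that each master has one vote and that a failed master cannot vote.

```python
def votes_needed(total_masters: int) -> int:
    """Smallest vote count that is strictly more than half of all masters."""
    return total_masters // 2 + 1


def can_fail_over(total_masters: int, failed_masters: int) -> bool:
    """True if the surviving masters can still form a majority."""
    surviving = total_masters - failed_masters
    return surviving >= votes_needed(total_masters)


if __name__ == "__main__":
    for masters in (2, 3):
        print(masters, votes_needed(masters), can_fail_over(masters, 1))
```

With 2 masters the majority is 2, so losing one master leaves only 1 vote and no replica can be promoted. With 3 masters the majority is still 2, and the 2 surviving masters are enough.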
I have now set up a cluster with 3 masters and 3 replicas, but after stopping one master node the cluster is still unavailable:
145e4ce1943f20806d8878faf91c4f47fdba8af8 192.168.0.136:9094@19094 master,fail - 1627702684589 1627702681538 1 disconnected 0-5461
1d2fc1c21225be12c6cb1e54e16240c2aa263b16 192.168.0.2:9094@19094 myself,slave 145e4ce1943f20806d8878faf91c4f47fdba8af8 0 1627702907000 0 connected
The corresponding replica node reports two errors.
Error 1: present both before and after the master node was stopped
E0731 11:38:59.658396 2318 spov.cpp:153] slaveStartFullsync rollback, rm dir:/var/webscanner/center/tendis/data/db/2_bak failed:No such file or directory
E0731 11:38:59.670660 2319 spov.cpp:153] slaveStartFullsync rollback, rm dir:/var/webscanner/center/tendis/data/db/3_bak failed:No such file or directory
E0731 11:38:59.693399 2317 spov.cpp:153] slaveStartFullsync rollback, rm dir:/var/webscanner/center/tendis/data/db/4_bak failed:No such file or directory
Error 2: appears only after the master node was stopped
E0731 11:39:01.198814 2326 cluster_manager.cpp:2758] vote fail, data age to large:1627702725199 limtTime is:151000
E0731 11:39:01.301354 2326 cluster_manager.cpp:2758] vote fail, data age to large:1627702725302 limtTime is:151000
@vinchen
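Errors on the replica while the master node is down
The most direct error is: vote fail, data age to large:1627702725302 limtTime is:151000
It means the election failed because data age is larger than the inter-node connection timeout. I noticed that data age is always equal to the current timestamp, so it will always exceed that timeout. I looked up how data age is computed in the source:
https://github.com/Tencent/Tendis/blob/532b9a9513e2ef644b1d55585fbf4422dfee84d9/src/tendisplus/cluster/cluster_manager.cpp#L2826
The reason data age always equals the current timestamp may be binlog related: getLastBinlogTs() may be returning 0.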
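To make that reasoning concrete, here is a minimal sketch of the suspected failure mode. It is not the Tendis implementation: it assumes, based only on the description above, that data age is the current time minus the last binlog timestamp and that a vote is refused when data age exceeds the limit. The function names are hypothetical; the two numbers are taken from the log line.

```python
LIMIT_MS = 151000          # "limtTime" value from the log line
NOW_MS = 1627702725302     # value logged as data age, treated here as "now"


def data_age_ms(now_ms: int, last_binlog_ts_ms: int) -> int:
    """Assumed formula: how far the replica's data lags behind the clock."""
    return now_ms - last_binlog_ts_ms


def vote_allowed(now_ms: int, last_binlog_ts_ms: int, limit_ms: int) -> bool:
    """Assumed check: refuse the vote when the data is older than the limit."""
    return data_age_ms(now_ms, last_binlog_ts_ms) <= limit_ms


if __name__ == "__main__":
    # Replica that never synced: last binlog timestamp is 0.
    print(data_age_ms(NOW_MS, 0), vote_allowed(NOW_MS, 0, LIMIT_MS))
    # Replica that applied a binlog 2 seconds ago.
    recent = NOW_MS - 2000
    print(data_age_ms(NOW_MS, recent), vote_allowed(NOW_MS, recent, LIMIT_MS))
```

Under these assumptions, a replica whose last binlog timestamp is 0 gets a data age equal to the full epoch timestamp, which can never pass a limit of 151000 ms, while a replica that has synced recently passes easily.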
Errors on the replica under normal conditions (master running)
W0804 17:53:54.418036 3624 spov.cpp:186] storeId:4,syncMaster:192.168.0.137:9094:4 failed:-ERR:3,msg:fullSync master not ok
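This log says the master is not OK for a full sync, which may be why data age always equals the current timestamp. I don't know what is going on here. Could you offer some help? @vinchen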
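Problem solved. I had overlooked a warning: fullSync req master failed:-NOAUTH Authentication required.
The masterauth parameter, the password used for master-replica synchronization, had not been configured.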
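A quick way to catch this kind of misconfiguration is to scan the config file before starting the node. The sketch below is hypothetical: it assumes the config uses Redis-style "key value" lines and that a password is enabled through requirepass. The helper names and the sample password are invented for illustration.

```python
def parse_conf(text: str) -> dict:
    """Parse 'key value' lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def missing_masterauth(settings: dict) -> bool:
    """True when a password is required but no replication password is set."""
    return bool(settings.get("requirepass")) and not settings.get("masterauth")


BROKEN_CONF = """
port 9094
requirepass example-password
"""

FIXED_CONF = BROKEN_CONF + "masterauth example-password\n"

if __name__ == "__main__":
    print(missing_masterauth(parse_conf(BROKEN_CONF)))
    print(missing_masterauth(parse_conf(FIXED_CONF)))
```

If the check reports a missing masterauth, the replica would presumably be rejected with -NOAUTH when it requests a full sync, as in the warning quoted above.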
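I set up a cluster with 2 masters and 2 replicas. After stopping 1 master node, the cluster becomes unavailable. How should the cluster be configured to get automatic failover?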