Tencent / Tendis

Tendis is a high-performance distributed storage system fully compatible with the Redis protocol.
http://tendis.cn

Master-slave replication disconnects, msg binlogId can't be smaller than highestBinlogId #196

Closed xh-gif closed 2 years ago

xh-gif commented 2 years ago

Error in the slave node's log:

E0721 06:44:46.993256 886 repl_util.cpp:359] binlogId:1441198150 can't be smaller than highestBinlogId:1441198150 storeid:4
E0721 06:44:46.993324 886 repl.cpp:494] applyRepllog failed,mode:0 err:-ERR:4,msg:binlogId:1441198150 can't be smaller than highestBinlogId:1441198150
E0721 06:44:46.993458 886 server_entry.cpp:372] sessid:1026642 cmd:applybinlogsv2 4 [76271] 256 0, error:"-ERR:4,msg:binlogId:1441198150 can't be smaller than highestBinlogId:1441198150\r\n"
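One detail in these log lines is easy to miss: the message says the incoming binlogId cannot be smaller than highestBinlogId, yet the two numbers are identical. The following is a minimal Python sketch for pulling both ids out of such a line and comparing them. The function name and the regular expression are illustrative assumptions derived from this single sample; they are not part of Tendis.

```python
import re

# Pattern for the message text seen in the repl_util.cpp log line in this report.
# This regex is an assumption based on that one sample, not a Tendis-defined format.
PATTERN = re.compile(
    r"binlogId:(\d+) can't be smaller than highestBinlogId:(\d+)"
)


def compare_binlog_ids(line):
    """Return (binlog_id, highest_binlog_id, relation), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    binlog_id, highest = int(m.group(1)), int(m.group(2))
    if binlog_id < highest:
        relation = "smaller"
    elif binlog_id == highest:
        relation = "equal"
    else:
        relation = "greater"
    return binlog_id, highest, relation


sample = (
    "E0721 06:44:46.993256 886 repl_util.cpp:359] binlogId:1441198150 "
    "can't be smaller than highestBinlogId:1441198150 storeid:4"
)
print(compare_binlog_ids(sample))
```

For the sample line the relation is "equal". One possible reading is that the slave was handed a binlog it had already applied rather than an older one, but that is an inference from the log text and is not confirmed anywhere in this thread.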

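We plan to run 3 masters and 3 slaves in production. After a slave node finishes the full sync it starts the incremental sync, and the log reports the error above continuously during the incremental sync. While this is happening, info replication shows both slave_repl_offset and master_repl_offset increasing, and the lag of every rocksdb store on the slave node grows slowly and never drops. After a few tens of minutes the master-slave link breaks, and master_link_status in the info output changes to down.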

The final info replication output is as follows:

Replication

role:slave
master_host:(redacted)
master_port:51004
master_link_status:down
master_last_io_seconds_ago:41577
master_last_binlog_seconds_ago:45261
master_sync_in_progress:0
slave_repl_offset:14120163305
master_link_down_since_seconds:41577
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:14120169558
rocksdb0_master:ip=masaike,port=51004,src_store_id=0,state=error,fullsync_succ_times=0,binlog_pos=1427344199,lag=43947,error=store:0 incrsync master bad return:-ERR invalid binlogPos,storeId:0,master firstPos:1442486355,slave binlogPos:1427344199,lastFlushBinlogId:0
rocksdb1_master:ip=masaike,port=51004,src_store_id=1,state=error,fullsync_succ_times=0,binlog_pos=1412803166,lag=44374,error=store:1 incrsync master bad return:-ERR invalid binlogPos,storeId:1,master firstPos:1427844000,slave binlogPos:1412803166,lastFlushBinlogId:0
rocksdb2_master:ip=masaike,port=51004,src_store_id=2,state=error,fullsync_succ_times=0,binlog_pos=1418404188,lag=43887,error=store:2 incrsync master bad return:-ERR invalid binlogPos,storeId:2,master firstPos:1433284589,slave binlogPos:1418404188,lastFlushBinlogId:0
rocksdb3_master:ip=masaike,port=51004,src_store_id=3,state=error,fullsync_succ_times=0,binlog_pos=1402797744,lag=45097,error=store:3 incrsync master bad return:-ERR invalid binlogPos,storeId:3,master firstPos:1418028863,slave binlogPos:1402797744,lastFlushBinlogId:0
rocksdb4_master:ip=masaike,port=51004,src_store_id=4,state=error,fullsync_succ_times=0,binlog_pos=1402123980,lag=44381,error=store:4 incrsync master bad return:-ERR invalid binlogPos,storeId:4,master firstPos:1417179354,slave binlogPos:1402123980,lastFlushBinlogId:0
rocksdb5_master:ip=masaike,port=51004,src_store_id=5,state=error,fullsync_succ_times=0,binlog_pos=1398530485,lag=45046,error=store:5 incrsync master bad return:-ERR invalid binlogPos,storeId:5,master firstPos:1413683573,slave binlogPos:1398530485,lastFlushBinlogId:0
rocksdb6_master:ip=masaike,port=51004,src_store_id=6,state=error,fullsync_succ_times=0,binlog_pos=1412803229,lag=45195,error=store:6 incrsync master bad return:-ERR invalid binlogPos,storeId:6,master firstPos:1427890699,slave binlogPos:1412803229,lastFlushBinlogId:0
rocksdb7_master:ip=masaike,port=51004,src_store_id=7,state=error,fullsync_succ_times=0,binlog_pos=1413125088,lag=45244,error=store:7 incrsync master bad return:-ERR invalid binlogPos,storeId:7,master firstPos:1428395894,slave binlogPos:1413125088,lastFlushBinlogId:0
rocksdb8_master:ip=masaike,port=51004,src_store_id=8,state=error,fullsync_succ_times=0,binlog_pos=1412860879,lag=45256,error=store:8 incrsync master bad return:-ERR invalid binlogPos,storeId:8,master firstPos:1427872173,slave binlogPos:1412860879,lastFlushBinlogId:0
rocksdb9_master:ip=masaike,port=51004,src_store_id=9,state=error,fullsync_succ_times=0,binlog_pos=1419370347,lag=45261,error=store:9 incrsync master bad return:-ERR invalid binlogPos,storeId:9,master firstPos:1434562902,slave binlogPos:1419370347,lastFlushBinlogId:0
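The per-store errors in this output all have the same shape: the master reports a firstPos that is larger than the slave's binlogPos. Below is a minimal Python sketch that extracts those two numbers for each store and computes the gap. The parsing relies only on the field layout visible in this report, the helper name `binlog_gaps` is hypothetical, and the regex assumes every store entry carries the "invalid binlogPos" error. Whether the gap means the master had already discarded the binlogs the slave still needed is an assumption, not something this thread confirms.

```python
import re

# Matches one rocksdbN_master entry as it appears in this report's info output.
# Assumption: each entry contains the "master firstPos ... slave binlogPos" error text.
STORE_RE = re.compile(
    r"rocksdb(\d+)_master:\S*?lag=(\d+),"
    r".*?master firstPos:(\d+),slave binlogPos:(\d+)"
)


def binlog_gaps(info_text):
    """Map store id -> lag, master firstPos, slave binlogPos, and their difference."""
    gaps = {}
    for m in STORE_RE.finditer(info_text):
        store, lag, first_pos, slave_pos = map(int, m.groups())
        gaps[store] = {
            "lag": lag,
            "master_first_pos": first_pos,
            "slave_binlog_pos": slave_pos,
            # Positive gap: the slave's position is behind the master's firstPos.
            "gap": first_pos - slave_pos,
        }
    return gaps


# Two entries copied from the output in this report.
sample = (
    "rocksdb0_master:ip=masaike,port=51004,src_store_id=0,state=error,"
    "fullsync_succ_times=0,binlog_pos=1427344199,lag=43947,error=store:0 "
    "incrsync master bad return:-ERR invalid binlogPos,storeId:0,"
    "master firstPos:1442486355,slave binlogPos:1427344199,lastFlushBinlogId:0 "
    "rocksdb1_master:ip=masaike,port=51004,src_store_id=1,state=error,"
    "fullsync_succ_times=0,binlog_pos=1412803166,lag=44374,error=store:1 "
    "incrsync master bad return:-ERR invalid binlogPos,storeId:1,"
    "master firstPos:1427844000,slave binlogPos:1412803166,lastFlushBinlogId:0"
)

for store, g in sorted(binlog_gaps(sample).items()):
    print(store, g["gap"], g["lag"])
```

For store 0 the gap is 15142156 binlog ids and for store 1 it is 15040834, so in both cases the slave's position is well behind the earliest position the master reports.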

The Tendis version was originally 2.2.2. We then tried upgrading to 2.4.3, but the problem persists. How can this be resolved?

xh-gif commented 2 years ago

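This is not a stress test. It is production traffic with fairly high QPS, and the number of keys is in the hundreds of millions. Does this error occur because the slave cannot keep up, and is that why the master-slave link breaks?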

xh-gif commented 2 years ago

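We have already tried that and rebuilt the slave many times. We emptied the slave node's data directory and deleted the binlogs as well. Every time, as soon as the full sync finishes and the incremental sync begins, the log starts reporting the error.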

raffertyyu commented 2 years ago

Are the master and slave nodes both on 2.4.3 right now?

xh-gif commented 2 years ago

Yes, the master and slave nodes are both on 2.4.3.

takenliu commented 2 years ago

OK, we will look into it and reply later.

zyqlzr commented 2 years ago

@xh-gif Do you have WeChat? Add me and we can look at this problem together.

xh-gif commented 2 years ago

I sent it to your Gmail address, @zyqlzr.

raffertyyu commented 2 years ago

Confirmed: this is a bug that was fixed in 2.5.0. It has been resolved by replacing the image.