alibaba / canal

Alibaba's MySQL binlog incremental subscription and consumption component
Apache License 2.0

Why does MQ stop receiving database change messages after canal has been running for a while? #5236

Open liumingdfx opened 3 months ago

liumingdfx commented 3 months ago

The logs show no errors.

canal.log

2024-08-07 00:30:05.144 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler
2024-08-07 00:30:05.211 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations
2024-08-07 00:30:05.432 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.
2024-08-07 00:30:05.501 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[xxx:11111]
2024-08-07 00:30:07.333 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......
2024-08-07 00:30:07.709 [canal-instance-scan-0] INFO  com.alibaba.otter.canal.deployer.CanalController - auto notify start example successful.
2024-08-07 09:31:29.648 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler
2024-08-07 09:31:29.690 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations
2024-08-07 09:31:29.943 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.
2024-08-07 09:31:30.038 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[xxx:11111]
2024-08-07 09:31:31.859 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......
2024-08-07 09:31:32.370 [canal-instance-scan-0] INFO  com.alibaba.otter.canal.deployer.CanalController - auto notify start example successful.

This is the log of the instance that listens to the database:

2024-08-07 09:31:31.926 [destination = med , address = xxxx:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just last position
 {"identity":{"slaveId":-1,"sourceAddress":{"address":"xxx","port":3306}},"postion":{"gtid":"","included":false,"journalName":"mysql-bin.000007","position":213964910,"serverId":1722255921,"timestamp":1722965790000}}
2024-08-07 09:31:32.738 [destination = med , address = xxx:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-bin.000007,position=213964910,serverId=1722255921,gtid=,timestamp=1722965790000] cost : 821ms , the next step is binlog dump

meta.log

2024-08-07 09:49:04.636 - clientId:1001 cursor:[mysql-bin.000007,351347428,1722995343000,1722255921,] address[xxx:3306]
2024-08-07 09:49:05.636 - clientId:1001 cursor:[mysql-bin.000007,351362418,1722995345000,1722255921,] address[xxx:3306]
2024-08-07 09:49:09.636 - clientId:1001 cursor:[mysql-bin.000007,351446569,1722995349000,1722255921,] address[xxx:3306]
2024-08-07 09:49:11.636 - clientId:1001 cursor:[mysql-bin.000007,351472233,1722995351000,1722255921,] address[xxx:3306]
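
To check whether the cursor in meta.log is still advancing, the lines above can be parsed mechanically. The following is a minimal sketch, not part of canal: the names `Cursor`, `parse_cursors` and `progress` are hypothetical, and the field order inside `cursor:[...]` (journal name, position, timestamp, server id, gtid) is assumed from the position record printed in the instance log.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Matches one meta.log cursor line in the shape shown in this issue.
# The trailing address[...] part is deliberately not matched.
CURSOR_LINE = re.compile(
    r"^(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) - "
    r"clientId:(?P<client>\d+) "
    r"cursor:\[(?P<journal>[^,\]]+),(?P<pos>\d+),(?P<ts>\d+),"
    r"(?P<server>\d+),(?P<gtid>[^\]]*)\]"
)


class Cursor(NamedTuple):
    logged_at: datetime   # time the line was written to meta.log
    client_id: int
    journal: str          # binlog file name, e.g. mysql-bin.000007
    position: int         # byte offset inside that binlog file
    event_ts_ms: int      # binlog event timestamp in milliseconds


def parse_cursors(lines: Iterable[str]) -> List[Cursor]:
    """Return one Cursor per matching line; other lines are skipped."""
    cursors = []
    for line in lines:
        m = CURSOR_LINE.match(line.strip())
        if m is None:
            continue
        cursors.append(Cursor(
            logged_at=datetime.strptime(m["logged"], "%Y-%m-%d %H:%M:%S.%f"),
            client_id=int(m["client"]),
            journal=m["journal"],
            position=int(m["pos"]),
            event_ts_ms=int(m["ts"]),
        ))
    return cursors


def progress(cursors: List[Cursor]) -> Tuple[Optional[int], float]:
    """Bytes advanced and seconds elapsed between first and last cursor.

    Bytes are None when the binlog file changed in between, because
    offsets from different files are not comparable.
    """
    first, last = cursors[0], cursors[-1]
    seconds = (last.logged_at - first.logged_at).total_seconds()
    if first.journal != last.journal:
        return None, seconds
    return last.position - first.position, seconds
```

Feeding the file through `parse_cursors(open("meta.log"))` and then `progress(...)` gives a rough idea of how fast the client is acknowledging. If the last `logged_at` is far in the past while the database is still being written to, the consumer side has probably stopped.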

This is the example.log of the bundled example instance:

2024-08-07 09:51:38.081 [destination = example , address = /127.0.0.1:3306 , EventParser] ERROR com.alibaba.otter.canal.common.alarm.LogAlarmHandler - destination:example[com.alibaba.otter.canal.parse.exception.CanalParseException: java.io.IOException: connect /127.0.0.1:3306 failure
Caused by: java.io.IOException: connect /127.0.0.1:3306 failure
        at com.alibaba.otter.canal.parse.driver.mysql.MysqlConnector.connect(MysqlConnector.java:85)
        at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.connect(MysqlConnection.java:89)
        at com.alibaba.otter.canal.parse.inbound.mysql.MysqlEventParser.preDump(MysqlEventParser.java:87)
        at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$1.run(AbstractEventParser.java:176)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at com.alibaba.otter.canal.parse.driver.mysql.socket.BioSocketChannelPool.open(BioSocketChannelPool.java:18)
        at com.alibaba.otter.canal.parse.driver.mysql.socket.SocketChannelPool.open(SocketChannelPool.java:18)
        at com.alibaba.otter.canal.parse.driver.mysql.MysqlConnector.connect(MysqlConnector.java:80)
        ... 4 more

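The likely trigger: once the CPU of the machine running canal-server is maxed out (canal is deployed with Docker, and the machine hosts other services as well), MQ stops receiving messages. After a restart it works again.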
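
Since nothing is logged when delivery stops, one way to notice the stall early is to watch how long ago meta.log was last written. This is only a sketch under assumptions: the function names and the 300-second threshold are made up for illustration, and an idle database also leaves meta.log untouched, so a stale file is a hint rather than proof.

```python
import os
import time
from typing import Optional


def seconds_since_modified(path: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since the file at `path` was last modified."""
    current = time.time() if now is None else now
    return current - os.path.getmtime(path)


def looks_stalled(path: str, max_idle_seconds: float = 300.0,
                  now: Optional[float] = None) -> bool:
    """True when the file has not been written for longer than the threshold.

    The threshold is an assumption; pick one that is larger than the
    longest quiet period you expect on the source database.
    """
    return seconds_since_modified(path, now) > max_idle_seconds
```

Run from cron or a sidecar container, `looks_stalled("/path/to/meta.log")` could raise an alert or trigger the restart that currently has to be done by hand. The path is a placeholder for wherever the instance writes its meta.log.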

zhangsanhelisi commented 1 month ago

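I probably ran into the same problem: things break whenever the cluster switches over. During the switchover, the address information kept in zk gets modified. Check whether the address in zk is still what you originally configured. If you configured an IP, the source code looks up the host first and then syncs the host information to zk. Also check the hosts file on the physical machine running the database to see whether an alias or something similar is configured there.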
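
The check described here can be sketched as follows. It assumes the cursor stored in zk has the same JSON shape as the position record printed in the instance log in the first comment (`identity.sourceAddress.address` and `port`). The function names are hypothetical, and fetching the JSON from zk is left out.

```python
import json
import socket
from typing import Optional, Tuple


def cursor_address(cursor_json: str) -> Tuple[str, int]:
    """Extract (address, port) from a cursor record in the assumed shape."""
    source = json.loads(cursor_json)["identity"]["sourceAddress"]
    return source["address"], int(source["port"])


def address_changed(cursor_json: str, configured_host: str,
                    configured_port: int) -> bool:
    """True when the stored address differs from the configured one."""
    return cursor_address(cursor_json) != (configured_host, configured_port)


def reverse_name(ip: str) -> Optional[str]:
    """Host name the local resolver (hosts file included) maps `ip` to.

    Returns None when the lookup fails.
    """
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None
```

If `address_changed(...)` is true and `reverse_name(configured_ip)` returns the name found in zk, that would be consistent with the IP having been replaced by a hosts-file alias, as suspected above.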