Closed zhangyulu closed 2 years ago
292645_rocketmq-cpp.log 这是实际的log文件,从17:37左右开始出现停止消费的问题,从17:41开始循环打印上述log
我生产环境也遇到这个问题了,从1.2.3升级到2.2.0之后
是顺序消息还是普通消息消费? @yiitz 方便提供一下消费停止时间附近的日志,看下是不是负载均衡过程中丢失了pullrequest。
292645_rocketmq-cpp.log 这是实际的log文件,从17:37左右开始出现停止消费的问题,从17:41开始循环打印上述log
这个附件太大了,下不动。 方便的话,把有问题附近的日志单独取一下。
是顺序消息还是普通消息消费? @yiitz 方便提供一下消费停止时间附近的日志,看下是不是负载均衡过程中丢失了pullrequest。
是普通消息消费,应用跑docker里的,日志重启无了,这几天暂时没有出现。
292645_rocketmq-cpp.log 这是实际的log文件,从17:37左右开始出现停止消费的问题,从17:41开始循环打印上述log
这个附件太大了,下不动。 方便的话,把有问题附近的日志单独取一下。
另外,我用的是顺序消息
这里可能是个顺序消息的BUG。需要进一步排查下。 参考你的日志,
[2020-Mar-29 17:37:50.339075](error):pullMessageAsync failed of addr:10.10.10.11:10911, mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10][pullMessageAsync:490] [2020-Mar-29 17:37:50.339148](info):Pull MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10] occur exception, restart 1s later.[pullMessageAsync:905] [2020-Mar-29 17:37:50.339171](info):Produce Pull request [MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10]] Later and Sleep [1000]ms.[producePullMsgTaskLater:585]
[2020-Mar-29 17:39:30.484327](info):getConsumerIdList from broker:10.10.10.11:10911[findConsumerIds:1103] [2020-Mar-29 17:39:30.517014](error):could not find clientId from Broker[allocate:67] [2020-Mar-29 17:39:30.517047](warning):allocated queue is empty for topic:browser[updateRequestTableInRebalance:479] [2020-Mar-29 17:39:30.517070](info):Drop mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=8],because not responsive[updateRequestTableInRebalance:492] [2020-Mar-29 17:39:30.517093](info):oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=8], its offset is:473946291[updateConsumeOffsetToBroker:310] [2020-Mar-29 17:39:30.549895](info):unlock success of mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=8][unlock:335] [2020-Mar-29 17:39:30.549936](info):Drop mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=9],because not responsive[updateRequestTableInRebalance:492] [2020-Mar-29 17:39:30.549967](info):oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=9], its offset is:465195147[updateConsumeOffsetToBroker:310] [2020-Mar-29 17:39:30.582857](info):unlock success of mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=9][unlock:335] [2020-Mar-29 17:39:30.582972](info):Drop mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10],because not responsive[updateRequestTableInRebalance:492] [2020-Mar-29 17:39:30.582999](info):oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10], its offset is:469681156[updateConsumeOffsetToBroker:310] [2020-Mar-29 17:39:30.615831](info):unlock success of mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10][unlock:335]
[2020-Mar-29 17:39:49.729030](info):CONSUME_FROM_LAST_OFFSET, lastOffset of mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10] is:469683314[computePullFromWhere:567] [2020-Mar-29 17:39:49.729068](info):Add mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10], request initial offset:469683314[updateRequestTableInRebalance:527] [2020-Mar-29 17:39:49.896604](info):Start to pull MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10], offset:469683314, GroupName browser_2_sub_6[updateRequestTableInRebalance:538] MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10] [2020-Mar-29 17:39:49.930088](info):LockBatchResponseBody MQ:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10][Decode:89] [2020-Mar-29 17:39:49.930453](info):lock success of mq:MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10][lock:425] [2020-Mar-29 17:39:50.031055](info):Pull request for queue[MessageQueue [topic=browser, brokerName=broker-CuhDxsHLYZA-1, queueId=10]] has been set as dropped. Will NOT pull this queue any more[onSuccess:57]
@zhangyulu @yiitz Could you pls have a review on the PR #293? If the issue still occur, pls feel free to reopen this issue.
在线上又复现了若干次。用的版本是基于commit #b29bf5b69eab187a813a4d7f59534a6375e32d25,之后把PR #300中的变更手动合并进来,日志如下: 389_rocketmq-cpp.log 在03:59:03时出现"wait response timeout of heartbeat" 之后的几个小时里,都会打出 oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=topic_xxx, brokerName=broker-Cluster-kseILdRbPSF-0, queueId=0], its offset is:5255956 这个offset一直没有变化,消费者也一直无法拉取到消息进行消费,直到重启消费者进程才恢复
version: commit b29bf5b69eab187a813a4d7f59534a6375e32d25 + patch #300 consume model: push-orderly
@zhangyulu master分支的消费管理有点乱,你能把你当前使用的源码库发出来吗?或者,你也可以尝试切换到rd_dev分支。
@zhangyulu master分支的消费管理有点乱,你能把你当前使用的源码库发出来吗?或者,你也可以尝试切换到rd_dev分支。
@ifplusor rocketmq-client-cpp-debug.tar.gz
@ShannonDing 能否帮忙看一下这个问题呢
@zhangyulu 你这个问题我之前看了,从已有的日志里没有分析出问题的原因,建议换 re_dev 分支的版本看看效果。
re_dev分支稳定性如何?另外昨天试了一下,部分api不兼容(如setSessionCredentials),需要修改代码
re_dev分支稳定性如何?另外昨天试了一下,部分api不兼容(如setSessionCredentials),需要修改代码
比master稳定很多。re_dev的api重新设计了,所以不兼容。但后面可能还会加一层专门用于向后兼容的wrapper,社区正在评估。
刚刚试了一下,用re_dev分支编译出来example里面的OrderlyConsumer去消费一个topic,一直拉不到消息,但使用PushConsumer就能够正常消费 8348_rocketmq-cpp.log
@zhangyulu RebalanceImpl::lock() 方法有bug,我已经改了。
@zhouxinyu 调用下面这个命令看一下线程栈,PID是你的进程号
gdb -ex "thread apply all bt" -batch -p $PID
@zhangyulu 我做了一些测试,但没办法复现你的遇到的问题。在你的日志里,负载均衡、mq加锁、offset上报都是正常的。你可以先检查一下参数设置和consumer group的积压情况,并在DefaultMQPushConsumerImpl::executePullRequestImmediately()
函数里加一些日志,验证消息拉取是否正常。
嗨,你好,我也遇到了一摸一样的问题,版本2.2.0,consumer数量发生变化时,PushConsumer在Rebalance时部分queue卡住 @ifplusor @ShannonDing
这是某个queue被卡住时的日志
2021-Jun-30 11:18:00.194443:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971080[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:05.192152:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971082[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:10.194912:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971084[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:15.192283:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971087[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:20.194508:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971087[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:25.192197:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971088[updateConsumeOffsetToBroker:310] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] 2021-Jun-30 11:18:30.194282:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971090[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:30.891084:Drop mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3],because not responsive[updateRequestTableInRebalance:493] 2021-Jun-30 11:18:30.891111:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971091[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:30.932469:CONSUME_FROM_LAST_OFFSET, lastOffset of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] is:7971091[computePullFromWhere:568] 2021-Jun-30 11:18:30.932487:Add mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], request initial offset:7971091[updateRequestTableInRebalance:528] 2021-Jun-30 11:18:30.932550:Start to pull MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], offset:7971091, GroupName note-change-media-v2-consumer[updateRequestTableInRebalance:539] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] 2021-Jun-30 11:18:30.939869:Drop mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3],because not responsive[updateRequestTableInRebalance:493] 2021-Jun-30 11:18:30.939894:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971091[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:31.384323:CONSUME_FROM_LAST_OFFSET, lastOffset of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] is:7971091[computePullFromWhere:568] 2021-Jun-30 11:18:31.384341:Add mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], request initial offset:7971091[updateRequestTableInRebalance:528] 2021-Jun-30 11:18:31.384408:Start to pull MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], offset:7971091, GroupName note-change-media-v2-consumer[updateRequestTableInRebalance:539] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] 2021-Jun-30 11:18:31.401570:Drop mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3],because not responsive[updateRequestTableInRebalance:493] 2021-Jun-30 11:18:31.401596:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971091[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:31.422630:CONSUME_FROM_LAST_OFFSET, lastOffset of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] is:7971091[computePullFromWhere:568] 2021-Jun-30 11:18:31.422653:Add mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], request initial offset:7971091[updateRequestTableInRebalance:528] 2021-Jun-30 11:18:31.422711:Start to pull MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], offset:7971091, GroupName note-change-media-v2-consumer[updateRequestTableInRebalance:539] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] 2021-Jun-30 11:18:31.561148:Pull request for queue[MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3]] has been set as dropped. Will NOT pull this queue any more[onSuccess:58] 2021-Jun-30 11:18:31.561140:Pull request for queue[MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3]] has been set as dropped. Will NOT pull this queue any more[onSuccess:58] 2021-Jun-30 11:18:31.561183:Pull request for queue[MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3]] has been set as dropped. Will NOT pull this queue any more[onSuccess:58] 2021-Jun-30 11:18:31.561214:Pull request for queue[MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3]] has been set as dropped. Will NOT pull this queue any more[onSuccess:58] 2021-Jun-30 11:18:35.191912:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971091[updateConsumeOffsetToBroker:310] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3] 2021-Jun-30 11:18:40.194706:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971091[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:45.192151:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971091[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:50.194279:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971091[updateConsumeOffsetToBroker:310] 2021-Jun-30 11:18:55.192147:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3], its offset is:7971091[updateConsumeOffsetToBroker:310]
基本可以稳定复现,请问有什么推荐的解决方法吗?
并且在接下来的几个小时里,会一直报 2021-Jun-30 14:31:20.195386:PullRequest for [MessageQueue [topic=note-change, brokerName=cluster02-broker10, queueId=3]] has been expired 9987 ms,m_lastPullTimestamp = 1625034550208 ms[isPullRequestExpired:211] 的超时错误,然后逐渐恢复了。。
然后,"任意一个消费者均多次出现过上述情况,必须重启消费者才能恢复"。
我也是,日志也一样,@YellowOldOdd @ifplusor @ShannonDing
嗨,你好,我也遇到了一摸一样的问题,版本2.2.0,consumer数量发生变化时,PushConsumer在Rebalance时部分queue卡住
@ifplusor @ShannonDing
你是什么消费模式?
我也是,日志也一样,@YellowOldOdd @ifplusor @ShannonDing
你是什么消费模式?
我也是,日志也一样,@YellowOldOdd @ifplusor @ShannonDing
你是什么消费模式?
就是PushConsumer,我一共有4个broker,每个broker8个队列,启动了30个消费者,这个问题必现。跟@YellowOldOdd的一模一样,我看master代码已经修改了 #5629226997fea8bc6ac91e9481d2140d983106a7,应该就是这个问题,能不能release一个2.2.1呢,这个问题太严重了
@ShannonDing
嗨 @gaoyf , 请问用了#5629226997fea8bc6ac91e9481d2140d983106a7的改动后,问题有解决吗?
请问这个问题解决了吗 ?
see #300
我们也遇到了这个问题,最后基于master分支重新编译解决了(re-dev分支编译过,但是还是有问题,因为没有ISSUE293这个commit)
commit 5629226997fea8bc6ac91e9481d2140d983106a7 解决队列堵塞问题后,导致消费者重复消费
我也是,日志也一样,@YellowOldOdd @ifplusor @ShannonDing
你是什么消费模式?
就是PushConsumer,我一共有4个broker,每个broker8个队列,启动了30个消费者,这个问题必现。跟@YellowOldOdd的一模一样,我看master代码已经修改了 #5629226997fea8bc6ac91e9481d2140d983106a7,应该就是这个问题,能不能release一个2.2.1呢,这个问题太严重了
@ShannonDing 能不能release一个2.2.1呢
What did you do (The steps to reproduce)? let consumer run
What did you expect to see? keep consuming messages
What did you see instead? stopped consuming messages
Please tell us about your environment:
What is your OS? CentOS Linux 7.6.1810
What is your client version? commit hash: b29bf5b69eab187a813a4d7f59534a6375e32d25 commited at 2020.03.23
What is your RocketMQ version? V4_3_0
Other information (e.g. detailed explanation, logs, related issues, suggestions on how to fix, etc): rocketmq log: 2020-Mar-28 08:42:42.124297:unlock success of mq:MessageQueue [topic=browser, brokerName=broker-1, queueId=1][unlock:335] 2020-Mar-28 08:42:42.124426:PullRequest for [MessageQueue [topic=browser, brokerName=broker-1, queueId=2]] has been expired 9111 ms,m_lastPullTimestamp = 1585356033013 ms[isPullRequestExpired:211] 2020-Mar-28 08:42:42.124486:Drop mq:MessageQueue [topic=browser, brokerName=broker-1, queueId=2] according Pull timeout.[updateRequestTableInRebalance:501] 2020-Mar-28 08:42:42.124554:oneway updateConsumeOffsetToBroker of mq:MessageQueue [topic=browser, brokerName=broker-1, queueId=2], its offset is:268305037[updateConsumeOffsetToBroker:310]
日志一直循环出现上述内容。 只出现在跨机房消费的场景(ping延迟40ms左右),网络可能会有波动但从未断开。每次消费者启动后可以正常运行一段时间(几个小时或者几天),但在某个时间点突然不再消费新消息,导致该消费者所分配到的队列堆积,也不会再分配给其他正常消费者。超时设置(try lock timeout & connect timeout)均使用默认值。 共部署4-6个消费者,任意一个消费者均多次出现过上述情况,必须重启消费者才能恢复。