Closed tank1314 closed 4 years ago
先说结论:如果仅仅是broker与nameserver的连接断了,实际上不影响客户端的使用。
从你的日志上看,实际上不仅仅是断连的问题,看起来是网络或者是服务有问题:
org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel <rocketmq-namesrv-prod01.cloud.bz/xx.xx.xx.xx:9876> timeout, 6000(ms)
所以本身需求来说是符合需求的,只是现在不是断连接的问题,更像是一个节点故障的问题。
@Jaskey 1、 出现问题的时候,该broker节点网络出口流量出现了网络陡增的情况(内网集群平时网络流量200M左右,出问题时,流量陡增至600M左右),同时该时刻点,该节点的CPU也增加到25%左右; 2、该异常总共就出现过1次,再后续的探测轮询时间点,都恢复了正常。观察该时间点的GC日志,没检测出什么异常来。
@tank1314 按照你的信息判断,实际上问题和nameserver断链没有关系,而是你内部机器的问题导致影响了你的写入性能,在这种情况下,有问题是正常的。如果你有多组broker,可以通过熔断来实现暂时屏蔽有问题的broker。但是如果是整体所有集群的broker都有类似问题,这是无法避免的。建议检查下你们部署上是否有考虑各种容灾的情况。
@tank1314 I will close the Issue first, but please feel free to reopen it if you have any other issues.
The issue tracker is ONLY used for bug report(feature request need to follow RIP process). Keep in mind, please check whether there is an existing same report before your raise a new one.
Alternately (especially if your communication is not a bug report), you can send mail to our mailing lists. We welcome any friendly suggestions, bug fixes, collaboration and other improvements.
Please ensure that your bug report is clear and that it is complete. Otherwise, we may be unable to understand it or to reproduce it, either of which would prevent us from fixing the bug. We strongly recommend the report(bug report or feature request) could include some hints as the following:
BUG REPORT
Other information (e.g. detailed explanation, logs, related issues, suggestions how to fix, etc): brokerA主节点的日志信息: timeout, 3000(ms)
at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:292) ~[rocketmq-remoting-4.0.0-incubating.jar:4.0.0-incubating]
at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:338) ~[rocketmq-remoting-4.0.0-incubating.jar:4.0.0-incubating]
at org.apache.rocketmq.broker.out.BrokerOuterAPI.getAllConsumerOffset(BrokerOuterAPI.java:254) ~[rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
at org.apache.rocketmq.broker.slave.SlaveSynchronize.syncConsumerOffset(SlaveSynchronize.java:84) [rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
at org.apache.rocketmq.broker.slave.SlaveSynchronize.syncAll(SlaveSynchronize.java:50) [rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
2020-09-10 16:42:03 WARN BrokerControllerScheduledThread1 - registerBroker Exception, rocketmq-namesrv-prod01.cloud.bz:9876 org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel <ocketmq-namesrv-prod01.cloud.bz/xx.x.xx.xx:9876> timeout, 6000(ms) at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:292) ~[rocketmq-remoting-4.0.0-incubating.jar:4.0.0-incubating] at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:338) ~[rocketmq-remoting-4.0.0-incubating.jar:4.0.0-incubating] at org.apache.rocketmq.broker.out.BrokerOuterAPI.registerBroker(BrokerOuterAPI.java:167) ~[rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating] at org.apache.rocketmq.broker.out.BrokerOuterAPI.registerBrokerAll(BrokerOuterAPI.java:116) ~[rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating] at org.apache.rocketmq.broker.BrokerController.registerBrokerAll(BrokerController.java:674) [rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating] at org.apache.rocketmq.broker.BrokerController$9.run(BrokerController.java:643) [rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_131] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_131] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_131] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131] 2020-09-10 16:42:05 INFO BrokerControllerScheduledThread1 - register broker to name server rocketmq-namesrv-prod02.cloud.bz:9876 OK brokerA 从节点的日志信息 2020-09-10 16:42:05 INFO ClientManageThread_131 - subscription changed, group: MQ_nike_receive_consumer_group OLD: SubscriptionData [classFilterMode=false, topic=MQ_LSS_PUSH_NIKE_PRO, subString=, tagsSet=[], codeSet=[], subVersion=1599115705119] NEW: SubscriptionData [classFilterMode=false, topic=MQ_LSS_PUSH_NIKE_PRO, subString=, tagsSet=[], codeSet=[], subVersion=1599115711804] 2020-09-10 16:42:08 ERROR BrokerControllerScheduledThread1 - SyncConsumerOffset Exception, 1xx.xx.xx.xx:20911 org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131] 客户端日志信息: org.apache.rocketmq.remoting.exception.RemotingTooMuchRequestException: sendDefaultImpl call timeout at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.sendDefaultImpl(DefaultMQProducerImpl.java:634) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8] at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.send(DefaultMQProducerImpl.java:1279) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8] at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.send(DefaultMQProducerImpl.java:1225) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8] at org.apache.rocketmq.client.producer.DefaultMQProducer.send(DefaultMQProducer.java:283) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8] at com.baozun.scm.baseservice.message.rocketmq.service.server.RocketMQProducerServer.sendDataMsgConcurrently(RocketMQProducerServer.java:167) [common-message-component-rocketmq-1.7.0.4.jar!/:?] at com.baozun.ecs.oms4.ofa.manager.receive.ReceiveBaseManager.sendConsumptionFeedback(ReceiveBaseManager.java:264) [classes!/:1.0.0-SNAPSHOT] at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl.receiveRefundPayment(ReceiveRefundPaymentManagerImpl.java:59) [classes!/:1.0.0-SNAPSHOT] at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl$$FastClassBySpringCGLIB$$f205e702.invoke() [classes!/:1.0.0-SNAPSHOT]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) [spring-core-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:684) [spring-aop-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl$$EnhancerBySpringCGLIB$$33377ac5.receiveRefundPayment() [classes!/:1.0.0-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor579.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl.businessProcessNoTransacation(MsgTranscationManagerImpl.java:187) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl$$FastClassBySpringCGLIB$$6e604685.invoke() [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) [spring-core-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:684) [spring-aop-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl$$EnhancerBySpringCGLIB$$f0d0e32e.businessProcessNoTransacation() [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.excuteBusinessMsg(MessageHandler.java:303) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.excuteHandle(MessageHandler.java:268) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.handle(MessageHandler.java:197) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage$original$i6gTCfL5(RocketMQConcurrentlyConsumerInit.java:270) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage$original$i6gTCfL5$accessor$50njQtqm(RocketMQConcurrentlyConsumerInit.java) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1$auxiliary$N718zRwF.call(Unknown Source) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:93) [skywalking-agent.jar:6.4.0]
at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage(RocketMQConcurrentlyConsumerInit.java) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
at org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417) [rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_231]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_231]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]
FEATURE REQUEST
Please describe the feature you are requesting.
Provide any additional detail on your proposed use case for this feature. 期望:当broker与namesrv连接恢复后,不影响客户端正常使用,不影响从节点同步主节点数据。
Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue? 不知是服务端版本有问题还是什么原因。
If there are some sub-tasks using -[] for each subtask and create a corresponding issue to map to the sub task: