Tencent / TubeMQ

TubeMQ has been donated to the Apache Software Foundation and renamed InLong. Please visit the new Apache repository: https://github.com/apache/incubator-inlong
https://inlong.apache.org/

Broker takes a long time to re-register with the Master, which can prevent the service from recovering quickly. #93

Closed. Libeibei1990 closed this issue 4 years ago.

Libeibei1990 commented 4 years ago

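I suggest keeping the original heartbeat interval. If the retry frequency really must be reduced, an interval of 2 heartbeat cycles rather than 10 would be better.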

tisonkun commented 4 years ago

Could you describe where the issue is, for example by pasting a code snippet?

gosonzhang commented 4 years ago

@Libeibei1990, I understand you are referring to the following part of TubeBroker.java, inside the task passed to this.scheduledExecutorService.scheduleAtFixedRate:

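    if (!shutdown.get()) {
        long currErrCnt = heartbeatErrors.get();
        // After maxReleaseTryCnt consecutive failures, attempt the heartbeat
        // only once every maxReleaseTryCnt ticks and skip the ticks in between.
        if (currErrCnt > maxReleaseTryCnt) {
            if ((currErrCnt - maxReleaseTryCnt) % maxReleaseTryCnt != 0) {
                heartbeatErrors.incrementAndGet(); // count the skipped tick
                return;
            }
        }
        // ... (rest of the heartbeat task not shown in the excerpt)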
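To make the effect of this rule concrete, here is a small standalone simulation. It is a sketch, not TubeMQ code: the class HeartbeatSkipDemo and its methods are hypothetical, and it assumes that a failed heartbeat also increments heartbeatErrors (that part of the task is not in the excerpt), so the counter grows by one on every tick while the Master stays unreachable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class reproducing the skip rule from the TubeBroker.java excerpt.
public class HeartbeatSkipDemo {

    // Same condition as the excerpt: true means this tick skips the heartbeat.
    public static boolean shouldSkip(long currErrCnt, long maxReleaseTryCnt) {
        return currErrCnt > maxReleaseTryCnt
                && (currErrCnt - maxReleaseTryCnt) % maxReleaseTryCnt != 0;
    }

    // Returns the (1-based) ticks on which a heartbeat is actually sent.
    // Assumption: the Master stays unreachable, and both skipped ticks and
    // failed attempts increment the error counter.
    public static List<Integer> attemptTicks(long maxReleaseTryCnt, int ticks) {
        List<Integer> attempts = new ArrayList<>();
        long heartbeatErrors = 0;
        for (int tick = 1; tick <= ticks; tick++) {
            if (!shouldSkip(heartbeatErrors, maxReleaseTryCnt)) {
                attempts.add(tick); // heartbeat sent (and, by assumption, fails)
            }
            heartbeatErrors++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // With maxReleaseTryCnt = 10 this prints
        // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41]
        System.out.println(attemptTicks(10, 45));
    }
}
```

After the first 11 attempts the broker sends a heartbeat only every 10th tick, so once the Master comes back it can take up to 10 heartbeat periods before the broker tries to re-register, which is the delay this issue is about.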

The original reasoning was that network anomalies should recover quickly, while non-network anomalies may persist for a long time, so reducing how often the broker retries is a sensible measure. However, waiting too long does slow down recovery of the cluster service once the problem is resolved. I plan to change it so that if 10 attempts have all failed, the broker then retries once every 2 heartbeat cycles. Let's see whether that causes any problems.
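The planned change can be sketched by separating the two roles that maxReleaseTryCnt plays in the excerpt: the number of failures before backing off, and the back-off interval. This is a minimal illustration under stated assumptions, not the actual fix: ProposedHeartbeatBackoff and the backoffCycles parameter are hypothetical names, and, as before, the error counter is assumed to grow by one on every tick while the Master is unreachable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed policy; names are illustrative, not TubeMQ's.
public class ProposedHeartbeatBackoff {

    // Attempt on every tick for the first maxReleaseTryCnt failures,
    // then only on every backoffCycles-th tick.
    public static boolean shouldSkip(long currErrCnt, long maxReleaseTryCnt, long backoffCycles) {
        if (backoffCycles <= 0) {
            throw new IllegalArgumentException("backoffCycles must be positive");
        }
        return currErrCnt > maxReleaseTryCnt
                && (currErrCnt - maxReleaseTryCnt) % backoffCycles != 0;
    }

    // Returns the (1-based) ticks on which a heartbeat is sent, assuming the
    // Master stays unreachable and the counter grows by one on every tick.
    public static List<Integer> attemptTicks(long maxReleaseTryCnt, long backoffCycles, int ticks) {
        List<Integer> attempts = new ArrayList<>();
        long heartbeatErrors = 0;
        for (int tick = 1; tick <= ticks; tick++) {
            if (!shouldSkip(heartbeatErrors, maxReleaseTryCnt, backoffCycles)) {
                attempts.add(tick);
            }
            heartbeatErrors++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Proposed: 10 failures, then one attempt every 2 heartbeat cycles.
        System.out.println(attemptTicks(10, 2, 20));
        // backoffCycles == maxReleaseTryCnt reproduces the original behavior.
        System.out.println(attemptTicks(10, 10, 45));
    }
}
```

With backoffCycles = 2 the first call prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19], so in back-off mode the worst-case wait before the next re-registration attempt drops from about 10 heartbeat periods to about 2, while the broker still sends only half as many heartbeats as in normal operation.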

