Closed: Libeibei1990 closed this issue 4 years ago
Could you describe where the issue is, for example by pasting a code snippet?
@Libeibei1990, I understand that you are referring to the this.scheduledExecutorService.scheduleAtFixedRate part of TubeBroker.java, shown below:
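    if (!shutdown.get()) {
        long currErrCnt = heartbeatErrors.get();
        // Once errors exceed maxReleaseTryCnt, only every maxReleaseTryCnt-th tick proceeds.
        if (currErrCnt > maxReleaseTryCnt) {
            if ((currErrCnt - maxReleaseTryCnt) % maxReleaseTryCnt != 0) {
                // Skipped tick: bump the counter so the modulo advances, then return.
                heartbeatErrors.incrementAndGet();
                return;
            }
        }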
The original reasoning was that network anomalies should recover quickly, whereas non-network anomalies can persist for a long time, so reducing how often the broker retries is a reasonable measure. However, waiting too long does hinder rapid recovery of cluster services once the problem has been resolved. I plan to change this so that, if 10 attempts are still unsuccessful, the broker then tries to connect once every 2 heartbeat cycles. Let's see whether that causes any problems.
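The change proposed above can be sketched as a small helper that separates the failure threshold from the throttled retry period. This is only an illustration under assumptions: `HeartbeatThrottle`, `throttledPeriod`, `onFailure`, and `onSuccess` are hypothetical names, not part of TubeBroker.java, and the sketch assumes the error counter is incremented on each failed attempt and reset on success, which the snippet in this thread does not show.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not TubeMQ code): decides on each scheduler tick
 * whether the broker should actually attempt a heartbeat.
 */
final class HeartbeatThrottle {
    private final long maxTryCnt;        // failures tolerated before throttling starts
    private final long throttledPeriod;  // once throttled, attempt once every N ticks
    private final AtomicLong errors = new AtomicLong();

    HeartbeatThrottle(long maxTryCnt, long throttledPeriod) {
        if (maxTryCnt < 0 || throttledPeriod < 1) {
            throw new IllegalArgumentException("maxTryCnt >= 0 and throttledPeriod >= 1 required");
        }
        this.maxTryCnt = maxTryCnt;
        this.throttledPeriod = throttledPeriod;
    }

    /** Returns true if a heartbeat should be sent on this tick. */
    boolean shouldAttempt() {
        long cur = errors.get();
        if (cur > maxTryCnt && (cur - maxTryCnt) % throttledPeriod != 0) {
            // Skipped tick: advance the counter so the modulo moves forward,
            // mirroring the incrementAndGet() in the original snippet.
            errors.incrementAndGet();
            return false;
        }
        return true;
    }

    /** Assumed to be called when an attempted heartbeat fails. */
    void onFailure() {
        errors.incrementAndGet();
    }

    /** Assumed to be called when a heartbeat succeeds. */
    void onSuccess() {
        errors.set(0);
    }
}
```

With `new HeartbeatThrottle(10, 2)`, a broker whose heartbeats keep failing attempts on every tick until the counter passes 10 and then on every second tick. Passing `10` for both arguments reproduces the behaviour of the original snippet, where the same value serves as threshold and period.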
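I suggest keeping the original heartbeat cycle. If the frequency really must be reduced, an interval of 2 heartbeat cycles rather than 10 would be better.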