Closed saleson closed 2 years ago
If the registry pushes an empty address, execution reaches the logic below and the invokers are destroyed:
    private void refreshInvoker(List<URL> invokerUrls) {
        Assert.notNull(invokerUrls, "invokerUrls should not be null, use EMPTY url to clear current addresses.");
        this.originalUrls = invokerUrls;
        if (invokerUrls.size() == 1 && EMPTY_PROTOCOL.equals(invokerUrls.get(0).getProtocol())) {
            logger.warn("Received url with EMPTY protocol, will clear all available addresses.");
            this.forbidden = true; // Forbid to access
            routerChain.setInvokers(BitList.emptyList());
            destroyAllInvokers(); // Close all invokers
        }
    }
Even when converting the subscribed URLs yields no invokers, the code returns early here and does not destroy anything:
    if (CollectionUtils.isEmptyMap(newUrlInvokerMap)) {
        logger.error(new IllegalStateException("Cannot create invokers from url address list (total " + invokerUrls.size() + ")"));
        return;
    }
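This looks like the registry's empty-address protection being triggered: when the registry returns an empty address list, the previous non-empty result is used instead, to avoid jitter caused by the registry being unavailable. It can be turned off with the following configuration: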
RegistryConfig.java

    @Parameter(key = ENABLE_EMPTY_PROTECTION_KEY)
    public Boolean getEnableEmptyProtection() {
        return enableEmptyProtection;
    }

    public void setEnableEmptyProtection(Boolean enableEmptyProtection) {
        this.enableEmptyProtection = enableEmptyProtection;
    }
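To make the behaviour of that switch concrete, here is a minimal sketch of what empty-address protection amounts to. It is a simplified model, not Dubbo's code: the class `EmptyProtectionSketch` and its `onNotify` method are hypothetical names used only for illustration, and addresses are plain strings rather than `URL` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of empty-address protection (not Dubbo's actual classes).
final class EmptyProtectionSketch {
    private final boolean enableEmptyProtection;
    private List<String> lastNonEmpty = Collections.<String>emptyList();

    EmptyProtectionSketch(boolean enableEmptyProtection) {
        this.enableEmptyProtection = enableEmptyProtection;
    }

    /** Returns the address list the consumer would keep using after one registry push. */
    List<String> onNotify(List<String> pushed) {
        if (!pushed.isEmpty()) {
            // Remember every non-empty push so it can be reused later.
            lastNonEmpty = new ArrayList<>(pushed);
            return lastNonEmpty;
        }
        // Empty push: with protection on, fall back to the previous non-empty result;
        // with protection off, accept the empty list as the truth.
        return enableEmptyProtection ? lastNonEmpty : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        EmptyProtectionSketch guarded = new EmptyProtectionSketch(true);
        guarded.onNotify(Arrays.asList("dubbo://192.168.1.7:20880"));
        // The provider went offline, but the old address is still retained.
        System.out.println(guarded.onNotify(Collections.<String>emptyList()));
    }
}
```

With protection enabled, the stale address survives an empty push, which is exactly why the consumer in this report keeps trying to reconnect to a provider that has been stopped.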
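I think empty-address protection against unpredictable jitter is a great idea. I would also suggest refining it so that a normal shutdown is not mistaken for jitter and ignored, permanently. Just my humble opinion: for example, add a follow-up monitoring or check task, and if the registry has been healthy throughout a given time window and there really are no providers, run the destroy logic; or make the decision based on the proportion by which the instance count dropped.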
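The time-window idea could be sketched roughly as follows. This is only an illustration of the suggestion, not an existing Dubbo feature: `EmptyPushGuardSketch`, `shouldDestroy`, and the grace period are all hypothetical, and how "registry healthy" would be determined is left as an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a delayed-destroy check; none of these names exist in Dubbo.
final class EmptyPushGuardSketch {
    private final Duration gracePeriod;
    private Instant emptySince; // null unless the registry is healthy and reports no providers

    EmptyPushGuardSketch(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    /**
     * Records one observation and returns true once the registry has been healthy
     * with zero providers for at least the whole grace period.
     */
    boolean shouldDestroy(boolean registryHealthy, int providerCount, Instant now) {
        if (!registryHealthy || providerCount > 0) {
            // Either possible jitter or providers are back: restart the window.
            emptySince = null;
            return false;
        }
        if (emptySince == null) {
            emptySince = now;
        }
        return Duration.between(emptySince, now).compareTo(gracePeriod) >= 0;
    }

    public static void main(String[] args) {
        EmptyPushGuardSketch guard = new EmptyPushGuardSketch(Duration.ofSeconds(60));
        Instant start = Instant.now();
        System.out.println(guard.shouldDestroy(true, 0, start));
        System.out.println(guard.shouldDestroy(true, 0, start.plusSeconds(60)));
    }
}
```

A periodic task would call `shouldDestroy` with its latest observation and run the destroy logic only when it returns true, so a brief registry outage resets the window instead of releasing the invokers.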
"so that a normal shutdown is not mistaken for jitter and ignored, permanently."
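This kind of empty push only happens in the extreme case where every address goes offline, which should not occur in a normally operated production cluster. If you really need this, you can simply turn empty-push protection off with the switch.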
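Turning off empty-push protection lowers fault tolerance. Empty pushes are very rare in production, but when one does happen it can cause an outage; I assume that is why Dubbo added the protection in the first place. The scenario also does occur in production: for example, when a Dubbo interface is taken offline, any consumer that still references it should run into this.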
Environment
Steps to reproduce this issue
1. Start the provider
2. Start the consumer
3. Make one RPC call
4. Stop the provider
5. Wait about 60 seconds
Pls. provide [GitHub address] to reproduce this issue.
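Cause: the logic in ServiceDiscoveryRegistryDirectory.refreshInvoker() and RegistryDirectory.refreshInvoker() is wrong for this scenario (the order of operations is reversed).
destroyAllInvokers() checks again whether urlInvokerMap is empty, then takes each invoker from urlInvokerMap and calls its destroyAll() method. At that point urlInvokerMap is already empty, so the invokers are never destroyed.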
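The failure mode described above can be reproduced in isolation. The sketch below is not the Dubbo source: `DirectorySketch` and `InvokerSketch` are hypothetical stand-ins, and it assumes that "reversed order" means the map is emptied before the destroy step runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an invoker that holds a client connection.
final class InvokerSketch {
    boolean destroyed;

    void destroyAll() {
        destroyed = true;
    }
}

// Hypothetical stand-in for a directory; it only models urlInvokerMap handling.
final class DirectorySketch {
    private Map<String, InvokerSketch> urlInvokerMap = new HashMap<>();

    InvokerSketch add(String url) {
        InvokerSketch invoker = new InvokerSketch();
        urlInvokerMap.put(url, invoker);
        return invoker;
    }

    // Destroys only the invokers that are still reachable through urlInvokerMap.
    private void destroyAllInvokers() {
        if (urlInvokerMap.isEmpty()) {
            return;
        }
        for (InvokerSketch invoker : new ArrayList<>(urlInvokerMap.values())) {
            invoker.destroyAll();
        }
        urlInvokerMap.clear();
    }

    // Assumed buggy order: the map is replaced first, so destroy finds nothing.
    void clearThenDestroy() {
        urlInvokerMap = new HashMap<>();
        destroyAllInvokers();
    }

    // Corrected order: destroy what the map still references, then reset it.
    void destroyThenClear() {
        destroyAllInvokers();
        urlInvokerMap = new HashMap<>();
    }

    public static void main(String[] args) {
        DirectorySketch directory = new DirectorySketch();
        InvokerSketch invoker = directory.add("dubbo://192.168.1.7:20880");
        directory.clearThenDestroy();
        System.out.println("destroyed after clearThenDestroy: " + invoker.destroyed);
    }
}
```

In the first ordering the old invoker is left alive with nothing referencing it from the directory, which matches the symptom in this report: its client keeps running and the reconnect task keeps firing.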
Expected Behavior
The DubboInvoker is released.
Actual Behavior
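DubboInvoker, NettyClient and related resources are not released. ReconnectTimerTask keeps checking the connection state and retrying the connection, logging an error each time (at 60-second intervals).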
If there is an exception, please attach the exception trace:
[24/04/22 21:18:37:430 CST] dubbo-client-idleCheck-thread-1 ERROR header.ReconnectTimerTask: [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/192.168.1.7:57600 -> /192.168.1.7:20880]], dubbo version: 3.0.8-SNAPSHOT, current host: 192.168.1.7
org.apache.dubbo.remoting.RemotingException: client(url: dubbo://192.168.1.7:20880/org.apache.dubbo.springboot.demo.DemoService?anyhost=true&application=dubbo-springboot-demo-consumer&background=false&category=providers,configurators,routers&check=false&codec=dubbo&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&heartbeat=60000&interface=org.apache.dubbo.springboot.demo.DemoService&methods=sayHello,sayHelloAsync&pid=18417&qos.enable=false&register-mode=interface&release=3.0.8-SNAPSHOT&side=consumer&sticky=false) failed to connect to server /192.168.1.7:20880, error message is:Connection refused: /192.168.1.7:20880
    at org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:192)
    at org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:214)
    at org.apache.dubbo.remoting.transport.AbstractClient.reconnect(AbstractClient.java:268)
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeClient.reconnect(HeaderExchangeClient.java:171)
    at org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.doTask(ReconnectTimerTask.java:49)
    at org.apache.dubbo.remoting.exchange.support.header.AbstractTimerTask.run(AbstractTimerTask.java:87)
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:651)
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:730)
    at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:452)
    at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.1.7:20880
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)