apache / dubbo

The java implementation of Apache Dubbo. An RPC and microservice framework.
https://dubbo.apache.org/
Apache License 2.0
40.42k stars 26.41k forks source link

Consumer always reconnect destroyed provider #9623

Closed leimbag closed 2 years ago

leimbag commented 2 years ago

Environment

已搜过相同的issue: #8755 , #8756 ,但是在3.0.5版本又出现了,试了一下dubbo 2.7.15是没有这个问题的。

Steps to reproduce this issue

  1. 开启生产者,消费者
  2. 消费者正常调用生产者,无问题
  3. 停掉生产者,消费者会每隔60s左右重连一次。

Expected Behavior

Provider停机后,Consumer中ReconnectTimerTask不再Reconnect

Actual Behavior

Provider停机后,Consumer中ReconnectTimerTask一直对已经销毁的Provider进行Reconnect

2022-01-24 18:44:28,308 [ERROR] [dubbo-client-idleCheck-thread-1raceId] (dubbo-client-idleCheck-thread-1) o.a.d.r.e.s.h.ReconnectTimerTask.doTask:51 -  [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/172.16.204.201:59139 -> /172.16.204.201:29811]], dubbo version: 3.0.5, current host: 172.16.204.201
org.apache.dubbo.remoting.RemotingException: client(url: dubbo://172.16.204.201:29811/com.*.dubbo.wallet.service.WalletService?anyhost=true&application=dubbo-wallet-service&background=false&category=providers&check=false&codec=dubbo&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&heartbeat=60000&interface=com.*.dubbo.wallet.service.WalletService&metadata-type=remote&methods=getWallet&path=com.*.dubbo.wallet.service.WalletService&pid=17786&protocol=dubbo&qos.enable=false&release=3.0.5&service-name-mapping=true&side=provider&sticky=false&timeout=60000&timestamp=1643020518766) failed to connect to server /172.16.204.201:29811, error message is:Connection refused: /172.16.204.201:29811
    at org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:183) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:202) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.transport.AbstractClient.reconnect(AbstractClient.java:256) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeClient.reconnect(HeaderExchangeClient.java:171) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.doTask(ReconnectTimerTask.java:49) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.AbstractTimerTask.run(AbstractTimerTask.java:87) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:651) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:730) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:452) [dubbo-3.0.5.jar:3.0.5]
    at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.16.204.201:29811
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    ... 1 more
zrlw commented 2 years ago

手边没有3.0.5的测试环境,你可以先在consumer的DubboInvoker类的destroyInternal方法设置一下断点,确认一下下列内容:

                        if (closeAll) {  《== 1. 看看provider销毁时,这里是否为true
                            client.closeAll(serverShutdownTimeout);  《==2. 看看client的类型是什么,如果不是ReferenceCountExchangeClient,那就要查一下相关commit和为什么这么修改了。 @AlbumenJ 
                        } else {
                            client.close(serverShutdownTimeout);
                        }
leimbag commented 2 years ago

搭了个项目,github项目地址

操作步骤:

本地启动一个nacos,修改启动项目的bootstrap.properties配置文件

  1. 启动web-demo的WebDemoApplication

  2. 启动dubbo-user-service的DubboUserServiceApplication

  3. 使用http://localhost:19830/user/getUserNameByShutdown?uid=2第一次访问,等待10s返回结果

  4. 第二次访问使用http://localhost:19830/user/getUserNameByShutdown?uid=3,然后立即停止dubbo-user-service,然后页面立马得到结果uid:3

此时状态为web-demo一直在进行

o.a.d.r.e.s.h.ReconnectTimerTask.doTask:51 -  [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/172.16.204.201:62829 -> /172.16.204.201:29801]], dubbo version: 3.0.5, current host: 172.16.204.201

打的DubboInvoker断点一直没有进入。

直到重新启动dubbo-user-service服务,才进入DubboInvoker断点。

对应DubboInvoker的

                        if (closeAll) {
                            client.closeAll(serverShutdownTimeout);
                        } else {
                            client.close(serverShutdownTimeout);
                        }

期间对应的web-demo的值状态为:

closeAll 一直为false
client 为 HeaderExchangeClient

最后一次为:

closeAll 为 false
client 为 org.apache.dubbo.rpc.protocol.dubbo.ReferenceCountExchangeClient
zrlw commented 2 years ago

如果closeAll一直为false就不会进行实质关闭操作,看起来是3.0.5可能是把注册目录相关代码改出毛病了。 把最后一次client 为 org.apache.dubbo.rpc.protocol.dubbo.ReferenceCountExchangeClient进入DubboInvoker断点的调用堆栈贴一下吧,如果不好发全部,摘出来ServiceDiscoveryRegistryDirectory.java、RegistryDirectory.java、DubboInvoker.java的相关行号也可以。

zrlw commented 2 years ago

看了一下3.0.5的代码,应该是新加的推空保护机制导致的,实例数量为0时保护机制生效了,导致连接未关闭。 需要看看推空机制这里的代码的要怎么改一下了 @chickenlj @AlbumenJ

leimbag commented 2 years ago

有个关键点是,当优雅下线关闭provider时, consumer完全没有进入DubboInvoker内部断点,直到重启provider后才进入consumer的DubboInvoker断点

leimbag commented 2 years ago

dubbo-user-service停机日志

dubbo-user-service优雅停机日志

2022-01-25 17:42:59,324 [INFO] [DubboServerHandler-172.16.204.201:29801-thread-6raceId] (DubboServerHandler-172.16.204.201:29801-thread-6) c.l.d.c.f.GrayHeaderProviderFilter.invoke:34 - [服务提供者]灰度头过滤, attHeader=null, tagHeader=null, holderHeader=null, interface=com.leimbag.dubbo.user.service.UserService, serviceName=null, methodName=getUserNameByShutdown
Disconnected from the target VM, address: '127.0.0.1:62312', transport: 'socket'
2022-01-25 17:43:01,887 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.DubboShutdownHook.run:59 -  [DUBBO] Run shutdown hook now., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,888 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.m.FrameworkModel.resetDefaultAppModel:277 -  [DUBBO] Reset global default application from Dubbo Application[1.1](dubbo-user-service) to null, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,889 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.d.DefaultApplicationDeployer.onStopping:1063 -  [DUBBO] Dubbo Application[1.1](dubbo-user-service) is stopping., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,902 [INFO] [SpringApplicationShutdownHookraceId] (SpringApplicationShutdownHook) o.a.d.c.d.DefaultModuleDeployer.onModuleStopping:285 -  [DUBBO] Dubbo Module[1.1.1] is stopping., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,907 [INFO] [SpringApplicationShutdownHookraceId] (SpringApplicationShutdownHook) o.a.d.r.c.ServiceDiscoveryRegistry.doUnregister:191 -  [DUBBO] The URL[dubbo://172.16.204.201:29801/com.leimbag.dubbo.user.service.UserService?anyhost=true&application=dubbo-user-service&background=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=com.leimbag.dubbo.user.service.UserService&metadata-type=remote&methods=getUserNameByShutdown,getUserName,getWalletBalance&pid=77991&release=3.0.5&service-name-mapping=true&side=provider&timeout=60000&timestamp=1643102988753] deregistered successfully., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,909 [INFO] [SpringApplicationShutdownHookraceId] (SpringApplicationShutdownHook) o.a.d.r.n.NacosRegistry.unregister:309 -  [DUBBO] Unregister: dubbo://172.16.204.201:29801/com.leimbag.dubbo.user.service.UserService?anyhost=true&application=dubbo-user-service&background=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=com.leimbag.dubbo.user.service.UserService&metadata-type=remote&methods=getUserNameByShutdown,getUserName,getWalletBalance&pid=77991&release=3.0.5&service-name-mapping=true&side=provider&timeout=60000&timestamp=1643102988753, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,911 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.d.DefaultModuleDeployer.onModuleStopping:285 -  [DUBBO] Dubbo Module[1.1.0] is stopping., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,917 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.d.DefaultModuleDeployer.onModuleStopped:295 -  [DUBBO] Dubbo Module[1.1.0] has stopped., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,922 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.s.RegistryManager.destroyAll:98 -  [DUBBO] Close all registries [nacos://10.202.0.52:8992/org.apache.dubbo.registry.RegistryService?application=dubbo-user-service&dubbo=2.0.2&interface=org.apache.dubbo.registry.RegistryService&namespace=service-dev&pid=77991&qos.enable=false&release=3.0.5, nacos://10.202.0.52:8992/org.apache.dubbo.registry.RegistryService?application=dubbo-user-service&dubbo=2.0.2&interface=org.apache.dubbo.registry.RegistryService&namespace=service-dev&pid=77991&qos.enable=false&release=3.0.5], dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:01,922 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.n.NacosRegistry.destroy:482 -  [DUBBO] Destroy registry:nacos://10.202.0.52:8992/org.apache.dubbo.registry.RegistryService?application=dubbo-user-service&dubbo=2.0.2&interface=org.apache.dubbo.registry.RegistryService&namespace=service-dev&pid=77991&qos.enable=false&release=3.0.5, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,156 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.n.NacosRegistry.destroy:491 -  [DUBBO] Destroy unregister url dubbo://172.16.204.201/com.leimbag.dubbo.wallet.service.WalletService?application=dubbo-user-service&background=false&category=consumers&check=false&dubbo=2.0.2&interface=com.leimbag.dubbo.wallet.service.WalletService&metadata-type=remote&methods=getWallet&pid=77991&qos.enable=false&release=3.0.5&side=consumer&sticky=false&timeout=60000&timestamp=1643102990635, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,157 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.n.NacosRegistry.unsubscribe:339 -  [DUBBO] Unsubscribe: provider://172.16.204.201:29801/com.leimbag.dubbo.user.service.UserService?anyhost=true&application=dubbo-user-service&background=false&bind.ip=172.16.204.201&bind.port=29801&category=configurators&check=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=com.leimbag.dubbo.user.service.UserService&metadata-type=remote&methods=getUserNameByShutdown,getUserName,getWalletBalance&pid=77991&qos.enable=false&release=3.0.5&service-name-mapping=true&side=provider&timeout=60000&timestamp=1643102988753, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,157 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.n.NacosRegistry.destroy:507 -  [DUBBO] Destroy unsubscribe url provider://172.16.204.201:29801/com.leimbag.dubbo.user.service.UserService?anyhost=true&application=dubbo-user-service&background=false&bind.ip=172.16.204.201&bind.port=29801&category=configurators&check=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=com.leimbag.dubbo.user.service.UserService&metadata-type=remote&methods=getUserNameByShutdown,getUserName,getWalletBalance&pid=77991&qos.enable=false&release=3.0.5&service-name-mapping=true&side=provider&timeout=60000&timestamp=1643102988753, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,157 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.n.NacosRegistry.unsubscribe:339 -  [DUBBO] Unsubscribe: dubbo://172.16.204.201/com.leimbag.dubbo.wallet.service.WalletService?application=dubbo-user-service&background=false&category=providers,configurators,routers&check=false&dubbo=2.0.2&interface=com.leimbag.dubbo.wallet.service.WalletService&metadata-type=remote&methods=getWallet&pid=77991&qos.enable=false&release=3.0.5&side=consumer&sticky=false&timeout=60000&timestamp=1643102990635, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,158 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.n.NacosRegistry.destroy:507 -  [DUBBO] Destroy unsubscribe url dubbo://172.16.204.201/com.leimbag.dubbo.wallet.service.WalletService?application=dubbo-user-service&background=false&category=providers,configurators,routers&check=false&dubbo=2.0.2&interface=com.leimbag.dubbo.wallet.service.WalletService&metadata-type=remote&methods=getWallet&pid=77991&qos.enable=false&release=3.0.5&side=consumer&sticky=false&timeout=60000&timestamp=1643102990635, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,159 [INFO] [SpringApplicationShutdownHookraceId] (SpringApplicationShutdownHook) o.a.d.r.n.NacosRegistry.unsubscribe:339 -  [DUBBO] Unsubscribe: dubbo://172.16.204.201/com.leimbag.dubbo.wallet.service.WalletService?application=dubbo-user-service&background=false&category=providers,configurators,routers&check=false&dubbo=2.0.2&interface=com.leimbag.dubbo.wallet.service.WalletService&metadata-type=remote&methods=getWallet&pid=77991&qos.enable=false&release=3.0.5&side=consumer&sticky=false&timeout=60000&timestamp=1643102990635, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,166 [INFO] [SpringApplicationShutdownHookraceId] (SpringApplicationShutdownHook) o.a.d.r.t.n.NettyChannel.close:201 -  [DUBBO] Close netty channel [id: 0xe0e1a277, L:/172.16.204.201:63205 - R:/172.16.204.201:29811], dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,168 [INFO] [NettyClientWorker-4-3raceId] (NettyClientWorker-4-3) o.a.d.r.t.n.NettyClientHandler.channelInactive:76 -  [DUBBO] The connection of /172.16.204.201:63205 -> /172.16.204.201:29811 is disconnected., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,172 [INFO] [SpringApplicationShutdownHookraceId] (SpringApplicationShutdownHook) o.a.d.c.d.DefaultModuleDeployer.onModuleStopped:295 -  [DUBBO] Dubbo Module[1.1.1] has stopped., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,173 [INFO] [SpringApplicationShutdownHookraceId] (SpringApplicationShutdownHook) o.a.d.c.s.c.DubboSpringInitializer.remove:77 -  [DUBBO] Unbind Dubbo Module[1.1.1] from spring container: org.springframework.beans.factory.support.DefaultListableBeanFactory@193eb1ba, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,480 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.t.m.DefaultExecutorRepository.destroyAll:370 -  [DUBBO] destroying executor repository .., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,481 [ERROR] [DubboServerHandler-172.16.204.201:29801-thread-6raceId] (DubboServerHandler-172.16.204.201:29801-thread-6) c.l.d.u.s.i.UserServiceImpl.getUserNameByShutdown:37 - sleep interrupted
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method) ~[?:?]
    at java.lang.Thread.sleep(Thread.java:340) ~[?:?]
    at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:403) ~[?:?]
    at com.leimbag.dubbo.user.service.impl.UserServiceImpl.getUserNameByShutdown(UserServiceImpl.java:35) [classes/:?]
    at org.apache.dubbo.common.bytecode.Wrapper1.invokeMethod(Wrapper1.java) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.proxy.javassist.JavassistProxyFactory$1.doInvoke(JavassistProxyFactory.java:47) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.proxy.AbstractProxyInvoker.invoke(AbstractProxyInvoker.java:86) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.config.invoker.DelegateProviderMetaDataInvoker.invoke(DelegateProviderMetaDataInvoker.java:55) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:56) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.filter.ClassLoaderCallbackFilter.invoke(ClassLoaderCallbackFilter.java:38) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.protocol.dubbo.filter.TraceFilter.invoke(TraceFilter.java:78) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.filter.TimeoutFilter.invoke(TimeoutFilter.java:46) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:99) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.filter.ExceptionFilter.invoke(ExceptionFilter.java:52) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.auth.filter.ProviderAuthFilter.invoke(ProviderAuthFilter.java:53) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.filter.ContextFilter.invoke(ContextFilter.java:131) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.filter.GenericFilter.invoke(GenericFilter.java:197) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at com.leimbag.dubbo.core.filter.GrayHeaderProviderFilter.invoke(GrayHeaderProviderFilter.java:57) [classes/:?]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.filter.ClassLoaderFilter.invoke(ClassLoaderFilter.java:46) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.filter.EchoFilter.invoke(EchoFilter.java:41) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:313) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CallbackRegistrationInvoker.invoke(FilterChainBuilder.java:187) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.rpc.protocol.dubbo.DubboProtocol$1.reply(DubboProtocol.java:152) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.handleRequest(HeaderExchangeHandler.java:100) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeHandler.received(HeaderExchangeHandler.java:175) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.transport.DecodeHandler.received(DecodeHandler.java:51) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.transport.dispatcher.ChannelEventRunnable.run(ChannelEventRunnable.java:57) [dubbo-3.0.5.jar:3.0.5]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at org.apache.dubbo.common.threadlocal.InternalRunnable.run(InternalRunnable.java:41) [dubbo-3.0.5.jar:3.0.5]
    at java.lang.Thread.run(Thread.java:844) [?:?]
2022-01-25 17:43:02,486 [INFO] [DubboServerHandler-172.16.204.201:29801-thread-6raceId] (DubboServerHandler-172.16.204.201:29801-thread-6) c.l.d.u.s.i.UserServiceImpl.getUserNameByShutdown:39 - 等待十秒, uid=5
2022-01-25 17:43:02,486 [INFO] [DubboServerHandler-172.16.204.201:29801-thread-6raceId] (DubboServerHandler-172.16.204.201:29801-thread-6) c.l.d.c.f.GrayHeaderProviderFilter.invoke:59 - 清除灰度头,grayHeader=null
2022-01-25 17:43:02,488 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.d.DefaultApplicationDeployer.onStopped:1077 -  [DUBBO] Dubbo Application[1.1](dubbo-user-service) has stopped., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,490 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.m.FrameworkModel.onDestroy:104 -  [DUBBO] Destroying default framework model: Dubbo Framework[1], dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,491 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.m.FrameworkModel.onDestroy:109 -  [DUBBO] Dubbo Framework[1] is destroying ..., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,491 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.d.DefaultApplicationDeployer.onStopping:1063 -  [DUBBO] Dubbo Application[1.0](DUBBO_INTERNAL_APPLICATION) is stopping., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,491 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.d.DefaultModuleDeployer.onModuleStopping:285 -  [DUBBO] Dubbo Module[1.0.0] is stopping., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,491 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.d.DefaultModuleDeployer.onModuleStopped:295 -  [DUBBO] Dubbo Module[1.0.0] has stopped., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,492 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.s.RegistryManager.destroyAll:98 -  [DUBBO] Close all registries [], dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,492 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.t.m.DefaultExecutorRepository.destroyAll:370 -  [DUBBO] destroying executor repository .., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,492 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.c.d.DefaultApplicationDeployer.onStopped:1077 -  [DUBBO] Dubbo Application[1.0](DUBBO_INTERNAL_APPLICATION) has stopped., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,493 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.p.d.DubboProtocol.destroy:663 -  [DUBBO] Destroying protocol [DubboProtocol] ..., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:02,493 [INFO] [DubboShutdownHookraceId] (DubboShutdownHook) o.a.d.r.p.d.DubboProtocol.destroy:676 -  [DUBBO] Closing dubbo server: /172.16.204.201:29801, dubbo version: 3.0.5, current host: 172.16.204.201

web-demo 对应日志

web-demo 对应日志

2022-01-25 17:43:02,489 [INFO] [http-nio-19830-exec-9raceId] (http-nio-19830-exec-9) c.l.w.d.c.UserController.getUserNameByShutdown:41 - 查询uid=5的用户名, result=TestUser:5
2022-01-25 17:43:17,505 [INFO] [NettyClientWorker-4-8raceId] (NettyClientWorker-4-8) o.a.d.r.t.n.NettyClientHandler.channelInactive:76 -  [DUBBO] The connection of /172.16.204.201:62394 -> /172.16.204.201:29801 is disconnected., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:25,000 [INFO] [nacos.publisher-com.alibaba.nacos.client.naming.event.InstancesChangeEventraceId] (nacos.publisher-com.alibaba.nacos.client.naming.event.InstancesChangeEvent) o.a.d.r.c.e.l.ServiceInstancesChangedListener.refreshInstance:307 -  [DUBBO] Received instance notification, serviceName: dubbo-user-service, instances: 0, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:25,001 [INFO] [nacos.publisher-com.alibaba.nacos.client.naming.event.InstancesChangeEventraceId] (nacos.publisher-com.alibaba.nacos.client.naming.event.InstancesChangeEvent) o.a.d.r.c.e.l.ServiceInstancesChangedListener.lambda$notifyAddressChanged$5:440 -  [DUBBO] Notify service com.leimbag.dubbo.user.service.UserService:dubbo with urls 0, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:25,001 [WARN] [nacos.publisher-com.alibaba.nacos.client.naming.event.InstancesChangeEventraceId] (nacos.publisher-com.alibaba.nacos.client.naming.event.InstancesChangeEvent) o.a.d.r.c.ServiceDiscoveryRegistryDirectory.refreshInvoker:212 -  [DUBBO] Received empty url list, will ignore for protection purpose., dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:25,137 [WARN] [Dubbo-registry-notification-0-thread-1raceId] (Dubbo-registry-notification-0-thread-1) o.a.d.r.n.NacosRegistry.notify:417 -  [DUBBO] Ignore empty notify urls for subscribe url dubbo://172.16.204.201/com.leimbag.dubbo.user.service.UserService?application=web-demo&background=false&category=providers,configurators,routers&check=false&dubbo=2.0.2&interface=com.leimbag.dubbo.user.service.UserService&metadata-type=remote&methods=getUserNameByShutdown,getUserName,getWalletBalance&pid=6901&qos.enable=false&release=3.0.5&side=consumer&sticky=false&timeout=60000&timestamp=1643083913115, dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:29,528 [INFO] [dubbo-client-idleCheck-thread-1raceId] (dubbo-client-idleCheck-thread-1) o.a.d.r.e.s.h.ReconnectTimerTask.doTask:48 -  [DUBBO] Initial connection to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/172.16.204.201:62394 -> /172.16.204.201:29801]], dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:29,529 [INFO] [dubbo-client-idleCheck-thread-1raceId] (dubbo-client-idleCheck-thread-1) o.a.d.r.t.n.NettyChannel.close:201 -  [DUBBO] Close netty channel [id: 0x0d584a3a, L:/172.16.204.201:62394 ! R:/172.16.204.201:29801], dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:43:29,535 [ERROR] [dubbo-client-idleCheck-thread-1raceId] (dubbo-client-idleCheck-thread-1) o.a.d.r.e.s.h.ReconnectTimerTask.doTask:51 -  [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/172.16.204.201:62394 -> /172.16.204.201:29801]], dubbo version: 3.0.5, current host: 172.16.204.201
org.apache.dubbo.remoting.RemotingException: client(url: dubbo://172.16.204.201:29801/com.leimbag.dubbo.user.service.UserService?anyhost=true&application=dubbo-user-service&background=false&category=providers&check=false&codec=dubbo&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&heartbeat=60000&interface=com.leimbag.dubbo.user.service.UserService&metadata-type=remote&methods=getUserNameByShutdown,getUserName,getWalletBalance&path=com.leimbag.dubbo.user.service.UserService&pid=6790&protocol=dubbo&qos.enable=false&release=3.0.5&service-name-mapping=true&side=provider&sticky=false&timeout=60000&timestamp=1643083877682) failed to connect to server /172.16.204.201:29801, error message is:Connection refused: /172.16.204.201:29801
    at org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:183) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:202) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.transport.AbstractClient.reconnect(AbstractClient.java:256) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeClient.reconnect(HeaderExchangeClient.java:171) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.doTask(ReconnectTimerTask.java:49) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.AbstractTimerTask.run(AbstractTimerTask.java:87) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:651) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:730) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:452) [dubbo-3.0.5.jar:3.0.5]
    at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.16.204.201:29801
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    ... 1 more
2022-01-25 17:44:30,529 [INFO] [dubbo-client-idleCheck-thread-1raceId] (dubbo-client-idleCheck-thread-1) o.a.d.r.e.s.h.ReconnectTimerTask.doTask:48 -  [DUBBO] Initial connection to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/172.16.204.201:62394 -> /172.16.204.201:29801]], dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:44:30,531 [INFO] [dubbo-client-idleCheck-thread-1raceId] (dubbo-client-idleCheck-thread-1) o.a.d.r.t.n.NettyChannel.close:201 -  [DUBBO] Close netty channel [id: 0x0d584a3a, L:/172.16.204.201:62394 ! R:/172.16.204.201:29801], dubbo version: 3.0.5, current host: 172.16.204.201
2022-01-25 17:44:30,535 [ERROR] [dubbo-client-idleCheck-thread-1raceId] (dubbo-client-idleCheck-thread-1) o.a.d.r.e.s.h.ReconnectTimerTask.doTask:51 -  [DUBBO] Fail to connect to HeaderExchangeClient [channel=org.apache.dubbo.remoting.transport.netty4.NettyClient [/172.16.204.201:62394 -> /172.16.204.201:29801]], dubbo version: 3.0.5, current host: 172.16.204.201
org.apache.dubbo.remoting.RemotingException: client(url: dubbo://172.16.204.201:29801/com.leimbag.dubbo.user.service.UserService?anyhost=true&application=dubbo-user-service&background=false&category=providers&check=false&codec=dubbo&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&heartbeat=60000&interface=com.leimbag.dubbo.user.service.UserService&metadata-type=remote&methods=getUserNameByShutdown,getUserName,getWalletBalance&path=com.leimbag.dubbo.user.service.UserService&pid=6790&protocol=dubbo&qos.enable=false&release=3.0.5&service-name-mapping=true&side=provider&sticky=false&timeout=60000&timestamp=1643083877682) failed to connect to server /172.16.204.201:29801, error message is:Connection refused: /172.16.204.201:29801
    at org.apache.dubbo.remoting.transport.netty4.NettyClient.doConnect(NettyClient.java:183) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.transport.AbstractClient.connect(AbstractClient.java:202) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.transport.AbstractClient.reconnect(AbstractClient.java:256) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.HeaderExchangeClient.reconnect(HeaderExchangeClient.java:171) ~[dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask.doTask(ReconnectTimerTask.java:49) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.remoting.exchange.support.header.AbstractTimerTask.run(AbstractTimerTask.java:87) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:651) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:730) [dubbo-3.0.5.jar:3.0.5]
    at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:452) [dubbo-3.0.5.jar:3.0.5]
    at java.lang.Thread.run(Thread.java:844) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.16.204.201:29801
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:?]
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.72.Final.jar:4.1.72.Final]
    ... 1 more
leimbag commented 2 years ago

重新启动生产者(dubbo-user-service)后,对应的web-demo循环多次client为HeaderExchangeClient后,执行最后一次client为org.apache.dubbo.rpc.protocol.dubbo.ReferenceCountExchangeClient,对应的调用栈信息如下:

destroyInternal:171, DubboInvoker (org.apache.dubbo.rpc.protocol.dubbo)
destroy:141, DubboInvoker (org.apache.dubbo.rpc.protocol.dubbo)
destroy:89, ListenerInvokerWrapper (org.apache.dubbo.rpc.listener)
destroyAll:50, Invoker (org.apache.dubbo.rpc)
destroyUnusedInvokers:512, RegistryDirectory (org.apache.dubbo.registry.integration)
refreshInvoker:260, RegistryDirectory (org.apache.dubbo.registry.integration)
refreshOverrideAndInvoker:184, RegistryDirectory (org.apache.dubbo.registry.integration)
notify:162, RegistryDirectory (org.apache.dubbo.registry.integration)
notify:440, AbstractRegistry (org.apache.dubbo.registry.support)
doNotify:371, FailbackRegistry (org.apache.dubbo.registry.support)
notify:363, FailbackRegistry (org.apache.dubbo.registry.support)
notifySubscriber:556, NacosRegistry (org.apache.dubbo.registry.nacos)
access$100:88, NacosRegistry (org.apache.dubbo.registry.nacos)
doNotify:656, NacosRegistry$RegistryChildListenerImpl$1 (org.apache.dubbo.registry.nacos)
run:100, RegistryNotifier$NotificationTask (org.apache.dubbo.registry)
call:514, Executors$RunnableAdapter (java.util.concurrent)
run$$$capture:264, FutureTask (java.util.concurrent)
run:-1, FutureTask (java.util.concurrent)
 - Async stack trace
<init>:151, FutureTask (java.util.concurrent)
<init>:215, ScheduledThreadPoolExecutor$ScheduledFutureTask (java.util.concurrent)
schedule:561, ScheduledThreadPoolExecutor (java.util.concurrent)
submit:715, ScheduledThreadPoolExecutor (java.util.concurrent)
submit:694, Executors$DelegatedExecutorService (java.util.concurrent)
notify:77, RegistryNotifier (org.apache.dubbo.registry)
onEvent:665, NacosRegistry$RegistryChildListenerImpl (org.apache.dubbo.registry.nacos)
onEvent:124, InstancesChangeNotifier (com.alibaba.nacos.client.naming.event)
onEvent:40, InstancesChangeNotifier (com.alibaba.nacos.client.naming.event)
lambda$notifySubscriber$0:201, DefaultPublisher (com.alibaba.nacos.common.notify)
notifySubscriber:208, DefaultPublisher (com.alibaba.nacos.common.notify)
receiveEvent:192, DefaultPublisher (com.alibaba.nacos.common.notify)
openEventHandler:118, DefaultPublisher (com.alibaba.nacos.common.notify)
run:95, DefaultPublisher (com.alibaba.nacos.common.notify)
leimbag commented 2 years ago

大致过程是 provider优雅下线后,consumer在接收到下线事件后,会在类org.apache.dubbo.registry.client.ServiceDiscoveryRegistryDirectory的refreshInvoker方法直接返回

            this.forbidden = false; // Allow accessing
            if (CollectionUtils.isEmpty(invokerUrls)) {
                logger.warn("Received empty url list, will ignore for protection purpose.");
                return;
            }

后续就开始无限循环HeaderExchangeClient了,直到provider重新发布启动

这里的保护是为了保护什么情况?

leimbag commented 2 years ago

2.7.15 dubbo走的是类 org.apache.dubbo.registry.integration.RegistryDirectory的refreshInvoker方法,内部调用了destroyAllInvokers(); 可以正常销毁

        if (invokerUrls.size() == 1
                && invokerUrls.get(0) != null
                && EMPTY_PROTOCOL.equals(invokerUrls.get(0).getProtocol())) {
            this.forbidden = true; // Forbid to access
            this.invokers = Collections.emptyList();
            routerChain.setInvokers(this.invokers);
            destroyAllInvokers(); // Close all invokers   <== 执行的这里
        } else {
            Map<URL, Invoker<T>> oldUrlInvokerMap = this.urlInvokerMap; // local reference
            if (invokerUrls == Collections.<URL>emptyList()) {
                invokerUrls = new ArrayList<>();
            }

            ......
        }
zrlw commented 2 years ago

我推测dubbo搞推空保护机制是因为nacos集群扩缩容(增加节点、停止节点)都会出现实例数量抖动甚至直接降为0的问题。 所以nacos搞了个推空保护机制来缓解这个问题,而zk就没有这个东西,看到nacos这样做,dubbo也跟着搞了个推空保护。

我个人觉得本质问题是nacos注册中心代码处理逻辑不完善,比如这两个issue: https://github.com/alibaba/nacos/issues/7697 https://github.com/alibaba/nacos/issues/7687

针对上述问题我们定制了一版nacos,启动、停止nacos服务节点时,其余nacos节点的naming-event日志不会发生变动,换句话说启停节点时,其他节点的注册实例数量将会保持不变,此外新启动的节点完成Distro初始化后(就是从其他一个节点成功拉取全部实例的快照),之后再等待一个心跳超时才参与节点之间的实例同步以及客户端的订阅服务列表推送,确保新启动节点在收全自己负责的实例心跳之后,再推送给客户端、或者同步给nacos集群中的其他节点,防止推送内容不完整。 这方面的修改内容在上面两个issue里对应的PR里有。

leimbag commented 2 years ago

2.7.15就没这个问题,2.7.15会有个protocol为empty://xxxx ,当所有Invoker下线会直接进行destroyAll,但3.0.5却未有这类处理,按理consumer收到下线通知,应该同步本次事件变更,确保Invoker一致性,心跳理论上只是后续补救实例数

leimbag commented 2 years ago

nacos-client-2.0.3的 com.alibaba.nacos.client.naming.core.ServiceInfoUpdateService内部的UpdateTask的run方法,当serviceObj.getLastRefTime() <= lastRefTime会去查询服务对象,此时返回的服务对象的host为0,进而publish的event内部的host列表也是0,最后在consumer上更新实例会因为是空host产生空的instance,consumer端会无法正常下线生产者服务

if (serviceObj.getLastRefTime() <= lastRefTime) {
    serviceObj = namingClientProxy.queryInstancesOfService(serviceName, groupName, clusters, 0, false); <==   此处返回空host列表的服务对象
    serviceInfoHolder.processServiceInfo(serviceObj);
}
chickenlj commented 2 years ago

protection

默认会触发空地址推送保护,目的是为了防止生产环境误下线机器导致服务不可用。

开发环境,可通过配置关闭推空保护,即使不关闭也不影响最终测试效果,只是从空地址异常变为链接异常。

chickenlj commented 2 years ago

大致过程是 provider优雅下线后,consumer在接收到下线事件后,会在类org.apache.dubbo.registry.client.ServiceDiscoveryRegistryDirectory的refreshInvoker方法直接返回

            this.forbidden = false; // Allow accessing
            if (CollectionUtils.isEmpty(invokerUrls)) {
                logger.warn("Received empty url list, will ignore for protection purpose.");
                return;
            }

后续就开始无限循环HeaderExchangeClient了,直到provider重新发布启动

这里的保护是为了保护什么情况?

默认会触发空地址推送保护,目的是为了防止生产环境误下线机器导致服务不可用。

开发环境,可通过配置关闭推空保护,即使不关闭也不影响最终测试效果,只是从空地址异常变为链接异常。

leimbag commented 2 years ago

期望是服务提供者下线,消费者受到消息后不要再重复连接,求问最后是怎么解决的呢?目前一直会输出error日志

zrlw commented 2 years ago

期望是服务提供者下线,消费者受到消息后不要再重复连接,求问最后是怎么解决的呢?目前一直会输出error日志

在所有客户端配置下列选项(关掉推空保护)应该能解决你的问题。

dubbo: 
   registry:
     enable-empty-protection: false
dlmu-lq commented 2 years ago

重新启动生产者(dubbo-user-service)后,对应的web-demo循环多次client为HeaderExchangeClient后,执行最后一次client为org.apache.dubbo.rpc.protocol.dubbo.ReferenceCountExchangeClient,对应的调用栈信息如下:

destroyInternal:171, DubboInvoker (org.apache.dubbo.rpc.protocol.dubbo)
destroy:141, DubboInvoker (org.apache.dubbo.rpc.protocol.dubbo)
destroy:89, ListenerInvokerWrapper (org.apache.dubbo.rpc.listener)
destroyAll:50, Invoker (org.apache.dubbo.rpc)
destroyUnusedInvokers:512, RegistryDirectory (org.apache.dubbo.registry.integration)
refreshInvoker:260, RegistryDirectory (org.apache.dubbo.registry.integration)
refreshOverrideAndInvoker:184, RegistryDirectory (org.apache.dubbo.registry.integration)
notify:162, RegistryDirectory (org.apache.dubbo.registry.integration)
notify:440, AbstractRegistry (org.apache.dubbo.registry.support)
doNotify:371, FailbackRegistry (org.apache.dubbo.registry.support)
notify:363, FailbackRegistry (org.apache.dubbo.registry.support)
notifySubscriber:556, NacosRegistry (org.apache.dubbo.registry.nacos)
access$100:88, NacosRegistry (org.apache.dubbo.registry.nacos)
doNotify:656, NacosRegistry$RegistryChildListenerImpl$1 (org.apache.dubbo.registry.nacos)
run:100, RegistryNotifier$NotificationTask (org.apache.dubbo.registry)
call:514, Executors$RunnableAdapter (java.util.concurrent)
run$$$capture:264, FutureTask (java.util.concurrent)
run:-1, FutureTask (java.util.concurrent)
 - Async stack trace
<init>:151, FutureTask (java.util.concurrent)
<init>:215, ScheduledThreadPoolExecutor$ScheduledFutureTask (java.util.concurrent)
schedule:561, ScheduledThreadPoolExecutor (java.util.concurrent)
submit:715, ScheduledThreadPoolExecutor (java.util.concurrent)
submit:694, Executors$DelegatedExecutorService (java.util.concurrent)
notify:77, RegistryNotifier (org.apache.dubbo.registry)
onEvent:665, NacosRegistry$RegistryChildListenerImpl (org.apache.dubbo.registry.nacos)
onEvent:124, InstancesChangeNotifier (com.alibaba.nacos.client.naming.event)
onEvent:40, InstancesChangeNotifier (com.alibaba.nacos.client.naming.event)
lambda$notifySubscriber$0:201, DefaultPublisher (com.alibaba.nacos.common.notify)
notifySubscriber:208, DefaultPublisher (com.alibaba.nacos.common.notify)
receiveEvent:192, DefaultPublisher (com.alibaba.nacos.common.notify)
openEventHandler:118, DefaultPublisher (com.alibaba.nacos.common.notify)
run:95, DefaultPublisher (com.alibaba.nacos.common.notify)

遇到了相同的问题,这个堆栈用这个配置不行吧 dubbo: registry: enable-empty-protection: false

Invoker.destroyAll方法如果调用到DubboInvoker的destroyAll方法的话是可行的,但是这里中间加了一层ListenerInvokerWrapper,我们的应用堆栈更是增加了以下Invoker的实现:

image

最终导致DubboInvoker.destroyAll方法没有被调用,也就没有真正销毁client,导致不断重连报错。

这已经不再是推空的问题了,而是不推空也会必现的问题了。

大佬有什么解决方法吗,把每个Invoker的实现都单独实现一遍destroyAll方法吗? @zrlw

zrlw commented 2 years ago

我们自己还在用2.7.13,所以dubbo3.0+的此类问题还没有遇到。 从你贴出的堆栈信息看,dubbo3.0+的FilterChainBuilder的CopyOfFilterChainNode、CallbackRegistrationInvoker两个内部类,以及ListenerInvokerWrapper都需要改一下,都需要实现Invoker接口的destroyAll方法,分别调用各自invoker的destroyAll。

zrlw commented 2 years ago

可能你只是遇到了这3个invoker,实现invoker接口的其他invoker是否都需要加上destroyAll方法,这点我并不清楚,只是把我们用到的类加了destroyAll实现。

zrlw commented 2 years ago

@dlmu-lq 你可以提一个PR, 把所有的invoker都加一个destroyAll方法,应该能彻底解决这个问题。我当时看invoker的实现类有点多,所以只是把我们用到的加了一下。

woshizhouqiang commented 1 year ago

@dlmu-lq 你可以提一个PR, 把所有的invoker都加一个destroyAll方法,应该能彻底解决这个问题。我当时看invoker的实现类有点多,所以只是把我们用到的加了一下。

@zrlw 哪个版本有修复这个问题?