apache / dubbo

The java implementation of Apache Dubbo. An RPC and microservice framework.
https://dubbo.apache.org/
Apache License 2.0
40.54k stars 26.44k forks source link

[Bug] The client receives an application offline change notification, but channel.close is not triggered 100% of the time #14938

Open TerryLam2010 opened 1 day ago

TerryLam2010 commented 1 day ago

Pre-check

Search before asking

Apache Dubbo Component

Java SDK (apache/dubbo)

Dubbo Version

Dubbo Java: 3.2.3 JDK8 Use K8s

Steps to reproduce this issue

Provider and Consumer config:

other config property: dubbo.registry.parameters.register-consumer-url : true dubbo.application.serialize-check-status : DISABLE dubbo.application.enable-empty-protection : true dubbo.provider.prefer-serialization : hessian2 dubbo.provider.serialization : hessian2 I'm from dubbo 2.7.8 upgrade to 3.2.3, found that when the provider referrals, consumers occasionally trigger org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask rewiring (only once), The provider itself closes the channel. But I've looked through the code, and if nacos is notified to go offline, the client will destroy the old DubboInvoker, triggering the channel shutdown and leaving the ReconnectTimerTask empty. My situation now is that I see the log org.apache.dubbo.registry.client.ServiceDiscoveryRegistryDirectory class has come in, also found that destroy the invoker corresponding log, However, the Close netty channel log did not find that although the application offline notification came, it did not close the channel. Because in the process of gradual launch, the logs I give you are all test environments, all in the 3.2.3 version. The production environment is basically 2.7.8, and only one application is 3.2.3 —————————————————————————————————————————————————— 我是从dubbo 2.7.8升级到3.2.3,发现当提供者下线的时候,消费者偶尔触发org.apache.dubbo.remoting.exchange.support.header.ReconnectTimerTask的重连(仅一次),是因为provider自己关闭channel了。但我浏览过代码,如果通知了nacos下线,client会将旧的DubboInvoker销毁,从而触发channel关闭以及将ReconnectTimerTask的任务置空。 现在我的情况是,我看日志org.apache.dubbo.registry.client.ServiceDiscoveryRegistryDirectory类有进来,也发现destroy invoker对应的日志,但是Close netty channel 的日志并没有发现,那就是虽然应用下线通知过来了,但是并没有关闭channel。 ### What you expected to happen I think org.apache.dubbo.rpc.protocol.dubbo.ReferenceCountExchangeClientcounter problem, lead to can not close the channel. Because my local online and offline is normal, only in the test environment and online environment is repeated, just look at the log can not see the problem ### Anything else 2024-11-22T14:53:36,348+0800 [DUBBO] destroy invoker[DefaultServiceInstance{serviceName='order', host='172.29.89.45', port=11060, enabled=true, healthy=true, metadata={dubbo.endpoints=[{"port":11060,"protocol":"dubbo"}], dubbo.metadata.revision=e9aa0faa8181f81a787a5ce1362ce1c7, dubbo.metadata.storage-type=remote, timestamp=1732256698869}}, service{name='com.xxx.xxx.client.rpc.OrderRuleCheckRpcService',group='uat',version='null',protocol='dubbo',port='11060',params={side=provider, heartbeat=30000, release=3.2.3, methods=check,checkAndThrowEx, logger=slf4j, deprecated=false, dubbo=2.0.2, threads=20, interface=com.xxxx.xxxx.client.rpc.OrderRuleCheckRpcService, service-name-mapping=true, threadpool=cached, timeout=1000, generic=false, revision=1.0.0, serialize.check.status=DISABLE, serialization=hessian2, retries=0, metadata-type=remote, application=order, prefer.serialization=hessian2, dynamic=true, enable-empty-protection=true, group=uat},}] success. , dubbo version: 3.2.3, current host: 172.29.119.195 2024-11-22T14:53:36,348+0800 [DUBBO] 1 deprecated invokers deleted., dubbo version: 3.2.3, current host: 172.29.119.195 2024-11-22T14:53:36,349+0800 [DUBBO] serviceKey:uat/com.xxxx.xxxx.client.rpc.OrderRuleCheckRpcService Instance address size 2, interface address size 3, threshold 0.0, dubbo version: 3.2.3, current host: 172.29.119.195 2024-11-22T14:53:36,349+0800 [DUBBO] Received invokers changed event from registry. Registry type: instance. Service Key: uat/com.xxxx.xxxx.client.rpc.OrderRuleCheckRpcService. Urls Size : 2. Invokers Size : 2. Available Size: 2. Available Invokers : 172.29.92.255:11060,172.29.89.9:11060, dubbo version: 3.2.3, current host: 172.29.119.195 It does not make sense that these logs are triggered, but Close Netty channel does not occur. 触发了这些日志,但是并没有发生Close Netty channel, 这并不合理。 ### Are you willing to submit a pull request to fix on your own? - [X] Yes I am willing to submit a pull request on my own! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
laywin commented 16 hours ago

Normally it will be also notifyed by RegistryDirectory, because it is dual registration(interface & instance). when all of them finished destory, the count of ReferenceCountExchangeClient will be zero, so it will be destory.

TerryLam2010 commented 11 hours ago

@laywin That is now the Destory of the ServiceDisCoveryRegistrydirectory, but the Destroyunovokers triggered by the registryDirectory did not print the destruction log. That is, nowadays, all INVOKEER destroyed. ReferenceCountexchangeClient's counter is not 0, and it will not netty client close. How should I check why it is not ruined? Intersection Normally, ServiceDiscoveryRegistryDirectory and RegistryDirectory will be triggered? Or is there any ready -made case for reference? —————————————————————————————————————————————————————— 那现在就是触发了ServiceDiscoveryRegistryDirectory的destory,但是在RegistryDirectory触发的destroyUnusedInvokers 却没有打印销毁日志。那就是现在没有调用完所有invokeer销毁,ReferenceCountExchangeClient的计数器不为0,就不会netty client close. 我应该怎么排查为什么会没有销毁完呢??正常来说ServiceDiscoveryRegistryDirectory和RegistryDirectory都会被触发吧?或者有什么现成的案例提供参考?

laywin commented 9 hours ago

when provider shutdown, in consumer side you can watch the network connect (netstat -apn | grep 20880)