apache / dubbo

The Java implementation of Apache Dubbo. An RPC and microservice framework.
https://dubbo.apache.org/
Apache License 2.0

ServiceDiscoveryRegistryDirectory #14254

Closed yipixiaofeiyang closed 5 months ago

yipixiaofeiyang commented 5 months ago

Pre-check

Search before asking

Apache Dubbo Component

Java SDK (apache/dubbo)

Dubbo Version

Dubbo Java 3.1.5 -> 3.1.6, JDK 1.7, CentOS 8, Spring Boot 2.7.3

Steps to reproduce this issue

This issue affects only Spring Boot + Dubbo consumers; the provider has no problem. The consumer service runs on two machines.

Steps: Start the consumer service normally, then restart it using kill pid (note: not kill -9). While the service is restarting, keep sending requests to its Spring HTTP interface.

Result:

[Dubbo Java 3.1.5] Requests are processed normally.

[Dubbo Java 3.1.6] The service throws org.apache.dubbo.rpc.RpcException: Directory of type ServiceDiscoveryRegistryDirectory already destroyed. The detailed error message is given below.

Upgrading to the latest Dubbo Java version (3.2.13) does not help either.

What you expected to happen

org.apache.dubbo.rpc.RpcException: Directory of type ServiceDiscoveryRegistryDirectory already destroyed for service com.myb.saas.data.service.CoalpitService from registry nacos://192.168.100.70:8848/org.apache.dubbo.registry.RegistryService?application=admin&backup=192.168.100.71:8848,192.168.100.72:8848&dubbo=2.0.2&group=dubbo&logger=slf4j&metadata-type=remote&namespace=53a4c18c-0804-4734-8de2-57aa99f27833&pid=2133071&qos.accept.foreign.ip=false&qos.enable=false&register-mode=instance&release=3.1.6&serialize.check.status=WARN&timestamp=1716956513980
    at org.apache.dubbo.rpc.cluster.directory.AbstractDirectory.list(AbstractDirectory.java:184)
    at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.list(AbstractClusterInvoker.java:408)
    at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:333)
    at com.alibaba.csp.sentinel.adapter.dubbo3.SentinelDubboConsumerFilter.syncInvoke(SentinelDubboConsumerFilter.java:82)
    at com.alibaba.csp.sentinel.adapter.dubbo3.SentinelDubboConsumerFilter.invoke(SentinelDubboConsumerFilter.java:66)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.cluster.router.RouterSnapshotFilter.invoke(RouterSnapshotFilter.java:46)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:100)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.protocol.dubbo.filter.FutureFilter.invoke(FutureFilter.java:52)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at com.alibaba.csp.sentinel.adapter.dubbo3.DubboAppContextFilter.invoke(DubboAppContextFilter.java:47)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.cluster.filter.support.ConsumerClassLoaderFilter.invoke(ConsumerClassLoaderFilter.java:40)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.cluster.filter.support.ConsumerContextFilter.invoke(ConsumerContextFilter.java:120)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CopyOfFilterChainNode.invoke(FilterChainBuilder.java:327)
    at org.apache.dubbo.rpc.cluster.filter.FilterChainBuilder$CallbackRegistrationInvoker.invoke(FilterChainBuilder.java:194)
    at org.apache.dubbo.rpc.cluster.support.wrapper.AbstractCluster$ClusterFilterInvoker.invoke(AbstractCluster.java:92)
    at org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:103)
    at org.apache.dubbo.registry.client.migration.MigrationInvoker.invoke(MigrationInvoker.java:284)
    at org.apache.dubbo.rpc.proxy.InvocationUtil.invoke(InvocationUtil.java:56)
    at org.apache.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:75)

Anything else

No response

Are you willing to submit a pull request to fix on your own?

Code of Conduct

yipixiaofeiyang commented 5 months ago

I'm not sure whether this should be considered a bug, but it only appeared after the upgrade, and the resulting user experience is poor.

wcy666103 commented 5 months ago

I don't understand the scenario. Killing the process calls the destroy logic, which sets a series of flags; when a business method is called again after that, this exception appears.

image

I can't think of any practical significance for this scenario.
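
The mechanism described here can be illustrated with a minimal sketch. It is not Dubbo's actual code: DemoDirectory is a hypothetical stand-in, and the sketch only assumes that the destroy path flips a flag which later calls check before doing any work.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DestroyFlagDemo {

    /** Hypothetical stand-in for a service directory; not Dubbo's real class. */
    static class DemoDirectory {
        private final AtomicBoolean destroyed = new AtomicBoolean(false);

        /** Called from the shutdown path; it only flips the flag. */
        void destroy() {
            destroyed.set(true);
        }

        /** Fails fast once the directory has been destroyed. */
        String list() {
            if (destroyed.get()) {
                throw new IllegalStateException("Directory already destroyed");
            }
            return "invoker";
        }
    }

    public static void main(String[] args) {
        DemoDirectory directory = new DemoDirectory();
        System.out.println("before destroy: " + directory.list());

        directory.destroy();
        try {
            directory.list();
        } catch (IllegalStateException e) {
            System.out.println("after destroy: " + e.getMessage());
        }
    }
}
```

Under this assumption, any business call that reaches the directory after destroy() has run fails, even though the JVM itself is still alive for a short while.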

yipixiaofeiyang commented 5 months ago

Kill the running consumer service first, then restart it. Continuously sending requests to the Spring interface during this process triggers the problem. Some people online claim it is caused by the Dubbo service shutting down before the Spring service. We compared the changes between 3.1.5 and 3.1.6 but could not find the cause. Could you please help analyze the problem?

wcy666103 commented 5 months ago

What do you mean by "continuously accessing the Spring interface during this process"? I'm trying to reproduce it. Can you provide a demo?

yipixiaofeiyang commented 5 months ago

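I'll switch to Chinese, since I'm Chinese. While the Dubbo consumer service is restarting, I keep sending requests to the Spring external HTTP interface using Postman (image 1716967827338), similar to what the image above shows.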

wcy666103 commented 5 months ago

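This... is really strange. In theory, once the JVM instance has stopped and been started again, all the flags inside it should be back in their initial state. Also, what is the point of continuously requesting the Spring web interface at that moment? Aren't they in the same JVM, and hasn't it all stopped? I haven't been able to reproduce it yet.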

yipixiaofeiyang commented 5 months ago

We use Spring Boot + Dubbo, with two servers behind a load balancer.

yipixiaofeiyang commented 5 months ago

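It is indeed strange. After the Spring service is killed, in theory it should no longer accept external requests, and they should be routed to the other server. So suppose some requests came in at the very moment the Spring service was shutting down: if the Dubbo service were still alive, there should be no error. My worry is that the Dubbo service is destroyed before the Spring service shuts down; I have seen this claimed online. But 3.1.5 works fine, which means it should be solvable at the code level. It's a headache.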
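
The ordering concern raised here can be sketched as a small, single-threaded simulation. This is an illustration only, not how Dubbo or Spring Boot implement shutdown: DemoRpcClient and DemoWebLayer are hypothetical stand-ins, and the request IDs are made up. It compares destroying the RPC client before the web layer has finished its accepted requests with draining the web layer first.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownOrderDemo {

    /** Hypothetical stand-in for the RPC client side; not a Dubbo class. */
    static class DemoRpcClient {
        private boolean destroyed;

        void destroy() {
            destroyed = true;
        }

        String call() {
            if (destroyed) {
                throw new IllegalStateException("already destroyed");
            }
            return "ok";
        }
    }

    /** Hypothetical stand-in for a web layer holding accepted but unprocessed requests. */
    static class DemoWebLayer {
        private final List<String> pending = new ArrayList<>();
        private final DemoRpcClient client;

        DemoWebLayer(DemoRpcClient client) {
            this.client = client;
        }

        void accept(String requestId) {
            pending.add(requestId);
        }

        /** Processes every pending request; each one needs the RPC client. */
        List<String> drain() {
            List<String> results = new ArrayList<>();
            for (String requestId : pending) {
                try {
                    results.add(requestId + ":" + client.call());
                } catch (IllegalStateException e) {
                    results.add(requestId + ":failed");
                }
            }
            pending.clear();
            return results;
        }
    }

    /** Runs one simulated shutdown with two requests already accepted. */
    static List<String> shutdown(boolean destroyRpcFirst) {
        DemoRpcClient client = new DemoRpcClient();
        DemoWebLayer web = new DemoWebLayer(client);
        web.accept("req-1");
        web.accept("req-2");

        if (destroyRpcFirst) {
            client.destroy();          // RPC side goes away first
            return web.drain();        // accepted requests now fail
        }
        List<String> results = web.drain(); // finish accepted requests first
        client.destroy();                   // then release the RPC side
        return results;
    }

    public static void main(String[] args) {
        System.out.println("RPC client destroyed first: " + shutdown(true));
        System.out.println("web layer drained first:    " + shutdown(false));
    }
}
```

In this simplified model, only the second ordering lets requests that arrived just before shutdown complete, which matches the hypothesis that the error appears when the Dubbo side is destroyed before the Spring side has stopped serving.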

xixingya commented 5 months ago

This may solve your problem: https://cn.dubbo.apache.org/zh-cn/overview/mannual/java-sdk/advanced-features-and-usage/others/graceful-shutdown/

server:
  shutdown: graceful
dubbo:
  application:
    name: dubbo-springboot-demo-provider
    shutwait: 30000

yipixiaofeiyang commented 5 months ago

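I tried the graceful shutdown configuration a long time ago. As long as the Dubbo version is >= 3.1.6, the error reproduces: org.apache.dubbo.rpc.RpcException: Directory of type ServiceDiscoveryRegistryDirectory already destroyed for service xxx. I couldn't resist trying it once more (image 1716975637336).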

xixingya commented 5 months ago

@yipixiaofeiyang Can you find the following keyword in the log file?

Run shutdown hook now.

If not, make sure you have added server.shutdown=graceful.

xixingya commented 5 months ago

After comparing the changes between 3.1.5 and 3.1.6, I cannot find the issue. Can you provide a demo that reproduces it?

yipixiaofeiyang commented 5 months ago

Yes, the log does contain "Run shutdown hook now."; no problem there.

yipixiaofeiyang commented 5 months ago

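The demo is simple: one provider and one consumer. The consumer needs two servers plus nginx for load balancing, and the registry is Nacos. Deployment is slightly more involved.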

xixingya commented 5 months ago

Can you upload your demo code to GitHub?

yipixiaofeiyang commented 5 months ago

Sure, I'll just attach it: provider-parent.zip

yipixiaofeiyang commented 5 months ago

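The startup script on the server is shown below. Its effect is roughly the same as Spring's server.shutdown=graceful. We have always used it this way in production, and it worked without any problem before Dubbo 3.1.6, so I did not bother switching to the graceful approach.

# Find the PID of the running Java process for this service
pid=$(ps -ef|grep java| grep ${serverName} |awk '{print $2}');
echo $pid;

# Send SIGTERM once, then wait up to 30 seconds before forcing the kill
waitcount=0
while [ "$pid" != "" ]; do
  if [ "$waitcount" == "0" ]; then
    kill $pid
  fi
  sleep 1
  let "waitcount++"
  if [ "$waitcount" == "30" ]; then
    kill -9 $pid
  fi
  pid=$(ps -ef|grep java| grep $serverName |awk '{print $2}');
done

# Start the new instance in the background
nohup /usr/jdk1.8.0_201/bin/java -jar -Xms700m -Xmx700m *.jar >log.txt &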

xixingya commented 5 months ago

After testing, graceful shutdown works well on 3.1.5 and on 3.1.6, but does not work on 3.2.11 and 3.2.13. @yipixiaofeiyang

yipixiaofeiyang commented 5 months ago

Okay, then I'll wait for the new version and try again! In the meantime, I'll catch the RpcException and return a message saying that the service is being upgraded and to please try again later. I think clients will understand!
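
A minimal sketch of the workaround described above might look like the following. It is an assumption-laden illustration, not the poster's actual code: DemoRpcException is a hypothetical stand-in for org.apache.dubbo.rpc.RpcException (used so the sketch compiles without the Dubbo dependency), and callWithFallback and the message text are made-up names.

```java
import java.util.function.Supplier;

public class UpgradeFallbackDemo {

    /** Hypothetical stand-in for org.apache.dubbo.rpc.RpcException. */
    static class DemoRpcException extends RuntimeException {
        DemoRpcException(String message) {
            super(message);
        }
    }

    static final String UPGRADE_MESSAGE =
            "The service is being upgraded. Please try again later.";

    /** Runs the remote call and maps an RPC failure to a friendly message. */
    static String callWithFallback(Supplier<String> remoteCall) {
        try {
            return remoteCall.get();
        } catch (DemoRpcException e) {
            // Assumption: during a restart the only expected RPC failure is
            // the "already destroyed" error, so a retry hint is acceptable.
            return UPGRADE_MESSAGE;
        }
    }

    public static void main(String[] args) {
        // Normal case: the remote call succeeds and its result is returned.
        System.out.println(callWithFallback(() -> "coalpit data"));

        // Restart case: the remote call fails and the fallback is returned.
        System.out.println(callWithFallback(() -> {
            throw new DemoRpcException("Directory already destroyed");
        }));
    }
}
```

In a Spring Boot application the same mapping would more likely live in a global exception handler; catching every RPC failure this way also hides errors unrelated to restarts, so narrowing the condition may be worthwhile.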

985177520 commented 1 day ago

@xixingya Has this been fixed?