apache / dubbo

The java implementation of Apache Dubbo. An RPC and microservice framework.
https://dubbo.apache.org/
Apache License 2.0

Bug in Dubbo 3.1.5: a thread deadlock at startup causes the service to fail to start #12620

Closed gulang12 closed 1 year ago

gulang12 commented 1 year ago

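We hit a problem while upgrading to Dubbo 3. Our project has a method annotated with @PostConstruct that calls a method annotated with @Scheduled, and this results in a thread deadlock.

Log output: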

"Thread-54" #617 daemon prio=5 os_prio=0 tid=0x00007f55ca668000 nid=0x26d waiting for monitor entry [0x00007f5554af6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.dubbo.config.spring.ReferenceBean.getCallProxy(ReferenceBean.java:351)  [the problem introduced in Dubbo 3.1.5]
    - waiting to lock <0x000000069092aa18> (a java.util.concurrent.ConcurrentHashMap)
    at org.apache.dubbo.config.spring.ReferenceBean.access$100(ReferenceBean.java:100)
    at org.apache.dubbo.config.spring.ReferenceBean$DubboReferenceLazyInitTargetSource.createObject(ReferenceBean.java:359)
    at org.springframework.aop.target.AbstractLazyCreationTargetSource.getTarget(AbstractLazyCreationTargetSource.java:86)
    - locked <0x00000005d7866948> (a org.apache.dubbo.config.spring.ReferenceBean$DubboReferenceLazyInitTargetSource)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:192)
    at com.sun.proxy.$Proxy73.queryAllData(Unknown Source)
    at com.huasheng.core.client.basicData.EtfDataServiceClient.queryAllData$original$f6OsWsrc(EtfDataServiceClient.java:41)
    at com.huasheng.core.client.basicData.EtfDataServiceClient.queryAllData$original$f6OsWsrc$accessor$ggthnbjE(EtfDataServiceClient.java)
    at com.huasheng.core.client.basicData.EtfDataServiceClient$auxiliary$x5YqXG7G.call(Unknown Source)
    at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
    at com.huasheng.core.client.basicData.EtfDataServiceClient.queryAllData(EtfDataServiceClient.java)
    at com.huasheng.task.job.hk.HqBasicDataRefreshJob.refreshindexEtf$original$UrAXt9j7(HqBasicDataRefreshJob.java:1013)
    at com.huasheng.task.job.hk.HqBasicDataRefreshJob.refreshindexEtf$original$UrAXt9j7$accessor$WzXLmL3m(HqBasicDataRefreshJob.java)
    at com.huasheng.task.job.hk.HqBasicDataRefreshJob$auxiliary$J3bNxn2m.call(Unknown Source)
    at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
    at com.huasheng.task.job.hk.HqBasicDataRefreshJob.refreshindexEtf(HqBasicDataRefreshJob.java)
    at com.huasheng.task.job.hk.HqBasicDataRefreshJob.lambda$indexEtfJob$4(HqBasicDataRefreshJob.java:1006)
    at com.huasheng.task.job.hk.HqBasicDataRefreshJob$$Lambda$452/1550461532.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None

songxiaosheng commented 1 year ago

Which thread is holding that lock?
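
One way to answer this from inside the JVM, rather than from a manual thread dump, is the standard `ThreadMXBean` API. The following is a minimal sketch; `DeadlockReport` is a hypothetical helper name and is not part of Dubbo or Spring. It lists each deadlocked thread together with the lock it is waiting for and the thread that owns that lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Dubbo or Spring.
class DeadlockReport {

    // Returns one line per deadlocked thread, or an empty list if no deadlock exists.
    static List<String> describe() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when nothing is deadlocked
        List<String> lines = new ArrayList<>();
        if (ids == null) {
            return lines;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            lines.add("\"" + info.getThreadName() + "\" waits for " + info.getLockName()
                    + " held by \"" + info.getLockOwnerName() + "\"");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> report = describe();
        if (report.isEmpty()) {
            System.out.println("no deadlock detected");
        } else {
            report.forEach(System.out::println);
        }
    }
}
```

Calling `DeadlockReport.describe()` from a monitoring thread during a hung startup should give the same "waiting for / held by" pairs that `jstack` prints under "Found one Java-level deadlock".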

tang2992 commented 1 year ago

I ran into this problem too.

Found one Java-level deadlock:
=============================
"hsTaskScheduler-1":
  waiting to lock monitor 0x00000242c43b94a8 (object 0x00000006c4b96140, a org.apache.dubbo.config.deploy.DefaultModuleDeployer),
  which is held by "main"
"main":
  waiting to lock monitor 0x00000242c583e248 (object 0x00000006c3a75060, a java.util.concurrent.ConcurrentHashMap),
  which is held by "hsTaskScheduler-1"

Java stack information for the threads listed above:
===================================================
"hsTaskScheduler-1":
        at org.apache.dubbo.config.deploy.DefaultModuleDeployer.startSync(DefaultModuleDeployer.java:143)
        - waiting to lock <0x00000006c4b96140> (a org.apache.dubbo.config.deploy.DefaultModuleDeployer)
        at org.apache.dubbo.config.deploy.DefaultModuleDeployer.start(DefaultModuleDeployer.java:139)
        at org.apache.dubbo.config.ReferenceConfig.get(ReferenceConfig.java:228)
        at org.apache.dubbo.config.spring.ReferenceBean.getCallProxy(ReferenceBean.java:351)
        - locked <0x00000006c3a75060> (a java.util.concurrent.ConcurrentHashMap)
        at org.apache.dubbo.config.spring.ReferenceBean.access$100(ReferenceBean.java:100)
        at org.apache.dubbo.config.spring.ReferenceBean$DubboReferenceLazyInitTargetSource.createObject(ReferenceBean.java:359)
        at org.springframework.aop.target.AbstractLazyCreationTargetSource.getTarget(AbstractLazyCreationTargetSource.java:86)
        - locked <0x00000006c61d99e8> (a org.apache.dubbo.config.spring.ReferenceBean$DubboReferenceLazyInitTargetSource)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:192)
        at com.sun.proxy.$Proxy105.query(Unknown Source)
        at com.huasheng.platform.client.service.app.config.AppConfigReadServiceClient.query(AppConfigReadServiceClient.java:57)
        at com.huasheng.stock.search.task.LoadStockDefinitionTaskV2.execute(LoadStockDefinitionTaskV2.java:72)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
"main":
        at org.springframework.context.event.AbstractApplicationEventMulticaster.getApplicationListeners(AbstractApplicationEventMulticaster.java:187)
        - waiting to lock <0x00000006c3a75060> (a java.util.concurrent.ConcurrentHashMap)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:128)
        at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:393)
        at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:347)
        at org.apache.dubbo.config.spring.ServiceBean.publishExportEvent(ServiceBean.java:133)
        at org.apache.dubbo.config.spring.ServiceBean.exported(ServiceBean.java:125)
        at org.apache.dubbo.config.ServiceConfig.doExport(ServiceConfig.java:392)
        - locked <0x000000078da98a88> (a org.apache.dubbo.config.spring.ServiceBean)
        at org.apache.dubbo.config.ServiceConfig.export(ServiceConfig.java:243)
        - locked <0x000000078da98a88> (a org.apache.dubbo.config.spring.ServiceBean)
        at org.apache.dubbo.config.deploy.DefaultModuleDeployer.exportServiceInternal(DefaultModuleDeployer.java:350)
        at org.apache.dubbo.config.deploy.DefaultModuleDeployer.exportServices(DefaultModuleDeployer.java:322)
        at org.apache.dubbo.config.deploy.DefaultModuleDeployer.startSync(DefaultModuleDeployer.java:158)
        - locked <0x00000006c4b96140> (a org.apache.dubbo.config.deploy.DefaultModuleDeployer)
        at org.apache.dubbo.config.deploy.DefaultModuleDeployer.start(DefaultModuleDeployer.java:139)
        at org.apache.dubbo.config.spring.context.DubboDeployApplicationListener.onContextRefreshedEvent(DubboDeployApplicationListener.java:113)
        at org.apache.dubbo.config.spring.context.DubboDeployApplicationListener.onApplicationEvent(DubboDeployApplicationListener.java:102)
        at org.apache.dubbo.config.spring.context.DubboDeployApplicationListener.onApplicationEvent(DubboDeployApplicationListener.java:47)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:172)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:165)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:139)
        at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:393)
        at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:347)
        at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:883)
        at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.finishRefresh(EmbeddedWebApplicationContext.java:146)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:545)
        - locked <0x00000006c3c8e710> (a java.lang.Object)
        at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:124)
        at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:693)
        at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:360)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:303)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1118)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1107)
        at com.huasheng.sns.server.RobotApplication.main(RobotApplication.java:26)

songxiaosheng commented 1 year ago

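There are two locks. One thread takes Spring's Map lock first and then Dubbo's Deployer lock ("hsTaskScheduler-1" in the dump above); the other takes Dubbo's Deployer lock first and then Spring's Map lock ("main").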
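
The opposite acquisition order described here can be reproduced in isolation. The sketch below is not Dubbo or Spring code: `LockOrderDemo`, `mapLock`, and `deployerLock` are illustrative names, and `ReentrantLock.tryLock` with a timeout stands in for the `synchronized` monitors from the dump so that the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: two locks taken in opposite order by two threads.
class LockOrderDemo {

    // Returns how many of the two threads could not obtain their second lock.
    static int blockedThreads() {
        ReentrantLock mapLock = new ReentrantLock();      // stands in for Spring's map
        ReentrantLock deployerLock = new ReentrantLock(); // stands in for Dubbo's deployer
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        Thread scheduler = new Thread(
                () -> acquire(mapLock, deployerLock, bothHoldFirst, bothTried, blocked));
        Thread mainLike = new Thread(
                () -> acquire(deployerLock, mapLock, bothHoldFirst, bothTried, blocked));
        scheduler.start();
        mainLike.start();
        try {
            scheduler.join();
            mainLike.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    private static void acquire(ReentrantLock first, ReentrantLock second,
                                CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                AtomicInteger blocked) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // make sure each thread already owns its first lock
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet(); // with plain synchronized this would wait forever
            }
            bothTried.countDown();
            bothTried.await(); // keep the first lock until both attempts have finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on their second lock: " + blockedThreads());
    }
}
```

Both threads time out on their second lock, which is the situation the dump reports. If both threads took the two locks in the same order, or if the second lock were never requested while the first is held, the cycle could not form.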

tang2992 commented 1 year ago

@PostConstruct and @Scheduled are both standard Spring usage. Is there a way to avoid this deadlock?

liufeiyu1002 commented 1 year ago

Don't call Dubbo interfaces from initialization methods.

AlbumenJ commented 1 year ago

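Spring startup goes through several phases:

  1. Load all bean definitions -> Dubbo collects the full set of configuration at this point
  2. Load the ApplicationListeners -> Dubbo's listeners are initialized
  3. Publish EarlyEvent -> Dubbo initializes its configuration at this point

A Dubbo call issued during phase 2 can deadlock.

The reason for this change is that initializing all configuration directly in phase 1 is likely to cause the following:

  1. Registry, Application, and other configuration is not yet fully loaded, so some services are exported or subscribed in unexpected ways
  2. A Reference is initialized before the corresponding Service, so the startup check fails for services with local circular dependencies

gulang12 commented 1 year ago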

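How exactly can this be resolved in 3.1.5? Is there a fix? When we evaluated versions earlier, the official website listed 3.1.5 as a stable release, so all of our applications are now being upgraded to 3.1.5. We cannot complete the upgrade until this problem is solved.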

AlbumenJ commented 1 year ago

For subtasks that make Dubbo calls, wait until the application has fully started before issuing the calls.
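
A minimal sketch of that advice, under stated assumptions: `StartupGate` is a hypothetical class, not a Dubbo or Spring API. Scheduled or init-time tasks pass their remote call through the gate, and the gate skips the call until the application has signalled that it is fully started. In a Spring Boot application, `markReady()` could for example be called from a listener for `ApplicationReadyEvent`; that wiring is an assumption and is not shown here. The gate deliberately skips rather than blocks, because blocking inside a @PostConstruct method would stall the very startup it is waiting for.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only; not part of Dubbo or Spring.
class StartupGate {

    private final AtomicBoolean ready = new AtomicBoolean(false);

    // Call once the application has finished starting.
    public void markReady() {
        ready.set(true);
    }

    // Runs the remote call only after startup; returns false when the call was skipped.
    public boolean runIfReady(Runnable remoteCall) {
        if (!ready.get()) {
            return false; // too early: skip this run and let the next scheduled run retry
        }
        remoteCall.run();
        return true;
    }

    public static void main(String[] args) {
        StartupGate gate = new StartupGate();
        AtomicInteger calls = new AtomicInteger();
        Runnable refresh = calls::incrementAndGet; // stands in for a Dubbo reference call

        System.out.println("before startup, executed: " + gate.runIfReady(refresh));
        gate.markReady();
        System.out.println("after startup, executed: " + gate.runIfReady(refresh));
        System.out.println("remote calls made: " + calls.get());
    }
}
```

With this shape, a task that fires during startup returns immediately without touching the Dubbo reference, so it never holds one of the two locks while the main thread is still deploying.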