dragonwell-project / dragonwell8

Alibaba Dragonwell8 JDK
http://dragonwell-jdk.io
GNU General Public License v2.0
4.18k stars 492 forks

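[Bug] With coroutines enabled, after the service has run for a while and an RPC call to a remote service fails, service-layer latency increases and the service can no longer serve requests #265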

Open chulong opened 2 years ago

chulong commented 2 years ago

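Description: We are running Alibaba Dragonwell 8.4.4. After the service has been up for a while, it can no longer serve requests normally. I suspect this is the same problem as #226, but I am not sure which thread is preventing coroutines from switching properly.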

chulong commented 2 years ago

stack.txt

joeyleeeeeee97 commented 2 years ago

@chulong Hi, did you use the blacklist mechanism to keep NIOHandler-TimeOut-Thread-pool-8-tid-4 from being converted into a coroutine?

Judging from the jstack output:

"NIOHandler-TimeOut-Thread-pool-8-tid-4" #1005 prio=5 os_prio=0 tid=0x00007fba440e7a30 nid=0x5ca waiting on condition [0x00007fb9ce351000]

threads like this one have not been converted into coroutines.
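To check which threads appear in a dump with their own header line, as in the line quoted above, the headers can be extracted mechanically. The following is a minimal sketch that assumes the standard HotSpot header layout seen in this thread (`"name" #id [daemon] prio=… os_prio=… tid=… nid=… status [address]`). The class name `JstackHeaders` and its fields are hypothetical and used only for illustration. It only lists threads that have such a header; it does not by itself decide whether a thread runs as a coroutine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JstackHeaders {
    // Matches a thread header such as:
    // "name" #1005 prio=5 os_prio=0 tid=0x... nid=0x5ca waiting on condition [0x...]
    private static final Pattern HEADER = Pattern.compile(
        "^\"([^\"]+)\" #(\\d+) (daemon )?prio=\\d+ os_prio=-?\\d+ "
        + "tid=(0x[0-9a-f]+) nid=(0x[0-9a-f]+) (.*?) \\[(0x[0-9a-f]+)\\]");

    /** One parsed header line (hypothetical helper type). */
    public static final class Header {
        public final String name;    // thread name between the quotes
        public final boolean daemon; // true if the header carries "daemon"
        public final long nid;       // native thread id, decoded from hex
        public final String status;  // free-text status before the [address]

        Header(String name, boolean daemon, long nid, String status) {
            this.name = name;
            this.daemon = daemon;
            this.nid = nid;
            this.status = status;
        }
    }

    /** Returns every header line found in the dump, in order of appearance. */
    public static List<Header> parse(String dump) {
        List<Header> headers = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.find()) {
                long nid = Long.parseLong(m.group(5).substring(2), 16);
                headers.add(new Header(m.group(1), m.group(3) != null, nid, m.group(6)));
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        String dump = "\"NIOHandler-TimeOut-Thread-pool-8-tid-4\" #1005 prio=5 os_prio=0 "
            + "tid=0x00007fba440e7a30 nid=0x5ca waiting on condition [0x00007fb9ce351000]";
        for (Header h : parse(dump)) {
            System.out.println(h.name + " nid=" + h.nid + " status=" + h.status);
        }
    }
}
```

Filtering the returned list by name prefix (for example `NIOHandler-TimeOut-Thread`) gives a quick list of the threads to compare against the blacklist.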

chulong commented 2 years ago

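Yes. Following the solution in #226, I set up a blacklist, but the threads that are not converted now show severe latency. I suspect they cannot get resources.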

henrysternc commented 2 years ago

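After running on Dragonwell 8.9.10 for a while, RocketMQ consumption stops. My business logic for handling consumed MQ messages uses thread pools and multiple threads. Could it be that these business threads all run as coroutines on the same thread, so that the business threads starve and the MQ consumer thread blocks? If thread starvation is the cause, how can I spread the business threads evenly across the threads of different coroutine groups?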

yuleil commented 2 years ago

Could you provide a jstack?

henrysternc commented 2 years ago

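Hello, this is the jstack snapshot from a few days ago. My workload can be split into several subtasks, and each of those can be split into further subtasks, so there are multiple thread pools that handle the different subtasks in order to avoid thread starvation. After switching the JDK to Dragonwell, however, the application blocks after running for a while. The jstack snapshot shows that all the business threads have ended up in a single coroutine group. The relationship between business threads and coroutine worker threads is many-to-many, right? So I suspect the blocking is caused by thread starvation. If so, does Dragonwell's coroutine scheduling offer a solution, for example assigning the tasks of a given thread group to a specific coroutine group?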
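The starvation concern behind the multiple-pool design can be reproduced on plain JDK threads. The sketch below is an illustration only: it uses `java.util.concurrent` and no Wisp API, and it makes no claim about how Dragonwell schedules coroutines. The class `NestedPools` and method `runNested` are hypothetical names. Each parent task blocks on a child task; when parents and children share one bounded pool, the parents can occupy every worker and the children never run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NestedPools {
    /**
     * Submits `parents` parent tasks to parentPool. Each parent waits until all
     * parents have started, then submits one child to childPool and blocks on it.
     * Returns true if every parent finishes within timeoutMs, false otherwise.
     * `parents` must not exceed the size of parentPool.
     */
    public static boolean runNested(ExecutorService parentPool, ExecutorService childPool,
                                    int parents, long timeoutMs) throws InterruptedException {
        CountDownLatch allStarted = new CountDownLatch(parents);
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < parents; i++) {
            final int id = i;
            Callable<Integer> child = () -> id * 2;
            Callable<Integer> parent = () -> {
                allStarted.countDown();
                allStarted.await();                   // every parent now holds a worker
                return childPool.submit(child).get(); // block until the child has run
            };
            results.add(parentPool.submit(parent));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            for (Future<Integer> f : results) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                f.get(remaining, TimeUnit.NANOSECONDS);
            }
            return true;
        } catch (TimeoutException e) {
            return false; // parents are still waiting on children that cannot run
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // One pool for both levels: two parents fill both workers.
        ExecutorService shared = Executors.newFixedThreadPool(2);
        boolean sharedOk = runNested(shared, shared, 2, 500);
        shared.shutdownNow(); // interrupt the stuck parents so the JVM can exit

        // Separate pools per level: children always have free workers.
        ExecutorService parentPool = Executors.newFixedThreadPool(2);
        ExecutorService childPool = Executors.newFixedThreadPool(2);
        boolean separateOk = runNested(parentPool, childPool, 2, 5000);
        parentPool.shutdownNow();
        childPool.shutdownNow();

        System.out.println("shared pool finished: " + sharedOk);
        System.out.println("separate pools finished: " + separateOk);
    }
}
```

With the shared pool the call times out and returns `false`; with one pool per task level it returns `true`. If tasks from different pools end up multiplexed onto the same small set of workers, the isolation that the separate pools were meant to provide may no longer hold, which is the situation the comment above asks about.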

2021-12-29 18:00:12
Full thread dump OpenJDK 64-Bit Server VM (25.312-b01 mixed mode):

"Attach Listener" #173 daemon prio=9 os_prio=0 tid=0x00007fd514002700 nid=0x49 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"DestroyJavaVM" #142 prio=5 os_prio=0 tid=0x00007fd598009d40 nid=0x7 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"Wisp-Prevent-Shutdown-2" #41 prio=5 os_prio=0 tid=0x00007fd5996e6bb0 nid=0x2a runnable [0x00007fd50bafb000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)

"Service Thread" #20 daemon prio=9 os_prio=0 tid=0x00007fd598217ac0 nid=0x26 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"C1 CompilerThread3" #19 daemon prio=9 os_prio=0 tid=0x00007fd598215990 nid=0x25 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"C2 CompilerThread2" #18 daemon prio=9 os_prio=0 tid=0x00007fd59820be50 nid=0x24 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"C2 CompilerThread1" #17 daemon prio=9 os_prio=0 tid=0x00007fd59820a300 nid=0x23 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"C2 CompilerThread0" #16 daemon prio=9 os_prio=0 tid=0x00007fd5982087b0 nid=0x22 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"Signal Dispatcher" #15 daemon prio=9 os_prio=0 tid=0x00007fd598206bb0 nid=0x21 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"Surrogate Locker Thread (Concurrent GC)" #14 daemon prio=9 os_prio=0 tid=0x00007fd5982052e0 nid=0x20 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:

"Wisp-Root-Worker-0" #4 daemon prio=5 os_prio=0 tid=0x00007fd598203a90 nid=0x1f runnable [0x00007fd4efb72000]
   java.lang.Thread.State: RUNNABLE
    at io.netty.channel.epoll.Native.epollWait(Native Method)
    at io.netty.channel.epoll.Native.epollWait(Native.java:129)
    at io.netty.channel.epoll.Native.epollWait(Native.java:122)
    at io.netty.channel.epoll.EpollEventLoop.epollWaitNoTimerChange(EpollEventLoop.java:290)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:347)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:853)
    at com.alibaba.wisp.engine.WispTask.runOutsideWisp(WispTask.java:299)
    at com.alibaba.wisp.engine.WispTask.runCommand(WispTask.java:274)
    at com.alibaba.wisp.engine.WispTask.access$100(WispTask.java:53)
    at com.alibaba.wisp.engine.WispTask$CacheableCoroutine.run(WispTask.java:241)
    at java.dyn.CoroutineBase.startInternal(CoroutineBase.java:62)

"Wisp-Root-Worker-1" #5 daemon prio=5 os_prio=0 tid=0x00007fd598202240 nid=0x1e runnable [0x00007fd560755000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park0(Native Method)
    at sun.misc.Unsafe.access$200(Unsafe.java:45)
    at sun.misc.Unsafe$1.park0(Unsafe.java:65)
    at com.alibaba.wisp.engine.WispScheduler$Worker.doParkOrPolling(WispScheduler.java:188)
    at com.alibaba.wisp.engine.WispScheduler$Worker.runCarrier(WispScheduler.java:170)
    at com.alibaba.wisp.engine.WispScheduler$Worker.run(WispScheduler.java:141)
    at java.lang.Thread.run(Thread.java:853)

"Wisp-Root-Worker-2" #6 daemon prio=5 os_prio=0 tid=0x00007fd5982009f0 nid=0x1d waiting on condition [0x00007fd560856000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park0(Native Method)
    at sun.misc.Unsafe.access$200(Unsafe.java:45)
    at sun.misc.Unsafe$1.park0(Unsafe.java:65)
    at com.alibaba.wisp.engine.WispScheduler$Worker.doParkOrPolling(WispScheduler.java:188)
    at com.alibaba.wisp.engine.WispScheduler$Worker.runCarrier(WispScheduler.java:170)
    at com.alibaba.wisp.engine.WispScheduler$Worker.run(WispScheduler.java:141)
    at java.lang.Thread.run(Thread.java:853)