A deadlock occurred in a modified version. I think the issue may still exist in the standard version.
The deadlock involved two peers. Each was trying to flush data to the other while its own data-receiver was blocked, so the flush never succeeded and spun in an endless loop.
Stack of one peer:
"6881.bt.net.message-dispatcher-10.61.97.171:-1" #63080 daemon prio=5 os_prio=0 tid=0x000000005ed13000 nid=0x14ef1 runnable [0x00007f04c6bcd000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
- locked <0x00000001c9000078> (a java.lang.Object)
at bt.net.pipeline.SocketChannelHandler.flush(SocketChannelHandler.java:148)
- locked <0x0000000298e8e740> (a java.lang.Object)
at bt.net.pipeline.SocketChannelHandler.send(SocketChannelHandler.java:71)
at bt.net.SocketPeerConnection.postMessage(SocketPeerConnection.java:134)
- locked <0x0000000298e8e790> (a bt.net.SocketPeerConnection)
at com.ctrip.flight.intl.common.bt.MultiThreadMessageDispatcher$MessageDispatchingLoop.processSupplier(MultiThreadMessageDispatcher.java:179)
at com.ctrip.flight.intl.common.bt.MultiThreadMessageDispatcher$MessageDispatchingLoop.run(MultiThreadMessageDispatcher.java:108)
at com.ctrip.flight.intl.common.bt.MultiThreadMessageDispatcher.lambda$createAndSubmitTask$3(MultiThreadMessageDispatcher.java:289)
at com.ctrip.flight.intl.common.bt.MultiThreadMessageDispatcher$$Lambda$746/1406993695.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"6881.bt.runtime.shutdown.worker-4" #63231 daemon prio=5 os_prio=0 tid=0x0000000004fdc800 nid=0x14f8f waiting for monitor entry [0x00007f04d0c49000]
java.lang.Thread.State: BLOCKED (on object monitor)
at bt.net.pipeline.SocketChannelHandler.close(SocketChannelHandler.java:169)
- waiting to lock <0x0000000298e8e740> (a java.lang.Object)
- locked <0x0000000298e8e750> (a java.lang.Object)
at bt.net.SocketPeerConnection.close(SocketPeerConnection.java:166)
at bt.net.SocketPeerConnection.closeQuietly(SocketPeerConnection.java:154)
at bt.net.PeerConnectionPool$$Lambda$845/1275417506.accept(Unknown Source)
at bt.net.Connections$$Lambda$799/596951729.accept(Unknown Source)
at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
at bt.net.Connections.visitConnections(PeerConnectionPool.java:334)
at bt.net.PeerConnectionPool.shutdown(PeerConnectionPool.java:270)
at bt.net.PeerConnectionPool$$Lambda$630/585840387.run(Unknown Source)
at bt.runtime.BtRuntime.lambda$toRunnable$7(BtRuntime.java:324)
at bt.runtime.BtRuntime$$Lambda$678/1139650115.run(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Stack of the other peer:
"6881.bt.net.data-receiver" #64560 prio=5 os_prio=0 tid=0x000000004a832000 nid=0x158c7 waiting for monitor entry [0x00007f18d6109000]
java.lang.Thread.State: BLOCKED (on object monitor)
at bt.net.pipeline.SocketChannelHandler.processInboundData(SocketChannelHandler.java:115)
- waiting to lock <0x00000001979717f8> (a java.lang.Object)
at bt.net.pipeline.SocketChannelHandler.read(SocketChannelHandler.java:82)
at bt.net.pipeline.DefaultChannelPipeline$DefaultChannelHandlerContext.readFromChannel(DefaultChannelPipeline.java:181)
at bt.net.DataReceivingLoop.processKey(DataReceivingLoop.java:187)
at bt.net.DataReceivingLoop.run(DataReceivingLoop.java:128)
at bt.net.DataReceivingLoop.lambda$null$1(DataReceivingLoop.java:65)
at bt.net.DataReceivingLoop$$Lambda$679/99718958.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"6881.bt.net.pool.cleaner" #64562 prio=5 os_prio=0 tid=0x000000003897d800 nid=0x158c9 waiting for monitor entry [0x00007f18d840b000]
java.lang.Thread.State: BLOCKED (on object monitor)
at bt.net.pipeline.SocketChannelHandler.close(SocketChannelHandler.java:169)
- waiting to lock <0x00000001aff3dc48> (a java.lang.Object)
- locked <0x00000001979717f8> (a java.lang.Object)
at bt.net.SocketPeerConnection.close(SocketPeerConnection.java:166)
at bt.net.SocketPeerConnection.closeQuietly(SocketPeerConnection.java:154)
at bt.net.PeerConnectionPool.purgeConnection(PeerConnectionPool.java:264)
at bt.net.PeerConnectionPool.access$200(PeerConnectionPool.java:48)
at bt.net.PeerConnectionPool$Cleaner.lambda$run$0(PeerConnectionPool.java:249)
at bt.net.PeerConnectionPool$Cleaner$$Lambda$795/1375733778.accept(Unknown Source)
at bt.net.Connections$$Lambda$796/1447003786.accept(Unknown Source)
at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
at bt.net.Connections.visitConnections(PeerConnectionPool.java:334)
at bt.net.PeerConnectionPool$Cleaner.run(PeerConnectionPool.java:239)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"6881.bt.net.message-dispatcher-10.61.96.233:6881" #64794 daemon prio=5 os_prio=0 tid=0x0000000056929000 nid=0x159af runnable [0x00007f18c7e1e000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
- locked <0x00000001aff3dbf8> (a java.lang.Object) writeLock
at bt.net.pipeline.SocketChannelHandler.flush(SocketChannelHandler.java:148)
- locked <0x00000001aff3dc48> (a java.lang.Object) outboundBufferLock
at bt.net.pipeline.SocketChannelHandler.send(SocketChannelHandler.java:71)
at bt.net.SocketPeerConnection.postMessage(SocketPeerConnection.java:134)
- locked <0x00000001aff3dc88> (a bt.net.SocketPeerConnection) synchronized
at com.ctrip.flight.intl.common.bt.MultiThreadMessageDispatcher$MessageDispatchingLoop.processSupplier(MultiThreadMessageDispatcher.java:179)
at com.ctrip.flight.intl.common.bt.MultiThreadMessageDispatcher$MessageDispatchingLoop.run(MultiThreadMessageDispatcher.java:108)
at com.ctrip.flight.intl.common.bt.MultiThreadMessageDispatcher.lambda$createAndSubmitTask$3(MultiThreadMessageDispatcher.java:289)
at com.ctrip.flight.intl.common.bt.MultiThreadMessageDispatcher$$Lambda$743/1741032101.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
In both peers, the message-dispatcher thread is stuck in an endless loop:
while (buffer.hasRemaining()) {
channel.write(buffer);
}
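This is possibly because the local TCP write buffer is full, which happens because the remote peer's TCP read buffer is full, which in turn happens because the remote peer's data-receiver thread is blocked. The second dump shows the chain: data-receiver waits for <0x00000001979717f8>, which pool.cleaner holds inside close(); pool.cleaner waits for <0x00000001aff3dc48> (outboundBufferLock), which message-dispatcher holds inside flush() while it spins on write.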
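One possible way to avoid the spin, shown only as a sketch: stop retrying after a few consecutive zero-byte writes and return to the caller, which could then release the lock and retry later (for example once the selector reports OP_WRITE). The class name `BoundedFlush`, the method `tryFlush` and the parameter `maxStalledWrites` are hypothetical and are not part of the library; the sketch also assumes the channel is in non-blocking mode, so `write` can return 0.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, not part of the library: a flush that gives up
// instead of spinning forever when the channel accepts no data.
class BoundedFlush {

    /**
     * Writes the buffer to the channel until it is drained, or until
     * maxStalledWrites consecutive writes have made no progress.
     *
     * @return true if the buffer was fully written, false if the flush
     *         stalled and the caller should retry later
     */
    static boolean tryFlush(WritableByteChannel channel, ByteBuffer buffer,
                            int maxStalledWrites) throws IOException {
        int stalled = 0;
        while (buffer.hasRemaining()) {
            int written = channel.write(buffer);
            if (written > 0) {
                stalled = 0; // progress was made, reset the counter
            } else if (++stalled >= maxStalledWrites) {
                return false; // no progress: stop instead of looping
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a socket whose send buffer is full: accepts nothing.
        WritableByteChannel full = new WritableByteChannel() {
            @Override public int write(ByteBuffer src) { return 0; }
            @Override public boolean isOpen() { return true; }
            @Override public void close() { }
        };
        ByteBuffer data = ByteBuffer.wrap(new byte[] {1, 2, 3});
        boolean drained = tryFlush(full, data, 3);
        System.out.println("drained=" + drained + ", remaining=" + data.remaining());
    }
}
```

If something like this were used in flush(), the unsent bytes would stay in the outbound buffer when it returns false, and close() or the data-receiver would get a chance to take the lock in the meantime. Whether that fits the existing locking design would need to be checked against the real code.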