atomashpolskiy / bt

BitTorrent library and client with DHT, magnet links, encryption and more
https://atomashpolskiy.github.io/bt/
Apache License 2.0
2.4k stars 382 forks source link

Shared executor issue #207

Closed NyannCat closed 2 years ago

NyannCat commented 2 years ago

In our swarm environment, we found that sometimes Bt suffers from Handshake failure issue. As we analyzed the log and stack dump, we found out that, due to unknown reason, the connection pool was filled by the outgoing tasks and there was no room for the incoming tasks.

Notices the time between first line and second line, the incoming task was added at 13:23:53. However, it waited almost 5 minutes for execution. In this case, the client (10.112.190.23) had already close the connection due to timeout(30s timeout).

I think the cause of this issue is the mixture usage of the executor pool for both incoming and outgoing connections. Once the pool is filled by either incoming or outgoing connections, the other type of connection will suffer from huger, Bt loss the ability to establish connection. In the worest case, the entire swarm loss the ability to connect to each other.

To avoid this issue, i suggest that we should assign incoming connection a dedicated pool.

13:23:53.332 [6890.bt.net.pool.incoming-acceptor-0] INFO  bt.net.IncomingConnectionListener.establishConnection:87 - submit in connection: /10.112.190.23:43199, queue size: 1030
13:29:11.211 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] INFO  bt.net.IncomingConnectionListener.lambda$establishConnection$2:91 - start in connection: /10.112.190.23:43199
13:29:11.211 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] INFO  bt.net.PeerConnectionFactory.createIncomingConnection:165 - createIncomingConnection, InetPeer{address=/10.112.190.23, port=-1}
13:29:11.212 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] INFO  bt.net.PeerConnectionFactory.createConnection:173 - _createConnection
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] INFO  bt.net.IncomingHandshakeHandler.handleConnection:60 - handleConnection readMessage
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] INFO  bt.net.IncomingHandshakeHandler.handleConnection:73 - handleConnection firstMessage
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] INFO  bt.net.IncomingHandshakeHandler.handleConnection:78 - handleConnection torrentId: 5f030f01cfa2d92dbffcac6616b4219ed77b04fb
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] INFO  bt.net.IncomingHandshakeHandler.handleConnection:89 - handleConnection postMessage
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] ERROR bt.net.PeerConnectionFactory.createConnection:177 - Failed to establish connection with peer: InetPeer{address=/10.112.190.23, port=-1}. Reason: java.lang.RuntimeException (Unexpected I/O error)
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] ERROR bt.net.buffer.BufferManager$DefaultBorrowedBuffer.release:81 - release buffer, 495d1a0067dc46c596b34507dca06ddb
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] DEBUG bt.net.buffer.BufferManager$DefaultBorrowedBuffer.release:92 - buffer id 495d1a0067dc46c596b34507dca06ddb has been released, buffer is null: true
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] ERROR bt.net.buffer.BufferManager$DefaultBorrowedBuffer.release:81 - release buffer, e651444745f64d3e9d21a41a17eebf5b
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] DEBUG bt.net.buffer.BufferManager$DefaultBorrowedBuffer.release:92 - buffer id e651444745f64d3e9d21a41a17eebf5b has been released, buffer is null: true
13:29:11.214 [6890.bt.net.pool.connection-worker--in-/10.112.190.23:43199] DEBUG bt.net.IncomingConnectionListener.lambda$establishConnection$2:105 - connection not added, will be cancelled /10.112.190.23:43199
"6890.bt.net.pool.connection-worker--out-/10.57.168.76:6890-5262dc06557d83785d1b20958721f427c1d8b1e3" #357 daemon prio=5 os_prio=0 tid=0x0000562a56657000 nid=0xf63b waiting on condition [0x00007fe041855000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007e649ccd8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at bt.net.SocketPeerConnection.readMessage(SocketPeerConnection.java:102)
        - locked <0x00000007e649cc58> (a bt.net.SocketPeerConnection)
        at bt.net.OutgoingHandshakeHandler.handleConnection(OutgoingHandshakeHandler.java:75)
        at bt.net.PeerConnectionFactory.initConnection(PeerConnectionFactory.java:282)
        at bt.net.PeerConnectionFactory._createConnection(PeerConnectionFactory.java:229)
        at bt.net.PeerConnectionFactory.createConnection(PeerConnectionFactory.java:174)
        at bt.net.PeerConnectionFactory.createOutgoingConnection(PeerConnectionFactory.java:148)
        at bt.net.ConnectionSource.lambda$getConnectionAsync$0(ConnectionSource.java:143)
        at bt.net.ConnectionSource$$Lambda$919/1570099463.get(Unknown Source)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at com.dianping.cat.async.CatWrappedRunnable.lambda$run$0(CatWrappedRunnable.java:30)
        at com.dianping.cat.async.CatWrappedRunnable$$Lambda$131/656160573.execute(Unknown Source)
        at com.dianping.cat.async.CatAsyncSupportTemplate.doExecute4Simple(CatAsyncSupportTemplate.java:147)
        at com.dianping.cat.async.CatAsyncSupportTemplate.execute(CatAsyncSupportTemplate.java:94)
        at com.dianping.cat.async.CatWrappedRunnable.run(CatWrappedRunnable.java:29)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

"6890.bt.net.pool.connection-worker--out-/10.57.168.76:6890-13023eab58e59a627977d8e1024de4127cccc5a9" #356 daemon prio=5 os_prio=0 tid=0x0000562a56656000 nid=0xf63a waiting on condition [0x00007fe041896000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007e43afb68> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at bt.net.SocketPeerConnection.readMessage(SocketPeerConnection.java:102)
        - locked <0x00000007e43afae8> (a bt.net.SocketPeerConnection)
        at bt.net.OutgoingHandshakeHandler.handleConnection(OutgoingHandshakeHandler.java:75)
        at bt.net.PeerConnectionFactory.initConnection(PeerConnectionFactory.java:282)
        at bt.net.PeerConnectionFactory._createConnection(PeerConnectionFactory.java:229)
        at bt.net.PeerConnectionFactory.createConnection(PeerConnectionFactory.java:174)
        at bt.net.PeerConnectionFactory.createOutgoingConnection(PeerConnectionFactory.java:148)
        at bt.net.ConnectionSource.lambda$getConnectionAsync$0(ConnectionSource.java:143)
        at bt.net.ConnectionSource$$Lambda$919/1570099463.get(Unknown Source)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at com.dianping.cat.async.CatWrappedRunnable.lambda$run$0(CatWrappedRunnable.java:30)
        at com.dianping.cat.async.CatWrappedRunnable$$Lambda$131/656160573.execute(Unknown Source)
        at com.dianping.cat.async.CatAsyncSupportTemplate.doExecute4Simple(CatAsyncSupportTemplate.java:147)
        at com.dianping.cat.async.CatAsyncSupportTemplate.execute(CatAsyncSupportTemplate.java:94)
        at com.dianping.cat.async.CatWrappedRunnable.run(CatWrappedRunnable.java:29)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

"6890.bt.net.pool.connection-worker--out-/10.112.190.23:6890-8eda9206f94df877218ac065756c4d7e6d9cb254" #355 daemon prio=5 os_prio=0 tid=0x0000562a5428e000 nid=0xf639 waiting on condition [0x00007fe0418d7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007e5f97660> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at bt.net.SocketPeerConnection.readMessage(SocketPeerConnection.java:102)
        - locked <0x00000007e5f975e0> (a bt.net.SocketPeerConnection)
        at bt.net.OutgoingHandshakeHandler.handleConnection(OutgoingHandshakeHandler.java:75)
        at bt.net.PeerConnectionFactory.initConnection(PeerConnectionFactory.java:282)
        at bt.net.PeerConnectionFactory._createConnection(PeerConnectionFactory.java:229)
        at bt.net.PeerConnectionFactory.createConnection(PeerConnectionFactory.java:174)
        at bt.net.PeerConnectionFactory.createOutgoingConnection(PeerConnectionFactory.java:148)
        at bt.net.ConnectionSource.lambda$getConnectionAsync$0(ConnectionSource.java:143)
        at bt.net.ConnectionSource$$Lambda$919/1570099463.get(Unknown Source)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at com.dianping.cat.async.CatWrappedRunnable.lambda$run$0(CatWrappedRunnable.java:30)
        at com.dianping.cat.async.CatWrappedRunnable$$Lambda$131/656160573.execute(Unknown Source)
        at com.dianping.cat.async.CatAsyncSupportTemplate.doExecute4Simple(CatAsyncSupportTemplate.java:147)
        at com.dianping.cat.async.CatAsyncSupportTemplate.execute(CatAsyncSupportTemplate.java:94)
        at com.dianping.cat.async.CatWrappedRunnable.run(CatWrappedRunnable.java:29)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

"6890.bt.net.pool.connection-worker--in-/10.113.81.108:35741" #354 daemon prio=5 os_prio=0 tid=0x0000562a53073000 nid=0xf638 waiting for monitor entry [0x00007fe041918000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:205)
        - waiting to lock <0x00000005148684b8> (a java.lang.Object)
        at bt.net.SharedSelector.wakeupAndRegister(SharedSelector.java:117)
        at bt.net.DataReceivingLoop.registerChannel(DataReceivingLoop.java:85)
        at bt.net.pipeline.SocketChannelHandler.register(SocketChannelHandler.java:97)
        at bt.net.PeerConnectionFactory._createConnection(PeerConnectionFactory.java:219)
        at bt.net.PeerConnectionFactory.createConnection(PeerConnectionFactory.java:174)
        at bt.net.PeerConnectionFactory.createIncomingConnection(PeerConnectionFactory.java:166)
        at bt.net.SocketChannelConnectionAcceptor.createConnection(SocketChannelConnectionAcceptor.java:153)
        at bt.net.SocketChannelConnectionAcceptor.access$000(SocketChannelConnectionAcceptor.java:36)
        at bt.net.SocketChannelConnectionAcceptor$1.establish(SocketChannelConnectionAcceptor.java:135)
        at bt.net.IncomingConnectionListener.lambda$establishConnection$2(IncomingConnectionListener.java:94)
        at bt.net.IncomingConnectionListener$$Lambda$712/647561060.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at com.dianping.cat.async.CatWrappedRunnable.lambda$run$0(CatWrappedRunnable.java:30)
        at com.dianping.cat.async.CatWrappedRunnable$$Lambda$131/656160573.execute(Unknown Source)
        at com.dianping.cat.async.CatAsyncSupportTemplate.doExecute4Simple(CatAsyncSupportTemplate.java:147)
        at com.dianping.cat.async.CatAsyncSupportTemplate.execute(CatAsyncSupportTemplate.java:94)
        at com.dianping.cat.async.CatWrappedRunnable.run(CatWrappedRunnable.java:29)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
atomashpolskiy commented 2 years ago

Thank you! Makes sense. Please add closing of the new incomingConnectionExecutor (the same way as the existing executor is closed above).

NyannCat commented 2 years ago

fixed