Open julianknutsen opened 4 years ago
The seednodes should not intentionally disconnect nodes. We should find out why that happens (as you stated, they are probably overloaded, or maybe it happens during the restart process). With logs from the seed node it should be possible to match the requests to the disconnects. I agree we should handle such a close reason by starting a new request inside `RequestDataManager` on such disconnects (rough sketch below).
@wiz Could you have a look at this issue?
Related to: #3797 #1023 #3763 #3347 #2549
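A minimal sketch of the retry idea mentioned above, assuming the `ConnectionListener`-style callback `onDisconnect(CloseConnectionReason, Connection)`; `removeHandlerFor()` and `requestFromNextAvailableSeedNode()` are hypothetical placeholder names, not existing methods:

```java
// Sketch only: react to an intentional close instead of merely dropping the handler.
@Override
public void onDisconnect(CloseConnectionReason closeConnectionReason, Connection connection) {
    // Current behavior: forget the in-flight handler tied to this connection
    // (hypothetical name for the existing map cleanup).
    removeHandlerFor(connection);

    // Proposed addition: if the seednode closed the connection on purpose while we
    // are still waiting for the initial data, fire a replacement request right away
    // instead of relying on the remaining request's 90s timeout.
    if (closeConnectionReason == CloseConnectionReason.CLOSE_REQUESTED_BY_PEER) {
        requestFromNextAvailableSeedNode(); // hypothetical: pick another seed node and re-request
    }
}
```

A real fix would probably also check whether the preliminary data has already been received before re-requesting, and decide which other intentional close reasons should trigger the same path.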
This is a tricky one. `CloseConnectionMessage`s are handled silently in `Connection.java`, so only useful messages reach the remaining onMessage tree, which keeps the rest of the app unaware that something went wrong.
Making a change to this behavior (in order to address this issue) might result in huge testing efforts.
I suggest a wont-fix as we have other countermeasures coming up. @ripcurlx @sqrrm?
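To illustrate the point above about the silent handling of `CloseConnectionMessage`: a component that only registers a `MessageListener` never sees the close at all, because `Connection` consumes the message before dispatching to `onMessage`; the disconnect only becomes observable through the `ConnectionListener` callbacks. A paraphrased sketch (the interface shapes are recalled from memory, not quoted from the codebase):

```java
// Paraphrased sketch of the two notification paths a handler can hook into.
public class SomeRequestHandler implements MessageListener, ConnectionListener {

    @Override
    public void onMessage(NetworkEnvelope networkEnvelope, Connection connection) {
        // Only "useful" application messages arrive here. A CloseConnectionMessage
        // is handled inside Connection and never dispatched, so this path stays
        // silent when the peer closes the connection.
    }

    @Override
    public void onConnection(Connection connection) {
        // Not relevant for this issue.
    }

    @Override
    public void onDisconnect(CloseConnectionReason closeConnectionReason, Connection connection) {
        // The only place the handler can learn that the peer closed the connection,
        // and why (e.g. CLOSE_REQUESTED_BY_PEER).
    }

    @Override
    public void onError(Throwable throwable) {
        // Not relevant for this issue.
    }
}
```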
Can you add monitoring for the number of connections on each seednode and display it on grafana?
I expect the number of connections for each seednode to be close to their maximum (50 or something?). With only 8 seednodes at ~50 connections each there are roughly 400 connection slots, and since each client fires up 2 connections, only about 200 clients can be connected to the seednodes at any point in time. If another client comes along, the seednode decides which connection to close according to the rules.
However, you might be able to grep for `connections open. Our limit is` in the logs of a live seednode just to see if my expectation is reasonable.
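As a starting point for the monitoring question above, a minimal sketch that periodically logs the open connection count so a log scraper (or whatever feeds Grafana) can pick it up. It assumes a `NetworkNode`-like object exposing `getAllConnections()` and does not use any particular metrics library:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: report the number of open connections once per minute.
// A NetworkNode with getAllConnections() is assumed; swap in whatever
// metrics/reporting mechanism the seednodes actually use.
public class ConnectionCountReporter {
    private static final Logger log = LoggerFactory.getLogger(ConnectionCountReporter.class);

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(NetworkNode networkNode) {
        scheduler.scheduleAtFixedRate(
                () -> log.info("Open connections: {}", networkNode.getAllConnections().size()),
                0, 60, TimeUnit.SECONDS);
    }

    public void shutDown() {
        scheduler.shutdownNow();
    }
}
```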
Description
During startup, the Bisq client connects to 2 seednodes to sync the initial data. If one seednode closes the connection with `CLOSE_REQUESTED_BY_PEER` and the other times out, the client hangs at (3/4) for the full 90s timeout window.
There isn't much that can be done w.r.t. timeouts, but if a seednode closes the connection intentionally, we can do a better job of sending another request in its place in the event the "backup" connection is going to time out.
I'm not sure if we track metrics on seednode intentional disconnects, but this might be a sign of overloaded seednodes.
Version
v1.2.4
Steps to reproduce
I haven't reproduced this scenario in a controlled test, but in my normal usage of Bisq there have been a few instances where (3/4) hangs for the full 90s before cycling to another seednode and starting over. The logs indicate the error described above.
Expected behaviour
If a seednode intentionally disconnects a starting client, the client will retry with another valid node "immediately".
Actual behaviour
Requests whose connections are intentionally closed by the seednode are not retried with another valid seednode.
Screenshots
Device or machine
Ubuntu 19.10
Additional info
I don't have a lot of experience with the networking code, but it looks like `RequestDataManager::onDisconnect` only removes the handler from the internal map and doesn't trigger any further action.

```
Dec-16 09:24:25.738 - CLOSE_REQUESTED_BY_PEER (Seednode 1)
Dec-16 09:25:40.094 - Timeout (Seednode 2)
Dec-16 09:25:43.691 - Retry
```