Open julianknutsen opened 4 years ago
The seednodes should not intentionally disconnect nodes. We should find out why that happens (as you stated, they are probably overloaded, or maybe it happens during the restart process). With logs from the seed node it should be possible to match the requests to the disconnects. I agree we should handle such a close reason by starting a new request inside `RequestDataManager` on such disconnects (rough sketch below).
@wiz Could you have a look at this issue?
Related to: #3797 #1023 #3763 #3347 #2549
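A minimal sketch of the retry idea mentioned above, assuming the `ConnectionListener`-style callback `onDisconnect(CloseConnectionReason, Connection)`; `removeHandlerFor()` and `requestFromNextAvailableSeedNode()` are hypothetical placeholder names, not existing methods:

```java
// Sketch only: react to an intentional close instead of merely dropping the handler.
@Override
public void onDisconnect(CloseConnectionReason closeConnectionReason, Connection connection) {
    // Current behavior: forget the in-flight handler tied to this connection
    // (hypothetical name for the existing map cleanup).
    removeHandlerFor(connection);

    // Proposed addition: if the seednode closed the connection on purpose while we
    // are still waiting for the initial data, fire a replacement request right away
    // instead of relying on the remaining request's 90s timeout.
    if (closeConnectionReason == CloseConnectionReason.CLOSE_REQUESTED_BY_PEER) {
        requestFromNextAvailableSeedNode(); // hypothetical: pick another seed node and re-request
    }
}
```

A real fix would probably also check whether the preliminary data has already been received before re-requesting, and decide which other intentional close reasons should trigger the same path.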
This is a tricky one. `CloseConnectionMessage`s are handled silently in `Connection.java`, so only useful messages reach the remaining onMessage tree, which keeps the rest of the app unaware that something went wrong.
Making a change to this behavior (in order to address this issue) might result in huge testing efforts.
I suggest a wont-fix as we have other countermeasures coming up. @ripcurlx @sqrrm?
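To illustrate the point above about the silent handling of `CloseConnectionMessage`: a component that only registers a `MessageListener` never sees the close at all, because `Connection` consumes the message before dispatching to `onMessage`; the disconnect only becomes observable through the `ConnectionListener` callbacks. A paraphrased sketch (the interface shapes are recalled from memory, not quoted from the codebase):

```java
// Paraphrased sketch of the two notification paths a handler can hook into.
public class SomeRequestHandler implements MessageListener, ConnectionListener {

    @Override
    public void onMessage(NetworkEnvelope networkEnvelope, Connection connection) {
        // Only "useful" application messages arrive here. A CloseConnectionMessage
        // is handled inside Connection and never dispatched, so this path stays
        // silent when the peer closes the connection.
    }

    @Override
    public void onConnection(Connection connection) {
        // Not relevant for this issue.
    }

    @Override
    public void onDisconnect(CloseConnectionReason closeConnectionReason, Connection connection) {
        // The only place the handler can learn that the peer closed the connection,
        // and why (e.g. CLOSE_REQUESTED_BY_PEER).
    }

    @Override
    public void onError(Throwable throwable) {
        // Not relevant for this issue.
    }
}
```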
Can you add monitoring for the number of connections on each seednode and display it on grafana?
I expect the number of connections for each seednode to be close to their maximum (50 or something?). With only 8 seednodes at ~50 connections each there are roughly 400 connection slots, and since each client fires up 2 connections, only about 200 clients can be connected to the seednodes at any point in time. If another client comes along, the seednode decides which connection to close according to the rules.
However, you might be able to grep for `connections open. Our limit is` in the logs of a live seednode just to see if my expectation is reasonable.
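As a starting point for the monitoring question above, a minimal sketch that periodically logs the open connection count so a log scraper (or whatever feeds Grafana) can pick it up. It assumes a `NetworkNode`-like object exposing `getAllConnections()` and does not use any particular metrics library:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: report the number of open connections once per minute.
// A NetworkNode with getAllConnections() is assumed; swap in whatever
// metrics/reporting mechanism the seednodes actually use.
public class ConnectionCountReporter {
    private static final Logger log = LoggerFactory.getLogger(ConnectionCountReporter.class);

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start(NetworkNode networkNode) {
        scheduler.scheduleAtFixedRate(
                () -> log.info("Open connections: {}", networkNode.getAllConnections().size()),
                0, 60, TimeUnit.SECONDS);
    }

    public void shutDown() {
        scheduler.shutdownNow();
    }
}
```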
Description
During startup, the Bisq client connects to 2 seednodes to sync the initial data. If one seednode closes the connection with `CLOSE_REQUESTED_BY_PEER` and the other times out, the client hangs at (3/4) for the full 90s timeout window.
There isn't much that can be done w.r.t. timeouts, but if a seednode closes the connection intentionally, we can do a better job of sending another request in its place in the event the "backup" connection is going to time out.
I'm not sure if we track metrics on seednode intentional disconnects, but this might be a sign of overloaded seednodes.
Version
v1.2.4
Steps to reproduce
I haven't reproduced this scenario in a controlled test, but in my normal usage of Bisq there have been a few instances where (3/4) hangs for the full 90s before cycling to another seednode and starting over. The logs indicate the error described above.
Expected behaviour
If a seednode intentionally disconnects a starting client, the client will retry with another valid node "immediately".
Actual behaviour
Requests whose connections are intentionally closed by the seednode are not retried with another valid seednode.
Screenshots
Device or machine
Ubuntu 19.10
Additional info
I don't have a lot of experience with the networking code, but it looks like `RequestDataManager::onDisconnect` only removes the handler from the internal map and doesn't trigger any further action.

```
Dec-16 09:24:25.738 - CLOSE_REQUESTED_BY_PEER (Seednode 1)
Dec-16 09:25:40.094 - Timeout (Seednode 2)
Dec-16 09:25:43.691 - Retry
```