Closed dpad85 closed 11 months ago
It's very surprising that pings to the electrum server timeout but we still receive block, it looks like TCP sockets are only inactive on the sending side (probably some OS-related weirdness).
1, The Peer connection should not stay in ESTABLISHING state so long.
It does reflect reality though, as we are trying to connect but not succeeding yet? If we added a timeout, we would just end up DISCONNECTED
instead of ESTABLISHING
. Would this be a better UX for the user? Do you want that because you can then trigger the reconnection from the Phoenix side once you get a notification from the OS that connectivity is back?
It would fix the 20 seconds timeout though, and it's trivial to implement, so it's probably worth adding anyway.
- Electrum connection soldiers on and remains ESTABLISHED, even when pings fail. Pings have no effect.
I'm not sure what we should do here. Pings are only here to help the TCP connection stay alive (it's just a TCP keep-alive): if they fail but the connection is still alive, we have achieved our goal, so there's nothing to do? If we decided to disconnect when pings timeout, then we would most likely fail to reconnect: the result is that electrum would also appear disconnected. Is that what we want? I see why we could want that for UX reasons (even though it's wasteful to disconnect and reconnect when we have a viable TCP connection).
Another issue with automatically disconnecting on pings timeout is that I don't know if electrum servers guarantee that they will answer to ping messages. If we start disconnecting valid connections that don't respond to ping but correctly respond to everything else, that would really be bad...
it looks like TCP sockets are only inactive on the sending side (probably some OS-related weirdness).
Could be, the connectivity issue started when I uploaded a large log file from my device to Gmail.
If we added a timeout, we would just end up
DISCONNECTED
instead ofESTABLISHING
. Would this be a better UX for the user?
Yes I think so. I don't know exactly why, but connection attempts sometimes hang for a while. During that time, the UI is stuck, it looks like a bug. Forcing reconnection (by restarting the app) usually fixes the issue. A good UX would be:
^ this would need work on Phoenix's UI as well
I'm not sure what we should do here. Pings are only here to help the TCP connection stay alive (it's just a TCP keep-alive): if they fail but the connection is still alive, we have achieved our goal, so there's nothing to do?
I agree, I'm 100% fine with keeping the Electrum connection alive as long as possible. However, pinging Electrum seems unreliable, so we could:
info
or debug
.Yes I think so. I don't know exactly why, but connection attempts sometimes hang for a while. During that time, the UI is stuck, it looks like a bug. Forcing reconnection (by restarting the app) usually fixes the issue. A good UX would be:
Sounds good to me. The timeout (5s) should be provided by Phoenix, so that Phoenix can check whether the user has activated Tor or not and adapt the timeout value accordingly. Ideally if the OS gives you details about the connection quality, that should also be taken into account when choosing what timeout value to use.
decrease the log level of ping errors to info or debug.
Definitely, we should do that!
I've been able to reproduce a connection issue on my mainnet Android device where
Peer
is lostElectrum
is not. We still receive the new blocks (and even receive them right on time, I was checking in real time), however pings do time-out.I've edit the logs so they are very easy to read: https://bin.idrix.fr/?365fbd76444eadc4#wcppPRpmo4YaVRze82nU4bCuYURoPeqvo6scuxbccSo
There are several issues:
ESTABLISHING
state so long. Note that when that the internet connection is fixed, the peer only gets disconnected 20s later (probably by the remote peer) and only then we try to reconnect. This makes for a frustrating UX.ESTABLISHED
, even when pings fail. Pings have no effect.Another issue that will have to be fixed in the Phoenix repo:
AppConnectionDaemon
connection loop is not working as expected (the pause should be reset at some point, instead of incrementing mindlessly: in the end when all the issues above were fixed, this loop still waited for 2s which was not useful).