This PR fixes/implements the following bugs/features
[x] Use Fibonacci backoff for establishing a connection
[x] Use a keep-trying interval once Fibonacci backoff reaches its limit (retry every five minutes)
[x] Attempt to reconnect immediately when coming online
[x] Add connection status reporting
Our current implementation uses keep-alive to reconnect every 60 seconds. With these changes, Fibonacci backoff retries sooner after a failed connection attempt and then backs off gradually. Once the delay reaches a threshold of five minutes, it no longer makes sense to back off further, so we keep trying every five minutes.
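The retry schedule described above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: `retryDelay` and `BASE_DELAY_MS` are hypothetical, and the one-second base unit is an assumption. Only the five-minute cap comes from the description.

```typescript
// Assumed base unit for the Fibonacci sequence (not stated in the PR).
const BASE_DELAY_MS = 1_000;
// Cap taken from the description: keep trying every five minutes.
const KEEP_TRYING_INTERVAL_MS = 5 * 60 * 1_000;

// Hypothetical helper: delay in milliseconds before retry number `attempt` (0-based).
// Delays follow 1, 1, 2, 3, 5, 8, ... times the base unit until they hit the cap.
function retryDelay(attempt: number): number {
  let prev = 0;
  let curr = 1;
  for (let i = 0; i < attempt; i++) {
    [prev, curr] = [curr, prev + curr];
    // Stop early once the cap is reached so large attempt counts stay cheap.
    if (curr * BASE_DELAY_MS >= KEEP_TRYING_INTERVAL_MS) {
      return KEEP_TRYING_INTERVAL_MS;
    }
  }
  return Math.min(curr * BASE_DELAY_MS, KEEP_TRYING_INTERVAL_MS);
}
```

Under these assumptions the first retries come one, one, two, three and five seconds apart, well inside the old 60-second interval, and every retry after the cap is reached waits five minutes.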
In addition, on an online event, we attempt to connect immediately.
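A rough sketch of the online-event behavior, under assumptions: `reconnectWhenOnline` is a hypothetical helper, `target` stands in for whichever global scope receives the `online` event, and `reconnect` stands in for whatever function dials the peers.

```typescript
// Hypothetical helper: call `reconnect` as soon as `target` fires an "online" event.
// Returns a function that removes the listener again.
function reconnectWhenOnline(target: EventTarget, reconnect: () => void): () => void {
  const onOnline = (): void => reconnect();
  target.addEventListener("online", onOnline);
  return () => target.removeEventListener("online", onOnline);
}
```

In a worker this might be called as `reconnectWhenOnline(self, tryConnecting)`, where `tryConnecting` is again a placeholder name. A fuller version would probably also cancel any pending backoff timer before reconnecting.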
We report connection status individually and in aggregate in the console. For each connection we report:
Peer multiaddress
connected status
lastConnectedAt timestamp
latency in milliseconds
In aggregate, we report
offline status, true when no peer is connected
lastConnectedAt, the timestamp of the most recent connection to any peer
average latency across all connected peers
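To make the aggregate fields concrete, here is one way they could be derived from the per-connection reports. The `PeerStatus` and `AggregateStatus` shapes and the `aggregate` function are hypothetical, modeled on the fields listed above. The actual types in this PR may differ.

```typescript
// Hypothetical per-connection report, mirroring the fields listed above.
interface PeerStatus {
  peer: string;                   // peer multiaddress
  connected: boolean;
  lastConnectedAt: number | null; // epoch milliseconds, null if never connected
  latency: number | null;         // milliseconds, null when unknown
}

// Hypothetical aggregate report.
interface AggregateStatus {
  offline: boolean;               // true when no peer is connected
  lastConnectedAt: number | null; // most recent connection to any peer
  averageLatency: number | null;  // mean latency over connected peers
}

function aggregate(peers: PeerStatus[]): AggregateStatus {
  const connected = peers.filter((p) => p.connected);
  const timestamps = peers
    .map((p) => p.lastConnectedAt)
    .filter((t): t is number => t !== null);
  const latencies = connected
    .map((p) => p.latency)
    .filter((l): l is number => l !== null);
  return {
    offline: connected.length === 0,
    lastConnectedAt: timestamps.length > 0 ? Math.max(...timestamps) : null,
    averageLatency:
      latencies.length > 0
        ? latencies.reduce((sum, l) => sum + l, 0) / latencies.length
        : null,
  };
}
```

Disconnected peers still contribute their `lastConnectedAt` to the aggregate timestamp but are left out of the latency average, matching the wording above.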
Reporting is on by default on localhost and in the staging environment.
In production, you can start monitoring connections by running monitorPeers in the shared worker console and stop monitoring with stopMonitoringPeers. The shared worker console is in about:debugging#/runtime/this-firefox in Firefox and chrome://inspect/#workers in Chrome. (Scroll way down in Firefox to Shared Workers.)
Test plan (required)
Open the shared worker console and watch the logging to check on the connections. Turn off your WiFi (or unplug Ethernet) to simulate lost connections. The peers should start trying to reconnect using Fibonacci backoff, frequently at first and then less often. Go back online and the connections should start coming back immediately.
For finer-grained testing and debugging, you can use this proof-of-concept environment: https://github.com/fission-suite/ipfs-connection-poc. The connection strategy was developed there and the implementation should be exactly the same (except that it does not run in a service worker and reporting is on by default). In the proof-of-concept you can use local IPFS peers, production peers, or a combination of them to try cases where only some peers are unavailable. It also has verbose logging for checking backoff times in more detail.