fission-codes / auth-lobby

The authentication service that Fission services run.
https://auth.fission.codes
GNU Affero General Public License v3.0

Update peer connection strategy #94

Closed bgins closed 3 years ago

bgins commented 3 years ago

Summary

This PR fixes/implements the following bugs/features

Our current implementation uses a keep-alive to reconnect every 60 seconds. With these changes, Fibonacci backoff attempts to reconnect sooner after a failed connection attempt and then backs off gradually. Once the delay reaches a threshold of five minutes, it no longer makes sense to back off further, so we keep retrying every five minutes.
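As a rough illustration of the capped Fibonacci schedule described above (a sketch, not the code in this PR): the five-minute ceiling comes from the description, while the one-second base unit and the names `BASE_DELAY_MS`, `MAX_DELAY_MS`, and `fibonacciDelay` are assumptions made for the example.

```typescript
// Assumed for illustration: one Fibonacci unit equals one second.
const BASE_DELAY_MS = 1000;
// Five-minute ceiling, as stated in the PR description.
const MAX_DELAY_MS = 5 * 60 * 1000;

/**
 * Delay before retry number `attempt` (0-based).
 * Follows 1, 1, 2, 3, 5, 8, ... units and never exceeds MAX_DELAY_MS.
 */
function fibonacciDelay(attempt: number): number {
  let previous = 0;
  let current = 1;
  for (let i = 0; i < attempt; i++) {
    [previous, current] = [current, previous + current];
    // Stop early once the ceiling is reached so the numbers stay small.
    if (current * BASE_DELAY_MS >= MAX_DELAY_MS) {
      return MAX_DELAY_MS;
    }
  }
  return Math.min(current * BASE_DELAY_MS, MAX_DELAY_MS);
}
```

With these assumed constants, the first retries come after 1, 1, 2, 3, and 5 seconds, and every retry from the fourteenth onward waits the full five minutes.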

In addition, on an online event, we attempt to connect immediately.
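A minimal sketch of the online-event behavior, assuming a hypothetical `connect` callback and peer list; `OnlineSource` and `connectOnOnline` are illustrative names, not part of this PR. The description does not say whether the backoff counter is reset at this point, so the sketch leaves that out.

```typescript
// Minimal shape of whatever emits the "online" event (for example the worker
// global scope). Declared locally so the sketch stays self-contained.
interface OnlineSource {
  addEventListener(type: "online", listener: () => void): void;
}

// Hypothetical helper: try every peer as soon as the browser reports that it
// is back online, instead of waiting for the next scheduled retry.
function connectOnOnline(
  source: OnlineSource,
  peers: string[],
  connect: (peer: string) => void
): void {
  source.addEventListener("online", () => {
    for (const peer of peers) {
      connect(peer);
    }
  });
}
```

In a worker, `source` would presumably be the global scope (`self`), with `connect` wrapping whatever call the app uses to dial a peer.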

We report connection status individually and in aggregate in the console. For each connection we report:

In aggregate, we report

Reporting is on by default on localhost and in the staging environment.

In production, you can start monitoring connections by running monitorPeers in the shared worker console and stop monitoring with stopMonitoringPeers. The shared worker console is in about:debugging#/runtime/this-firefox in Firefox and chrome://inspect/#workers in Chrome. (Scroll way down in Firefox to Shared Workers.)
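One way such a console toggle could be wired up is sketched below. Only the names `monitorPeers` and `stopMonitoringPeers` come from this PR; the factory `createPeerMonitor`, the `report` callback, and the reporting interval are assumptions for illustration, and the actual implementation may differ.

```typescript
interface PeerMonitor {
  monitorPeers(): void;
  stopMonitoringPeers(): void;
  isMonitoring(): boolean;
}

// Hypothetical factory: `report` stands in for whatever logs connection status.
function createPeerMonitor(report: () => void, intervalMs: number): PeerMonitor {
  let timer: ReturnType<typeof setInterval> | null = null;

  return {
    monitorPeers() {
      if (timer !== null) return; // already monitoring, do nothing
      report();                   // log once right away
      timer = setInterval(report, intervalMs);
    },
    stopMonitoringPeers() {
      if (timer === null) return;
      clearInterval(timer);
      timer = null;
    },
    isMonitoring() {
      return timer !== null;
    },
  };
}
```

To make the two functions callable by name from the worker console, they could be attached to the global scope, for example with `Object.assign(globalThis, createPeerMonitor(report, 10000))`, where `report` and the interval are again placeholders.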

Test plan (required)

Open the shared worker console and watch the logging to check on the connections. Turn off your WiFi (or unplug Ethernet) to simulate lost connections. The peers should start trying to reconnect using Fibonacci backoff, frequently at first and then less often. Go back online and the connections should start coming back immediately.

For finer-grained testing and debugging, you can use this proof-of-concept environment: https://github.com/fission-suite/ipfs-connection-poc. The connection strategy was developed there and the implementation should be exactly the same (except that it does not run in a service worker and reporting is on by default). In the proof-of-concept you can use local IPFS peers, production peers, or even combine them to try cases where some peers are unavailable. It also has verbose logging to check backoff times in more detail.