andydunstall / piko

An open-source alternative to Ngrok, designed to serve production traffic and be simple to host (particularly on Kubernetes)
MIT License

Gossip Latency Monitoring #174

Open andydunstall opened 3 days ago

andydunstall commented 3 days ago

Add support for gossip entry timestamps which are set when an entry is created and propagated to the rest of the cluster. Nodes can then use these timestamps to calculate how long it takes for entries to be propagated around the cluster.

Those propagation times can then be exposed as metrics and used to tune the gossip configuration or to detect when the cluster is overloaded.

Versioning

Adding a new field will require a new gossip protocol version, so nodes must support both the existing version (0) and the new version (1).
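One possible way to support both versions is a version-prefixed encoding where only version 1 carries the timestamp. The sketch below is an assumption for illustration only: the wire layout (version byte, key length, key, optional timestamp) and the names Entry, Encode and Decode are hypothetical and do not reflect Piko's actual gossip protocol.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	versionV0 = 0 // existing format: no timestamp
	versionV1 = 1 // new format: appends a creation timestamp
)

var (
	errUnsupportedVersion = errors.New("unsupported gossip version")
	errTruncated          = errors.New("truncated entry")
	errKeyTooLong         = errors.New("key too long")
)

// Entry is a hypothetical decoded gossip entry.
type Entry struct {
	Version         uint8
	Key             string
	CreatedAtMillis int64 // Unix milliseconds; zero when unknown (version 0)
}

// Encode writes [version:1][keyLen:2][key][createdAt:8, version 1 only].
func Encode(key string, createdAtMillis int64, version uint8) ([]byte, error) {
	if version != versionV0 && version != versionV1 {
		return nil, errUnsupportedVersion
	}
	if len(key) > 0xFFFF {
		return nil, errKeyTooLong
	}
	buf := make([]byte, 3+len(key), 3+len(key)+8)
	buf[0] = version
	binary.BigEndian.PutUint16(buf[1:3], uint16(len(key)))
	copy(buf[3:], key)
	if version == versionV1 {
		var ts [8]byte
		binary.BigEndian.PutUint64(ts[:], uint64(createdAtMillis))
		buf = append(buf, ts[:]...)
	}
	return buf, nil
}

// Decode accepts both versions; version 0 entries have no timestamp.
func Decode(b []byte) (Entry, error) {
	if len(b) < 3 {
		return Entry{}, errTruncated
	}
	version := b[0]
	if version != versionV0 && version != versionV1 {
		return Entry{}, errUnsupportedVersion
	}
	keyLen := int(binary.BigEndian.Uint16(b[1:3]))
	rest := b[3:]
	if len(rest) < keyLen {
		return Entry{}, errTruncated
	}
	e := Entry{Version: version, Key: string(rest[:keyLen])}
	rest = rest[keyLen:]
	if version == versionV1 {
		if len(rest) < 8 {
			return Entry{}, errTruncated
		}
		e.CreatedAtMillis = int64(binary.BigEndian.Uint64(rest[:8]))
	}
	return e, nil
}

func main() {
	b, err := Encode("my-endpoint", 1700000000000, versionV1)
	if err != nil {
		panic(err)
	}
	e, err := Decode(b)
	fmt.Println(e, err)
}
```

In this sketch a node receiving a version 0 entry simply has no timestamp to work with, so it would skip the latency measurement for that entry.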

Evaluation

These metrics can also be used to evaluate the scaling limits of gossip. For example, piko test workload upstreams could be extended to support --churn flags that indicate how often each upstream should reconnect.

That can then be used to understand how much churn a cluster with default gossip settings can support before latency exceeds some threshold (say 10 seconds).

(Can also proxy each gossip node to inject latency and dropped messages)