AntelopeIO / spring

C++ implementation of the Antelope protocol with Savanna consensus

Finality Voting Threshold metrics for alerting #323

Closed. bhazzard closed this issue 2 weeks ago.

bhazzard commented 2 months ago

Observability and alerting will be important so that production network operators can remediate when voting falls below the threshold required for finality to advance.

To this end, Node Operators need a way to configure alerts that fire when finality voting falls below a configurable threshold.

Suggested Metrics:

This issue is related to https://github.com/AntelopeIO/spring/issues/227

bhazzard commented 2 months ago

Potentially also: per finalizer, total delta time between block time and vote time.

This would give a picture of how long it takes for each finalizer to vote after a block is produced.

bhazzard commented 2 months ago

Decision needed: should we expose these as Prometheus metrics, in debug logs, or both?

arhag commented 2 weeks ago

The logs already provide the above information and more.

Moreover, there is now a new endpoint (see https://github.com/AntelopeIO/spring/pull/453) that reports the last votes by each finalizer. This makes it easier to build tools that alert when some finalizers are failing to participate.

So I think we can close this issue.