Different alert recovery threshold

SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. Open source Application Performance Monitoring (APM) & Observability tool


Is your feature request related to a problem?

An alert will often toggle rapidly between firing and resolved. For example, a disk space usage alert whose value sat right at the margin of its threshold kept firing on and off.

Describe the solution you'd like

A separate recovery threshold would add some hysteresis to the alert and keep it active until the underlying problem is solved. This also happens to be the same technique Datadog uses.
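To illustrate the idea, here is a minimal sketch of hysteresis with a separate recovery threshold. It is not SigNoz code and not Datadog's implementation; the class name `HysteresisAlert`, its fields, and the sample disk usage values are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class HysteresisAlert:
    """Hypothetical alert state machine with separate fire/recover thresholds."""

    alert_threshold: float     # start firing at or above this value
    recovery_threshold: float  # resolve only at or below this value
    firing: bool = False

    def update(self, value: float) -> bool:
        """Feed one sample and return whether the alert is firing afterwards."""
        if not self.firing and value >= self.alert_threshold:
            self.firing = True
        elif self.firing and value <= self.recovery_threshold:
            self.firing = False
        return self.firing


def count_transitions(states):
    """Count how often the firing state flips between consecutive samples."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Made-up disk usage percentages hovering around a 90% threshold.
samples = [88, 91, 89, 90, 89, 91, 84, 92]

# Single threshold: fire and resolve at the same value (90%).
single = [v >= 90 for v in samples]

# Separate recovery threshold: fire at 90%, resolve only at or below 85%.
alert = HysteresisAlert(alert_threshold=90, recovery_threshold=85)
hyst = [alert.update(v) for v in samples]

print("single threshold flips:", count_transitions(single))
print("with recovery threshold flips:", count_transitions(hyst))
```

With these sample values the single-threshold check changes state on every sample, while the version with a recovery threshold stays firing until usage actually drops to 85% or below.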

Describe alternatives you've considered

Requiring the alert condition to hold for a specific amount of time before firing is not enough in this case, since this alert already had a time period of 1 hr set.

