elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[SLO] Burn rate panel improvement #195139

Open kdelemme opened 2 weeks ago

kdelemme commented 2 weeks ago

πŸ’ Summary

The Burn Rate panel can be confusing for customers. For example, in the screenshot below, there is a clear misalignment between the overall SLO Status (Violated) and the Burn Rate status (acceptable value). That's because the burn rate is computed assuming a 100% error budget remaining.

Image

Datadog blog: https://www.datadoghq.com/blog/burn-rate-is-better-error-rate/
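For context, burn rate is commonly defined as the error rate observed over a window divided by the error rate the SLO allows (1 minus the target). The sketch below uses that common definition and is not taken from Kibana's source; the function name and inputs are hypothetical. It illustrates why a window's burn rate can look acceptable while the SLO as a whole is already violated: the value only reflects the chosen window, not how much budget remains.

```typescript
// Common burn rate definition (an assumption, not Kibana's actual code):
// burn rate = observed error rate in the window / error rate allowed by the SLO.
function computeBurnRate(goodEvents: number, totalEvents: number, target: number): number {
  if (totalEvents === 0) return 0; // no traffic in the window, nothing burned
  const errorRate = 1 - goodEvents / totalEvents;
  const allowedErrorRate = 1 - target;
  return errorRate / allowedErrorRate;
}

// Last hour: 99.95% good events against a 99.9% target gives a burn rate
// close to 0.5, no matter how much budget earlier incidents consumed.
const lastHourBurnRate = computeBurnRate(99950, 100000, 0.999);
```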

Suggested change

Fetch the burn rates for the long and short windows defined by the first SLO burn rate rule, or fall back to the default window values (Long, Short): (1h, 5m), (6h, 30m), (24h, 2h), (72h, 6h)
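A minimal sketch of the window selection described above. The types and the `getPanelWindows` name are hypothetical and only illustrate the fallback logic; they are not Kibana's actual rule types.

```typescript
// Hypothetical shapes for illustration only.
interface Duration {
  value: number;
  unit: 'm' | 'h';
}
interface BurnRateWindow {
  longWindow: Duration;
  shortWindow: Duration;
}
interface BurnRateRule {
  windows: BurnRateWindow[];
}

// Default (long, short) pairs listed in this issue.
const DEFAULT_WINDOWS: BurnRateWindow[] = [
  { longWindow: { value: 1, unit: 'h' }, shortWindow: { value: 5, unit: 'm' } },
  { longWindow: { value: 6, unit: 'h' }, shortWindow: { value: 30, unit: 'm' } },
  { longWindow: { value: 24, unit: 'h' }, shortWindow: { value: 2, unit: 'h' } },
  { longWindow: { value: 72, unit: 'h' }, shortWindow: { value: 6, unit: 'h' } },
];

// Use the windows of the first burn rate rule when one exists,
// otherwise fall back to the defaults.
function getPanelWindows(rules: BurnRateRule[]): BurnRateWindow[] {
  const first = rules[0];
  if (first !== undefined && first.windows.length > 0) {
    return first.windows;
  }
  return DEFAULT_WINDOWS;
}
```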

Compare LONG (L) vs SHORT (S) window burn rate against Threshold (T):

Do we want to include a Tooltip? e.g. → Assuming a full error budget, a 3x constant burn rate means the error budget will be exhausted in 10 days.
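The tooltip wording could be derived as sketched below. This assumes a 30-day SLO time window, which is what makes the "3x means 10 days" example work out (30 / 3 = 10); the function names are hypothetical.

```typescript
// Assumption: with a full error budget and a constant burn rate, the budget
// lasts (SLO window length / burn rate). The 30-day default is an assumption
// inferred from the example in this issue.
function daysToExhaustion(burnRate: number, sloWindowDays: number = 30): number {
  if (burnRate <= 0) return Infinity; // nothing is being burned
  return sloWindowDays / burnRate;
}

function burnRateTooltip(burnRate: number, sloWindowDays: number = 30): string {
  const days = daysToExhaustion(burnRate, sloWindowDays);
  if (!isFinite(days)) {
    return 'Assuming a full error budget, the current burn rate does not exhaust the error budget.';
  }
  return (
    'Assuming a full error budget, a ' + burnRate + 'x constant burn rate means ' +
    'the error budget will be exhausted in ' + Math.round(days) + ' days.'
  );
}
```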

Color = danger when L > T and S > T, warning when L < T or S < T, success when L ≤ T and S ≤ T
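As written, the warning and success conditions overlap (both windows below the threshold satisfies either one). The sketch below assumes one possible reading: danger when both windows exceed the threshold, success when neither does, and warning when exactly one does. The `burnRateColor` name and `PanelColor` type are hypothetical.

```typescript
type PanelColor = 'danger' | 'warning' | 'success';

// Assumed interpretation of the rule above, evaluated in this order:
// both windows over the threshold -> danger,
// both at or under the threshold  -> success,
// anything else (exactly one over) -> warning.
function burnRateColor(long: number, short: number, threshold: number): PanelColor {
  if (long > threshold && short > threshold) return 'danger';
  if (long <= threshold && short <= threshold) return 'success';
  return 'warning';
}
```

If the intended rule is different (for example, warning only when the short window alone exceeds the threshold), only the final branch needs to change.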

Example
Image

🎯 Acceptance criteria

elasticmachine commented 2 weeks ago

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)