Summary
The Burn Rate panel can be confusing for customers. For example, in the screenshot below, there is a clear misalignment between the overall SLO status (Violated) and the Burn Rate status (Acceptable). This happens because the burn rate status is computed assuming 100% of the error budget is still remaining.
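For reference, a burn rate is commonly defined as the observed error rate divided by the error rate the SLO target allows (1 - target). The sketch below illustrates that definition in TypeScript; `burnRate` and its parameters are hypothetical names, not the panel's actual code. It shows why the mismatch is possible: the value depends only on the current error rate and the target, not on how much of the budget has already been spent.

```typescript
// Burn rate = observed error rate / error rate allowed by the SLO target.
// Assumption: both inputs are fractions (0.999 means a 99.9% target,
// 0.005 means 0.5% of events are bad).
function burnRate(observedErrorRate: number, sloTarget: number): number {
  // The error budget expressed as a rate, e.g. 0.001 for a 99.9% target.
  const allowedErrorRate = 1 - sloTarget;
  if (allowedErrorRate <= 0) {
    throw new RangeError("sloTarget must be lower than 1 (100%)");
  }
  return observedErrorRate / allowedErrorRate;
}

// 0.5% errors against a 99.9% target burns the budget at roughly 5x,
// whether the budget is untouched or already fully consumed.
const currentRate = burnRate(0.005, 0.999);
console.log(`burn rate: ${currentRate.toFixed(1)}x`);
```

Note that nothing in this formula looks at the remaining budget, which is why an "acceptable" burn rate can be shown next to a violated SLO.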
Fetch the burn rates for the long and short windows defined by the first SLO burn rate rule, or fall back to the default window pairs (Long, Short): (1h, 5m), (6h, 30m), (24h, 2h), (72h, 6h).
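A minimal sketch of the window selection, assuming a burn rate rule exposes its window pairs as a `windows` array. `BurnRateRule`, `BurnRateWindow`, `DEFAULT_WINDOWS` and `selectWindows` are hypothetical names; the real SLO rule schema may differ.

```typescript
// Hypothetical shape of one (Long, Short) window pair.
interface BurnRateWindow {
  longWindow: string; // e.g. "1h"
  shortWindow: string; // e.g. "5m"
}

// Hypothetical shape of an SLO burn rate rule.
interface BurnRateRule {
  windows: BurnRateWindow[];
}

// Default (Long, Short) pairs listed in this issue.
const DEFAULT_WINDOWS: BurnRateWindow[] = [
  { longWindow: "1h", shortWindow: "5m" },
  { longWindow: "6h", shortWindow: "30m" },
  { longWindow: "24h", shortWindow: "2h" },
  { longWindow: "72h", shortWindow: "6h" },
];

// Use the windows of the first rule; fall back to the defaults when there
// is no rule or the first rule defines no windows.
function selectWindows(rules: BurnRateRule[]): BurnRateWindow[] {
  if (rules.length > 0 && rules[0].windows.length > 0) {
    return rules[0].windows;
  }
  return DEFAULT_WINDOWS;
}
```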
Compare the LONG (L) and SHORT (S) window burn rates against the threshold (T):
If L > T and S > T → Critical. Example: the 1h burn rate is 22x and the 5m burn rate is 16x; the threshold is 14.4x.
If L > T and S ≤ T → Recovering. Example: the 1h burn rate is 21.23x and the 5m burn rate is 10.2x; the threshold is 14.4x.
If L ≤ T and S > T → Increasing. Example: the 1h burn rate is 12x and the 5m burn rate is 15x; the threshold is 14.4x.
If L ≤ T and S ≤ T → Acceptable. Example: the 1h burn rate is 5x and the 5m burn rate is 7x; the threshold is 14.4x.
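The four cases cover every combination of L and S, so the status can be derived from two comparisons. A sketch with a hypothetical `classifyBurnRate` helper, assuming both burn rates are already available as plain multipliers (22 for 22x):

```typescript
type BurnRateStatus = "critical" | "recovering" | "increasing" | "acceptable";

// Derive the panel status from the long (L) and short (S) window burn
// rates and the threshold (T).
function classifyBurnRate(
  long: number,
  short: number,
  threshold: number,
): BurnRateStatus {
  const longAbove = long > threshold;
  const shortAbove = short > threshold;
  if (longAbove && shortAbove) return "critical"; // L > T and S > T
  if (longAbove) return "recovering"; // L > T and S <= T
  if (shortAbove) return "increasing"; // L <= T and S > T
  return "acceptable"; // L <= T and S <= T
}
```

Using `<=` for "not above" on both sides means a burn rate exactly equal to the threshold never counts as a breach.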
Do we want to include a tooltip? For example: "Assuming a full error budget, a constant 3x burn rate means the error budget will be exhausted in 10 days." (for a 30-day SLO time window, since 30 days / 3 = 10 days)
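The 10-day figure comes from dividing the SLO time window by the burn rate: at 1x the budget lasts exactly one window, so at 3x it lasts a third of it. A sketch of that arithmetic; `daysToExhaustion` is a hypothetical helper, and its `remainingBudget` parameter (default 1, i.e. a full budget) is an illustrative extension showing how the estimate would change if the remaining budget were taken into account.

```typescript
// Days until the error budget is exhausted at a constant burn rate.
// remainingBudget is a fraction: 1 = full budget (the tooltip's
// assumption), 0 = nothing left (a violated SLO).
function daysToExhaustion(
  burnRate: number,
  windowDays: number,
  remainingBudget = 1,
): number {
  if (burnRate <= 0) return Infinity; // no budget is being consumed
  // Clamp over-consumed budgets (negative remainder) to zero.
  const budget = Math.max(0, remainingBudget);
  // At 1x a full budget lasts exactly one SLO window.
  return (budget * windowDays) / burnRate;
}

const fullBudget = daysToExhaustion(3, 30); // the tooltip example
const halfBudget = daysToExhaustion(3, 30, 0.5);
console.log(fullBudget, halfBudget);
```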
Color = danger when L > T and S > T (Critical), warning when exactly one of L and S exceeds T (Recovering or Increasing), success when L ≤ T and S ≤ T (Acceptable).
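The color rule reduces to counting how many windows exceed the threshold. A sketch with a hypothetical `burnRateColor` helper; `danger`, `warning` and `success` are the color names used in this issue, assumed to match the UI's color tokens.

```typescript
type StatusColor = "danger" | "warning" | "success";

// 2 windows above T -> danger, exactly 1 -> warning, none -> success.
function burnRateColor(
  long: number,
  short: number,
  threshold: number,
): StatusColor {
  const windowsAbove = Number(long > threshold) + Number(short > threshold);
  if (windowsAbove === 2) return "danger";
  if (windowsAbove === 1) return "warning";
  return "success";
}
```

Counting breaches keeps the color consistent with the four statuses without listing each combination separately.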
Example
Acceptance criteria
The "There is no risk of error budget exhaustion." message should be removed.
Further reading (Datadog blog): https://www.datadoghq.com/blog/burn-rate-is-better-error-rate/