Open keturiosakys opened 1 year ago
This can be picked up together with https://github.com/autometrics-dev/am/issues/22
Ideally am would be able to infer from the code or at runtime if custom histogram buckets are used and suggest to apply them to the Prometheus config running under the hood.
I think runtime detection is safer, because sometimes you won't be able to infer the buckets from the code (if an environment variable controls the buckets).
Also, technically the alerting ruleset will still work if you only change the buckets of the histogram, it will just be less accurate (meaning the alert might not trigger at exactly 90%
or whatever SLO success rate); what will break completely the alerting rules (as "no alert would trigger at all"), is to change the percentile objective to something unsupported (so something outside 90%
, 95%
, 99%
, and 99.9%
iirc). It's still something we want to tackle for sure anyway!
I would strongly suggest that there needs to be a way to enforce a rule for there to be a bucket boundary on the SLO latency target. The loss of accuracy that @gagbo describes could be a big deal. For example:
SLO target is 95% at <= 150ms A bucket exists for >100ms & <200ms All responses take 110ms The 95% percentile would be calculated as 195ms, and an alert would trigger.
Without enforcing the boundaries to match the SLO target it's easy for people to create a situation where their well-behaved code would permanently trigger an alert.
Some of the Autometrics libraries (Rust, Go) already support customizing the histogram buckets and others are likely to follow suit. One implication of this is that if the library produces metrics with different than default histogram buckets, the Autometrics alerting ruleset won't work. To make it easy to handle these cases
am
could be able to generate and apply these custom rulesets.Ideally
am
would be able to infer from the code or at runtime if custom histogram buckets are used and suggest to apply them to the Prometheus config running under the hood. Alternatively, an easier path to implement would be a flag +am.toml
field that would allow to set these manually