cloudflare / pint

Prometheus rule linter/validator
https://cloudflare.github.io/pint/
Apache License 2.0
873 stars 53 forks source link

Add pint check to disallow `topk` or `bottomk` in recording rules #820

Closed wbollock closed 3 weeks ago

wbollock commented 11 months ago

I believe a new check to warn users about the use of topk or bottomk in recording rules would be useful, as those rules will typically churn quite often and create many new time series.

This is fairly opinionated and could vary a lot of on the nature of the metric being aggregated so interested in discussion and definitely think it fits as a non-default rule

wbollock commented 6 months ago

cc: @prymitive let me know what you think. happy to contribute this

prymitive commented 5 months ago

As mentioned in https://github.com/cloudflare/pint/pull/985#issuecomment-2136889028 I don't believe that just throwing warnings every time someone uses topk is a valid approach. Churn in recording rules might be undesired but there's nothing fundamentally wrong with it. With alerting rules it's different, I agree that if someone has a rule like:

- alert: ...
  expr: topk(10, foo)

then it is likely to cause flapping alerts, since topk() might return different time series on each evaluation. But one can stabilise the results by stripping churning labels, for example by stripping all labels and reporting min or max value: min(topk(10, foo)). So this would need a bit more advanced logic to find only queries that "leak" labels from topk() into final results with no constrains and don't use for: or keep_firing_for: to avoid churn.

prymitive commented 3 weeks ago

I need a function that gives me the source of labels in a query and this check is a good use case for testing it - so expect it to be added in the next release.