giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

Ensure alerting and recording rules are correctly evaluated by the mimir ruler #3157

Closed QuentinBisson closed 7 months ago

QuentinBisson commented 9 months ago

Towards https://github.com/giantswarm/roadmap/issues/3039

Let's validate our current recording and alerting rules on mimir.

Depends on https://github.com/giantswarm/roadmap/issues/3127

QuantumEnigmaa commented 8 months ago

Logs from mimir-ruler on golem:

ts=2024-02-27T14:41:47.305278335Z caller=ruler.go:564 level=info msg="syncing rules" reason=periodic

The logs look fine, but as mentioned here, queries in Grafana that use mimir recording rules (with either the prometheus or the mimir datasource) are not working: they return an empty result ("no data").
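To rule out Grafana as the cause, the same expression can be sent straight to the Prometheus-compatible query API that mimir exposes. The snippet below is a minimal sketch, not the setup used on golem: the base URL, the tenant ID and the recording rule name are hypothetical placeholders, and it assumes the default `/prometheus` HTTP prefix and the `X-Scope-OrgID` tenant header.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def has_data(response: dict) -> bool:
    """Return True if an instant-query response holds at least one result."""
    if response.get("status") != "success":
        return False
    return len(response.get("data", {}).get("result", [])) > 0


def query_has_data(base_url: str, expr: str, tenant: str) -> bool:
    """Run the query against the API and report whether it returned data."""
    request = urllib.request.Request(
        build_query_url(base_url, expr),
        headers={"X-Scope-OrgID": tenant},  # tenant header, value is an assumption
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return has_data(json.load(resp))


if __name__ == "__main__":
    # Hypothetical endpoint, recording rule name and tenant; adjust to the installation.
    ok = query_has_data(
        "http://mimir-query-frontend:8080/prometheus",
        "example:recording_rule:sum",
        "anonymous",
    )
    print("data found" if ok else "no data")
```

If this also reports "no data", the recorded series is most likely not being written at all, rather than being hidden by the datasource configuration.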

QuentinBisson commented 8 months ago

Reviewed alerts from vpa to operator-kit

Here are a few comments so we don't forget:

Atlas:

Needs fixing:

QuentinBisson commented 8 months ago

Alerting rules:

Recording rules

Plus Sloth rules

QuentinBisson commented 7 months ago

@giantswarm/team-atlas I think I will close this issue as the main things have been fixed.

What is left to fix once the remaining PRs are merged is:

My idea is to create a migration issue that would reference one issue per team (us included) to review their alerts, test that the alert expressions work on golem with an MC and a WC deployed, and make sure all their apps are using service monitors.

What do you think about this?

I will write a draft for the issue description on Thursday and ask for your feedback.
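For the per-team review, one possible starting point is the Prometheus-compatible rules API (`/api/v1/rules`), which reports a `health` and `lastError` field for each evaluated rule. The helper below is only a sketch: it assumes the response has already been fetched and decoded from JSON, and the group and rule names in the usage example are made up.

```python
def unhealthy_rules(rules_response: dict) -> list[tuple[str, str, str]]:
    """Return (group, rule, last_error) for every rule whose health is not 'ok'."""
    failing = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("health") != "ok":
                failing.append(
                    (group.get("name", ""), rule.get("name", ""), rule.get("lastError", ""))
                )
    return failing


if __name__ == "__main__":
    # Hypothetical response fragment in the Prometheus rules API shape.
    sample = {
        "status": "success",
        "data": {
            "groups": [
                {
                    "name": "example-group",
                    "rules": [
                        {"name": "ExampleAlert", "health": "ok", "type": "alerting"},
                        {
                            "name": "example:rule",
                            "health": "err",
                            "lastError": "parse error",
                            "type": "recording",
                        },
                    ],
                }
            ]
        },
    }
    for group, rule, error in unhealthy_rules(sample):
        print(f"{group}/{rule}: {error}")
```

A rule can be healthy and still return no data, so this only catches evaluation errors; the expressions themselves still need to be checked against real MC and WC metrics.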

QuentinBisson commented 7 months ago

Extra fixes https://github.com/giantswarm/prometheus-rules/pull/1060

QuentinBisson commented 7 months ago

Migration tracking issue https://github.com/giantswarm/roadmap/issues/3312

QuentinBisson commented 7 months ago

The rule fixes are getting released in https://github.com/giantswarm/prometheus-rules/pull/1063. I consider this issue closed.