Closed: jmleddy closed this 3 days ago.
The suspend label has been included in the gotk_resource_info metric, which is provided by kube-state-metrics. I recommend migrating your alerts and dashboards to the new metric, as the old ones were deprecated a long time ago. Docs here: https://fluxcd.io/flux/monitoring/metrics/#resource-metrics
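As a rough illustration of consuming the replacement metric, here is a minimal Python sketch that counts suspended Flux resources per kind from the JSON body returned by the Prometheus HTTP API (`/api/v1/query`) for an instant query such as `gotk_resource_info{suspended="true"}`. It assumes your kube-state-metrics config exports the suspend field as a `suspended` label with the string value `"true"`, and that kube-state-metrics attaches a `customresource_kind` label to each series. Check both label names against your own config. The helper `count_suspended` is hypothetical, and the example response is made-up sample data that only shows the shape.

```python
import json
from collections import Counter

# Assumption: the kube-state-metrics custom resource config exports the
# Flux suspend field as a `suspended` label on gotk_resource_info, and
# kube-state-metrics adds a `customresource_kind` label to each series.
SUSPENDED_QUERY = 'gotk_resource_info{suspended="true"}'


def count_suspended(response: dict) -> Counter:
    """Count suspended Flux resources per kind from a Prometheus
    /api/v1/query response body (hypothetical helper)."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    counts = Counter()
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        # Re-check the label so the helper also works on unfiltered queries.
        if labels.get("suspended") == "true":
            counts[labels.get("customresource_kind", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Illustrative response shape only; not real data.
    example = json.loads("""
    {"status": "success", "data": {"resultType": "vector", "result": [
      {"metric": {"__name__": "gotk_resource_info", "customresource_kind": "HelmRelease",
                  "name": "podinfo", "exported_namespace": "apps", "suspended": "true"},
       "value": [1700000000, "1"]},
      {"metric": {"__name__": "gotk_resource_info", "customresource_kind": "Kustomization",
                  "name": "infra", "exported_namespace": "flux-system", "suspended": "false"},
       "value": [1700000000, "1"]}
    ]}}
    """)
    print(dict(count_suspended(example)))  # {'HelmRelease': 1}
```

For alerting rather than ad-hoc checks, the same label assumption would translate to a PromQL expression along the lines of `count by (customresource_kind) (gotk_resource_info{suspended="true"}) > 0`.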
This is extra toil for us just to get back a metric that we were alerting on and have now completely lost. We have no idea how many Helm releases are paused and not applying resource request updates or other changes. And it's applied inconsistently: why do our Kustomizations still report when they are stalled, but our Helm releases don't? I realize that there are different maintainers with different opinions about which metrics should be exposed, but to the end user this all just looks like "Flux", since all the controllers ship with Flux.
For anyone who finds this PR and wonders what the kube-state-metrics config is, it seems to be here
> I realize that there are different maintainers that have different opinions
The core maintainers make the decisions about the common behaviour of all Flux controllers, and metrics fall into this category. We decided to drop the resource-specific metrics from the controllers' exporters and rely on kube-prometheus-stack instead. The deprecation notice can be found here: https://fluxcd.io/flux/monitoring/metrics/#warning-deprecated-resource-metrics
This controller was the last one promoted to GA, so we removed the deprecated metrics from it, but we should have done that in all controllers. We'll make sure the old metrics are removed across all Flux components in the next release.
Thank you, though I would prefer that you add this metric back everywhere, so that not everyone has to add 275 lines of YAML to their kube-prometheus-stack Helm chart. The inconsistency is even worse than the original decision because it feels uneven, and it probably also slowed our detection of the issue, since we were still seeing suspends "sometimes". Looking forward to a consistent approach from the Flux controller maintainers here :)
@jmleddy you can read our motivation in this issue: https://github.com/fluxcd/flux2/issues/4128
If you don't like the kube-state-metrics approach, feel free to use the Flux Operator; the tradeoff is that you can't customise those metrics in any way.
Okay, I thought the controller was running as part of the operator; I must not have my kubeconfig set up right. I'll run it as part of kube-state-metrics, thanks!
At some point we had this metric, and then we lost it. We discovered this after we started suspending a bunch of things and could not get the metric to appear, which means we currently have releases suspended across all our clusters that we don't know about.