GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0
195 stars 93 forks

Support of Job and CronJob monitoring #987

Closed AndrasSandor closed 5 months ago

AndrasSandor commented 5 months ago

Currently, kube job metrics such as kube_job_status_failed and kube_job_status_succeeded are not made available for monitoring. The full list of metrics: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/workload/job-metrics.md

lyanco commented 5 months ago

You can manually deploy kube-state-metrics and scrape these metrics. Instructions here: https://cloud.google.com/stackdriver/docs/managed-prometheus/exporters/kube_state_metrics

We had to limit the number of kube-state metrics we collect by default so that costs are minimal. That being said, if there are enough +1s on this, we can definitely add a few more metrics, especially if they are not high-volume metrics.

maxamins commented 5 months ago

@AndrasSandor let us know if @lyanco's suggestion works for you. Closing this issue for now.

Future readers, feel free to +1 or reopen this thread if there is demand for this feature.

ksoftirqd commented 4 months ago

It would be great to add Job/CronJob-related metrics to the list. The volume would likely be insignificant. However, I assume they would need to be explicitly enabled before they impact costs.

The alternative options are not appealing:

pintohutch commented 4 months ago

Hey @ksoftirqd - thanks for reaching out.

> Deploying self-managed kube-state-metrics instead of the managed one does not offer the ability to “honor” reserved labels, like namespace/pod etc, leading to a confusing set of labels (e.g. namespace/exported_namespace, etc) and the need to modify existing rules.

Actually, by specifying a ClusterPodMonitoring like the one we show in examples/, you should have those labels honored by the kube-state-metrics exporter. Have you tried this?
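
For readers following along, here is a rough sketch of what such a ClusterPodMonitoring might look like. It is not copied from examples/. It assumes a self-deployed kube-state-metrics whose pods carry the label app.kubernetes.io/name: kube-state-metrics and expose a port named metrics. The resource name, the scrape interval, and the keep regex for Job/CronJob metrics are illustrative assumptions. The part that matters for label handling is the empty targetLabels.metadata list, which should stop target labels from being attached so the exporter's own namespace/pod labels are kept instead of being renamed to exported_namespace and so on.

```yaml
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  # Hypothetical name; pick whatever fits your cluster.
  name: kube-state-metrics-jobs
spec:
  selector:
    matchLabels:
      # Assumption: the label on your kube-state-metrics pods.
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
  # Assumption: the exporter's container port is named "metrics".
  - port: metrics
    interval: 30s
    metricRelabeling:
    # Illustrative filter: keep only Job and CronJob metrics to limit volume.
    - action: keep
      sourceLabels: [__name__]
      regex: kube_(job|cronjob)_.+
  targetLabels:
    # Explicitly empty so the labels exported by kube-state-metrics are respected.
    metadata: []
```

If the keep rule is dropped, all metrics exposed by the exporter are ingested, which may raise costs.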

ksoftirqd commented 4 months ago

Hi @pintohutch,

Thank you, this is very helpful! I must have missed this example and it indeed works as expected.

If we cannot add jobs/cronjobs to the list of supported resources, this can be a great alternative to the managed kube-state-metrics exporter.

Still, it would be great to see those resources added, as the current solution seems to support the majority of resources and it's possible to toggle their metrics collection in the cluster config.