
Job label in Helm chart of cert-manager shouldn't be templated from chart name #7088

Open atillaqb opened 5 months ago

atillaqb commented 5 months ago

Is your feature request related to a problem? Please describe.

The Helm chart templates the ServiceMonitor's job label from the Helm release name. When you deploy the chart with Argo CD, Argo CD derives the Helm release name from the Argo CD Application name.

For example: I have 3 clusters named test/dev/prod. With an Argo CD ApplicationSet I created an Application that deploys the cert-manager Helm chart to all 3 clusters. An ApplicationSet is a form of templating that reduces repeated configuration (https://argo-cd.readthedocs.io/en/stable/user-guide/application-set/). In the ApplicationSet template I combine the cluster name (test/dev/prod) with the application name (cert-manager), which produces Applications named dev-cert-manager, test-cert-manager and prod-cert-manager. Since Argo CD uses the Application name as the Helm release name, the release names inherit those prefixes, and since the ServiceMonitor job label is templated from the release name, the final job label inherits them as well.

As a result, metrics scraped through this ServiceMonitor carry a Prometheus job label of dev-cert-manager, test-cert-manager or prod-cert-manager. Best practice, however, is for the job label to be the same no matter which environment the application runs in: it represents the application itself (like cert-manager, loki, tempo, mimir, etc.). In the end, when you build a dashboard for cert-manager you have to point it at 3 different job names instead of distinguishing environments with another common label (like cluster).
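A minimal ApplicationSet sketch of the setup described above; the generator, repository details, chart version and namespace are illustrative placeholders, not taken from this issue:

# Each generated Application is named '<cluster>-cert-manager'.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cert-manager
spec:
  generators:
    - list:
        elements:
          - cluster: dev
          - cluster: test
          - cluster: prod
  template:
    metadata:
      # The Application name becomes the Helm release name:
      # dev-cert-manager, test-cert-manager, prod-cert-manager.
      name: '{{cluster}}-cert-manager'
    spec:
      project: default
      source:
        repoURL: https://charts.jetstack.io  # placeholder chart repository
        chart: cert-manager
        targetRevision: v1.15.0              # placeholder version
      destination:
        name: '{{cluster}}'
        namespace: cert-manager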

Describe the solution you'd like

Instead of using:

spec:
  jobLabel: {{ template "cert-manager.fullname" . }}

create a constant job label, cert-manager:

spec:
  jobLabel: cert-manager

Additional context

Argo CD ApplicationSets.


/kind feature

inteon commented 5 months ago

Could you share the docs that describe this best-practice?

atillaqb commented 5 months ago

Could you share the docs that describe this best-practice?

Hello! I've noticed that the job name in a monitoring system (Prometheus, Loki), in the vast majority of dashboards, examples and docs, represents the name of the application/service being scraped (job_name=cert-manager). If you have more than one copy of an application/service, you add another label to the time series, for example instance (job_name=cert-manager, instance=10.9.8.1). If you have different environments, like 3 different clusters running cert-manager, you can add an additional label, "cluster" (job_name=cert-manager, instance=10.9.8.1, cluster=dev). In this example:

  1. job_name=cert-manager - a static label which never changes across environments.
  2. instance - can change, because the pod IP address changes every time the pod is restarted or recreated.
  3. cluster - can change, because it represents the name of the environment.

But because the job name is formed from the Helm release name, and because Argo CD generates the release name from the Argo Application name, this job label is not static. Why this matters: with a dynamic job label, whenever I create a dashboard for cert-manager metrics I have to choose some label that represents the application across all environments; otherwise I end up building 3 copies of one dashboard with the same metrics but different labels. You could say that I can use variables in Grafana, and you would be right, but which label should I choose as the anchor that represents the application?

https://grafana.com/docs/loki/latest/get-started/labels/bp-labels/ - best practices for labels in Loki; the same logic applies to Prometheus labels as well, since both use the same label-based architecture. A Prometheus-side sketch of this labelling scheme is shown below.
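For illustration, a minimal Prometheus configuration implementing this scheme with a plain static target; the target address, port and cluster name are placeholders:

# external_labels attach the environment to every series scraped by
# this Prometheus, while the job label stays the static application name.
global:
  external_labels:
    cluster: dev
scrape_configs:
  - job_name: cert-manager            # static across all environments
    static_configs:
      - targets: ['10.9.8.1:9402']    # placeholder metrics endpoint

Dashboards can then filter on the cluster label while keying every panel off the single, stable job="cert-manager" value.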

erikgb commented 4 months ago

/priority backlog
/area deploy

inteon commented 4 months ago

FYI @wallrj

wallrj commented 4 months ago

@atillaqb What you wrote in https://github.com/cert-manager/cert-manager/issues/7088#issuecomment-2167346999 makes sense, but the documentation for PodMonitor and ServiceMonitor doesn't say anything about this best practice. Nor does it suggest using a static value. In fact, the documentation says that the value should be the name of a label on the target Pod or Service.

jobLabel

The label to use to retrieve the job name from. jobLabel selects the label from the associated Kubernetes Pod object which will be used as the job label for all metrics.

For example if jobLabel is set to foo and the Kubernetes Pod object is labeled with foo: bar, then Prometheus adds the job="bar" label to all ingested metrics.

If the value of this field is empty, the job label of the metrics defaults to the namespace and name of the PodMonitor object (e.g. <namespace>/<name>).

So should it be something like jobLabel: app.kubernetes.io/name?
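If that's the direction, a rough sketch of what it could look like, assuming the chart's Service carries the standard app.kubernetes.io/name label; the object names and the endpoint port name are illustrative:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager
spec:
  # jobLabel names a label key on the target Service; Prometheus copies
  # that label's value into the job label of every scraped series.
  jobLabel: app.kubernetes.io/name
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  endpoints:
    - port: http-metrics  # placeholder metrics port name

With the Service labeled app.kubernetes.io/name: cert-manager, every environment would then report job="cert-manager", independent of the Helm release name.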

cert-manager-bot commented 1 month ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. /lifecycle stale

wallrj commented 1 month ago

/reopen

cert-manager-bot commented 2 weeks ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

inteon commented 2 weeks ago

/remove-lifecycle rotten