GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0

secret injection for PodMonitoring in non default namespace not working #1100

Closed danieldethloff1993 closed 1 month ago

danieldethloff1993 commented 1 month ago

Hi, I've found that if you deploy your applications and the PodMonitoring resource in a namespace other than default, the secret injection for the certificate authority does not work.

I am using an Autopilot cluster. I got the following error when applying the PodMonitoring resource with Terraform:

│ Error: API response status: Failure
│
│   with module.vault-primary.kubernetes_manifest.metrics_monitoring,
│   on ../../modules/vault/monitoring.tf line 41, in resource "kubernetes_manifest" "metrics_monitoring":
│   41: resource "kubernetes_manifest" "metrics_monitoring" {
│
│ admission webhook "validate.podmonitorings.gmp-operator.gke-gmp-system.monitoring.googleapis.com" denied the request: invalid definition for endpoint with index 0:
│ unable to parse or invalid Prometheus HTTP client config: must use namespace "example", got: "default"

To reproduce, just add your application and PodMonitoring to a namespace other than default. I did the following: create a secret containing the cluster root cert:

apiVersion: v1
data:
  tls.ca: <CERT_VALUE>
kind: Secret
metadata:
  creationTimestamp: "2024-08-05T06:46:47Z"
  name: example-ca
  namespace: example
  resourceVersion: "3949589"
  uid: cba337dd-1b63-4b26-89cf-dc5fe8d66617
type: Opaque

add the role for reading the secret:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: "2024-08-05T08:19:55Z"
  name: example-secret-read
  namespace: example
  resourceVersion: "4032664"
  uid: 30ea529a-7e2f-4e06-8ae6-c4333b443c24
rules:
- apiGroups:
  - ""
  resourceNames:
  - example-ca
  resources:
  - secrets
  verbs:
  - list
  - watch
  - get

and the role binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: "2024-08-05T06:46:47Z"
  name: gmp-system:collector:example-secret-read
  namespace: example
  resourceVersion: "3949586"
  uid: 01154da0-ce60-409f-be94-892f9786c330
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: example-secret-read
subjects:
- kind: ServiceAccount
  name: collector
  namespace: gke-gmp-system

and finally the PodMonitoring:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  creationTimestamp: "2024-08-05T08:25:51Z"
  generation: 1
  name: example-metrics-monitoring
  namespace: example
  resourceVersion: "4039321"
  uid: 8b37e398-8833-4873-811d-5b09a97dcdd9
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: metrics
    scheme: https
    tls:
      serverName: example-app
      ca:
        secret:
          name: example-ca
          key: tls.ca
          namespace: example
  selector:
    matchLabels:
      app.kubernetes.io/instance: example-app
  targetLabels:
    metadata:
    - pod
    - container

Any suggestions on how to fix this? If I delete the ca part from tls and add insecureSkipVerify: true, everything works. If I deploy the example from this page (https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed) in the default namespace, it also works. I think the namespace field documented for SecretKeySelector (https://github.com/GoogleCloudPlatform/prometheus-engine/blob/main/doc/api.md#monitoring.googleapis.com/v1.SecretKeySelector) is not working as expected.
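
For reference, the workaround mentioned above would look roughly like the manifest below: the same PodMonitoring with the ca block removed and insecureSkipVerify set. This is only a sketch; keeping serverName is an assumption, and the server-generated metadata fields are left out. It disables verification of the target's certificate, so it is a stopgap rather than a fix.

```yaml
# Workaround sketch only: the scrape no longer verifies the target's certificate.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-metrics-monitoring
  namespace: example
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: metrics
    scheme: https
    tls:
      # The ca.secret reference is removed; serverName is kept from the
      # original manifest (assumption, it may not be needed here).
      serverName: example-app
      insecureSkipVerify: true
  selector:
    matchLabels:
      app.kubernetes.io/instance: example-app
  targetLabels:
    metadata:
    - pod
    - container
```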

bwplotka commented 1 month ago

Thanks for reporting.

I believe this issue was already reported here: https://github.com/GoogleCloudPlatform/prometheus-engine/pull/776#issuecomment-2087606662 (in a PR comment rather than an issue, which is why it is hard to find) and fixed, e.g. in https://github.com/GoogleCloudPlatform/prometheus-engine/pull/1007, in recent versions of GMP (I think from 0.12).

Do you mind upgrading and checking? It was essentially a GMP operator binary bug.

I will close, but if I missed something or maybe it still does not work for you on the newer GMP version, we can reopen!