GoogleCloudPlatform / k8s-stackdriver

bug: custom-metrics-stackdriver-adapter requires Cloud Monitoring metric descriptor to be in both host and multi-tenant project #396

Closed davidxia closed 3 years ago

davidxia commented 3 years ago

I'm using custom-metrics-stackdriver-adapter (CMSA) via the published image gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.12.0-gke.0. When I create a HorizontalPodAutoscaler (HPA) like the one below in a GKE cluster in project "multi-tenant-project", the HPA only works if the MetricDescriptor exists in both projects. That is, the custom.googleapis.com|debug MetricDescriptor must exist in both multi-tenant-project and host-project.

Can someone confirm this bug? The expected behavior seems to be that the HPA works without the MetricDescriptor in multi-tenant-project, i.e. multi-tenant-project shouldn't need any metric metadata or data for custom.googleapis.com|debug at all.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: test2
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - external:
      metricName: custom.googleapis.com|debug
      metricSelector:
        matchLabels:
          resource.labels.project_id: host-project
          resource.labels.location: us-central1
          resource.labels.namespace: default
          resource.labels.node_id: test2
      targetAverageValue: 1
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test2
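
For reference, the selector above roughly corresponds to a Cloud Monitoring filter string. The sketch below is an illustration only and is not taken from the adapter's source: `to_monitoring_filter` is a hypothetical helper, and it assumes that the `|` in the external metric name stands in for `/` and that each `matchLabels` entry becomes one equality clause.

```python
def to_monitoring_filter(metric_name, match_labels):
    """Build a Cloud Monitoring filter string from an External metric spec.

    Hypothetical helper for illustration; not the adapter's implementation.
    """
    # External metric names cannot contain "/", so "|" is written in its place.
    metric_type = metric_name.replace("|", "/")
    clauses = ['metric.type = "{}"'.format(metric_type)]
    # Sort the label keys so the resulting filter is deterministic.
    for key in sorted(match_labels):
        clauses.append('{} = "{}"'.format(key, match_labels[key]))
    return " AND ".join(clauses)


if __name__ == "__main__":
    labels = {
        "resource.labels.project_id": "host-project",
        "resource.labels.location": "us-central1",
        "resource.labels.namespace": "default",
        "resource.labels.node_id": "test2",
    }
    print(to_monitoring_filter("custom.googleapis.com|debug", labels))
```

Note that the filter names host-project explicitly, which is why one would expect only that project to need the MetricDescriptor.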

larkintuckerllc commented 3 years ago

First, I want to acknowledge that I was the one who originally brought this to David's attention, and we did indeed experience this problem. The flaw in our assessment was that we were using an older version [1] of custom-metrics-stackdriver-adapter, in which this is indeed a bug.

We did look at the code of the latest version of custom-metrics-stackdriver-adapter but misinterpreted what was going on; based on the comments and a rough read of the code, we mistakenly assumed that the problem still existed in the latest version.

It was only when we ran the latest version of the code in a debugging environment that we observed that the new code does have provisions for handling multiple project IDs. We also confirmed that the latest version correctly handles the case where the custom metric is in a different project from the one the GKE cluster runs in.
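
The behavior described above can be pictured with a small sketch. It assumes the project to query is taken from the `resource.labels.project_id` selector label when present, falling back to the cluster's own project otherwise. `project_to_query` is a hypothetical name and this logic is an assumption for illustration, not code copied from the adapter.

```python
def project_to_query(cluster_project, match_labels):
    """Pick the project whose metrics should be read.

    Hypothetical logic: prefer an explicit project_id label in the selector,
    otherwise use the project the cluster runs in.
    """
    return match_labels.get("resource.labels.project_id", cluster_project)


if __name__ == "__main__":
    selector = {"resource.labels.project_id": "host-project"}
    # With an explicit label, only host-project is consulted.
    print(project_to_query("multi-tenant-project", selector))
    # Without the label, the cluster's own project is used.
    print(project_to_query("multi-tenant-project", {}))
```

Under this assumption, multi-tenant-project never needs the MetricDescriptor when the selector names host-project.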

Tracing back the change, we found a commit [2] in which this specific bug is called out and addressed.

[1] https://github.com/GoogleCloudPlatform/k8s-stackdriver/commit/da8af4541f7305bec347131e78edbe6f967f3a6a#diff-1cd5577a38c836be2036fb43e86b8176

[2] https://github.com/GoogleCloudPlatform/k8s-stackdriver/commit/508f3ad2233ed41ca8041ddf8deadd4deb47aa7c