amazon-archives / k8s-cloudwatch-adapter

An implementation of Kubernetes Custom Metrics API for Amazon CloudWatch
Apache License 2.0

Doesn't work with multi-dimensions metrics #11

Closed: hposca closed this issue 4 years ago.

hposca commented 5 years ago

Hi there,

On CloudWatch we had a metric named queuedepth in the namespace Sidekiq, with the dimensions env, app and queue. env describes the environment (staging, production or development), app holds the application name, and queue identifies the queue the data came from. We use a Lambda to gather the data and send it to CloudWatch.
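For reference, here is a minimal sketch of the datum such a Lambda might send for this metric. The helper name build_queue_depth_datum is hypothetical, and the Count unit is an assumption based on the unit used in the ExternalMetric below; the actual boto3 put_metric_data call is left as a comment because it needs AWS credentials.

```python
from typing import Any, Dict


def build_queue_depth_datum(env: str, app: str, queue: str, depth: float) -> Dict[str, Any]:
    """Hypothetical helper: one entry of PutMetricData's MetricData list
    for the three-dimension 'queuedepth' metric."""
    return {
        "MetricName": "queuedepth",
        "Dimensions": [
            {"Name": "env", "Value": env},
            {"Name": "app", "Value": app},
            {"Name": "queue", "Value": queue},
        ],
        "Value": depth,
        "Unit": "Count",  # assumption: matches the unit queried later
    }


# Inside the Lambda the datum would be sent roughly like this (needs boto3 + credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="Sidekiq",
#       MetricData=[build_queue_depth_datum("staging", "appname", "queuename", 12)],
#   )
```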

If we try to use this metric as an ExternalMetric, as in the example below, it doesn't work.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: queue-depth
spec:
  name: queue-depth
  resource:
    resource: "deployment"
  queries:
    - id: queue_depth
      metricStat:
        metric:
          namespace: "Sidekiq"
          metricName: "queuedepth"
          dimensions:
            - name: env
              value: staging
            - name: app
              value: appname
            - name: queue
              value: queuename
        period: 60
        stat: Average
        unit: Count
      returnData: true
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appname
  metrics:
  - type: External
    external:
      metric:
        name: queue-depth
        selector:
          matchLabels:
            env: staging
            app: appname
            queue: queuename
      target:
        type: Value
        value: 40

If we run kubectl logs -f on the CloudWatch adapter pod, we can see that it cannot find the metric. :/

To make it work, we had to change our Lambda to publish another metric (depth) with a single dimension (queue), in a separate namespace (StagingSidekiq).
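Why a second metric had to be published at all: CloudWatch treats each unique combination of dimensions as a separate metric and does not aggregate across dimensions that were not published together, so the single-dimension depth series cannot be derived from queuedepth. The toy in-memory model below illustrates that matching rule; it is not a CloudWatch client, the names metric_key and query are hypothetical, and the 42.0 values are illustrative. Note that the failing query above did list all three dimensions, so this model explains the workaround, not the adapter bug itself.

```python
from typing import Dict, FrozenSet, List, Tuple

# Simplified model: a metric is keyed by namespace, metric name and the complete
# set of (dimension name, value) pairs it was published with.
MetricKey = Tuple[str, str, FrozenSet[Tuple[str, str]]]


def metric_key(namespace: str, name: str, dims: Dict[str, str]) -> MetricKey:
    return (namespace, name, frozenset(dims.items()))


def query(store: Dict[MetricKey, List[float]], namespace: str, name: str,
          dims: Dict[str, str]) -> List[float]:
    # Exact match only: nothing is aggregated over dimensions left out of the query.
    return store.get(metric_key(namespace, name, dims), [])


store: Dict[MetricKey, List[float]] = {
    metric_key("Sidekiq", "queuedepth",
               {"env": "staging", "app": "appname", "queue": "queuename"}): [42.0],
    # The workaround publishes a second, independent series with one dimension.
    metric_key("StagingSidekiq", "depth", {"queue": "queuename"}): [42.0],
}

print(query(store, "Sidekiq", "queuedepth", {"queue": "queuename"}))  # []
print(query(store, "Sidekiq", "queuedepth",
            {"env": "staging", "app": "appname", "queue": "queuename"}))  # [42.0]
print(query(store, "StagingSidekiq", "depth", {"queue": "queuename"}))  # [42.0]
```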

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: queue-depth
spec:
  name: queue-depth
  resource:
    resource: "deployment"
  queries:
    - id: queue_depth
      metricStat:
        metric:
          namespace: "StagingSidekiq"
          metricName: "depth"
          dimensions:
            - name: queue
              value: queuename
        period: 60
        stat: Average
        unit: Count
      returnData: true
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appname
  metrics:
  - type: External
    external:
      metric:
        name: queue-depth
      target:
        type: AverageValue
        averageValue: 40

As soon as we applied this new configuration, the metrics were fetched and the HPA started scaling.

Is this expected? Since the field is named dimensions (plural) and accepts a list, we assumed multi-dimension metrics were supported. We also noticed that all the examples use only single-dimension metrics.
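On the CloudWatch side, the GetMetricData API does take a list of dimensions per metric, so a multi-dimension query is valid there. The sketch below maps one ExternalMetric queries entry onto the PascalCase MetricDataQuery shape GetMetricData expects; it is an illustration of the field correspondence, not the adapter's actual code, and the function name to_metric_data_query is hypothetical.

```python
from typing import Any, Dict


def to_metric_data_query(spec_query: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping of one ExternalMetric `queries` entry (camelCase)
    onto a CloudWatch GetMetricData MetricDataQuery (PascalCase)."""
    stat = spec_query["metricStat"]
    metric = stat["metric"]
    metric_stat: Dict[str, Any] = {
        "Metric": {
            "Namespace": metric["namespace"],
            "MetricName": metric["metricName"],
            # Every dimension must be forwarded; dropping one targets a different metric.
            "Dimensions": [{"Name": d["name"], "Value": d["value"]}
                           for d in metric.get("dimensions", [])],
        },
        "Period": stat["period"],
        "Stat": stat["stat"],
    }
    if "unit" in stat:  # Unit is optional in MetricStat
        metric_stat["Unit"] = stat["unit"]
    return {
        "Id": spec_query["id"],
        "MetricStat": metric_stat,
        "ReturnData": spec_query.get("returnData", True),
    }


spec = {
    "id": "queue_depth",
    "metricStat": {
        "metric": {
            "namespace": "Sidekiq",
            "metricName": "queuedepth",
            "dimensions": [
                {"name": "env", "value": "staging"},
                {"name": "app", "value": "appname"},
                {"name": "queue", "value": "queuename"},
            ],
        },
        "period": 60,
        "stat": "Average",
        "unit": "Count",
    },
    "returnData": True,
}
mdq = to_metric_data_query(spec)
print(len(mdq["MetricStat"]["Metric"]["Dimensions"]))  # 3
```

A list of such dicts could then be passed as MetricDataQueries, together with StartTime and EndTime, to boto3's get_metric_data.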

Our cluster is on EKS 1.14 and using chankh/k8s-cloudwatch-adapter:v0.6.0.

Thanks

willianantunes commented 5 years ago

Just to confirm: I get the same result here. My ExternalMetric:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: aws-mq-propileu-destination-externalmetric
  namespace: production
spec:
  name: aws-mq-propileu-destination-externalmetric
  resource:
    resource: deployment
  queries:
    - id: "mq_1_propileu_destination_length"
      metricStat:
        metric:
          dimensions:
            - name: "Broker"
              value: "jsm-amq-prd2-1"
            - name: "Queue"
              value: "propileu-destination"
          metricName: "QueueSize"
          namespace: "AWS/AmazonMQ"
        period: 60
        stat: Sum
        unit: Count
      returnData: true
    - id: "mq_2_propileu_destination_length"
      metricStat:
        metric:
          dimensions:
            - name: "Broker"
              value: "jsm-amq-prd2-2"
            - name: "Queue"
              value: "propileu-destination"
          metricName: "QueueSize"
          namespace: "AWS/AmazonMQ"
        period: 60
        stat: Sum
        unit: Count
      returnData: true

And the HPA:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: propileu-pubsub-consumer-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: propileu-from-tower-consumer-deployment
  minReplicas: 1
  maxReplicas: 100
  metrics:
    - type: External
      external:
        metricName: aws-mq-propileu-destination-externalmetric
        targetValue: 1

When I run kubectl -n custom-metrics logs -f --tail=100 k8s-cloudwatch-adapter-79cbf445b-vzslb, it shows no errors at all.

If I configure a setup with SQS, as in the sample usage, it works properly.

chankh commented 5 years ago

Pull requests are welcomed.

arunbhagyanath commented 4 years ago

@chankh Can you take a look at the PR?

rahulttn commented 4 years ago

@chankh Tested 0.7.0 and it didn't work. (Please don't mind the formatting.)

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: rest-api-cpu
spec:
  name: rest-api-cpu
  resource:
    resource: "deployment"
  queries:

The HPA:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: rest-api-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: rest-api
  minReplicas: 1
  maxReplicas: 3
  metrics:

It shows this:

NAME           REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
rest-api-cpu   Deployment/rest-api   0/4       1         3         1          14m

arunbhagyanath commented 4 years ago

@rahulttn While testing, I saw that CloudWatch was not returning anything for the API calls, and when I checked the metric statistics, the unit was Percent.

Metrics Statistics API

aws cloudwatch get-metric-statistics --metric-name pod_cpu_utilization --start-time 09:40:00 --end-time 09:45:00  --period 300 --namespace ContainerInsights --statistics Sum --dimensions Name=PodName,Value=httpd Name=ClusterName,Value=eks Name=Namespace,Value=default

Can you try setting the unit to "Percent" instead of "Count" (unit: Count), or removing it entirely (in which case the GetMetricStatistics API will be used to get the unit)?

Below are my test files:
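This matches CloudWatch's documented Unit behavior for Get operations: if Unit is omitted, data collected with any unit is returned; if it is specified, only data stored with that exact unit is returned, and a non-matching unit yields no results. The toy model below illustrates that rule in memory (it does not call CloudWatch); Datapoint and filter_by_unit are hypothetical names, and the sample values are illustrative, chosen to resemble the Percent-valued pod_cpu_utilization data described above.

```python
from typing import List, NamedTuple, Optional


class Datapoint(NamedTuple):
    value: float
    unit: str


def filter_by_unit(points: List[Datapoint], unit: Optional[str]) -> List[Datapoint]:
    """Mimic the Unit filter on a CloudWatch Get call: no unit returns everything,
    a unit returns only datapoints stored with exactly that unit."""
    if unit is None:
        return list(points)
    return [p for p in points if p.unit == unit]


# Illustrative datapoints stored with the Percent unit.
stored = [Datapoint(20.0, "Percent"), Datapoint(30.0, "Percent")]

print(filter_by_unit(stored, "Count"))         # [] : the HPA would see no data
print(len(filter_by_unit(stored, "Percent")))  # 2
print(len(filter_by_unit(stored, None)))       # 2
```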

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: httpd-cpu
spec:
  name: httpd-cpu
  resource:
    resource: "deployment"
  queries:
    - id: httpdcpu
      metricStat:
        metric:
          namespace: "ContainerInsights"
          metricName: "pod_cpu_utilization"
          dimensions:
            - name: PodName
              value: "httpd"
            - name: ClusterName
              value: "eks"
            - name: Namespace
              value: "default"
        period: 10
        stat: Sum
        unit: Percent
      returnData: true
---
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: httpd-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: httpd
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: External
    external:
      metricName: httpd-cpu
      targetValue: 10

kubectl get hpa/httpd-cpu -w

NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
httpd-cpu   Deployment/httpd   0/10      1         3         3          4m1s
httpd-cpu   Deployment/httpd   20/10     1         3         3          5m21s
httpd-cpu   Deployment/httpd   30/10     1         3         3          5m52s
httpd-cpu   Deployment/httpd   19/10     1         3         3          6m23s
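The TARGETS column shows the current metric value against the target. The Kubernetes HPA documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping changes when the ratio is within a default tolerance of 0.1 and clamping to minReplicas/maxReplicas. The simplified sketch below applies that rule to a Value target (targetValue here) and, for comparison, to an AverageValue target like the one in the working HPA earlier in the thread, where the external value is divided by the per-pod target. It ignores pod readiness, stabilization windows and scaling policies, and the function names are hypothetical.

```python
import math


def replicas_for_value(current: int, metric_value: float, target_value: float,
                       min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Value target (targetValue in autoscaling/v2beta1): scale by metric/target ratio."""
    ratio = metric_value / target_value
    if abs(1.0 - ratio) <= tolerance:
        return current  # close enough to target: no change
    desired = math.ceil(ratio * current)
    return max(min_replicas, min(max_replicas, desired))


def replicas_for_average_value(current: int, metric_value: float, target_average: float,
                               min_replicas: int, max_replicas: int,
                               tolerance: float = 0.1) -> int:
    """AverageValue target: the total external value is spread over the replicas."""
    ratio = metric_value / (target_average * current)
    if abs(1.0 - ratio) <= tolerance:
        return current
    desired = math.ceil(metric_value / target_average)
    return max(min_replicas, min(max_replicas, desired))


# 20/10 with 3 replicas asks for ceil(2.0 * 3) = 6, capped at maxReplicas = 3.
print(replicas_for_value(3, 20, 10, 1, 3))  # 3
```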

rahulttn commented 4 years ago

@arunbhagyanath Yeah, removing the unit makes it work; the value seems to be a percentage. This works for CPU, memory, etc., but ContainerInsights also provides pod-based metrics such as network TX, as well as service-based metrics, and those would be more useful as counts, IMO. If the value could be retrieved as a count, that would be great.

willianantunes commented 4 years ago

Now it's working 100% as expected for me! Thank you to whoever applied #14!