amazon-archives / k8s-cloudwatch-adapter

An implementation of Kubernetes Custom Metrics API for Amazon CloudWatch
Apache License 2.0

CurrentAverageValue isn't an integer & CurrentValue is 0 #35

Open mcblair opened 4 years ago

mcblair commented 4 years ago

The issue we are experiencing is that the CloudWatch adapter appears able to read from CloudWatch (no auth errors anywhere), but we are getting a currentValue of 0 and a currentAverageValue that is not a plain integer and looks far too large, e.g. 18556m.

We are using IAM roles for service accounts (IRSA) on EKS.

HPA live annotations:

autoscaling.alpha.kubernetes.io/current-metrics: >-
  [{"type":"External","external":{"metricName":"REPLACE-queue-length","currentValue":"0","currentAverageValue":"18556m"}}]
autoscaling.alpha.kubernetes.io/metrics: >-
  [{"type":"External","external":{"metricName":"REPLACE-queue-length","targetAverageValue":"40"}}]
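For what it's worth, the `18556m` value is not alphanumeric garbage: Kubernetes serializes fractional metric quantities with the milli suffix, so `18556m` means 18.556 messages per pod on average. A minimal sketch of parsing the integer/milli forms that appear in HPA status (the helper name is mine, not from the adapter):

```python
def parse_quantity(q: str) -> float:
    """Convert a Kubernetes quantity such as '18556m' or '40' to a float.

    Kubernetes renders fractional quantities in milli-units: '18556m' == 18.556.
    This covers only the plain-integer and milli cases seen in HPA annotations.
    """
    if q.endswith("m"):
        return int(q[:-1]) / 1000.0
    return float(q)

print(parse_quantity("18556m"))  # 18.556 -- the per-pod average queue length
print(parse_quantity("40"))      # 40.0   -- the target average value
```

So the HPA is actually comparing an average of ~18.6 messages per pod against a target of 40.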

Here are my yaml definitions:

---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: REPLACE
  labels:
    version: REPLACE
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: REPLACE
  minReplicas: 2
  maxReplicas: 1024
  metrics:
  - type: External
    external:
      metricName: REPLACE-queue-length
      targetAverageValue: 40
---
apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: REPLACE-queue-length
spec:
  name: REPLACE-queue-length
  resource:
    resource: "deployment"
  queries:
  - id: sqs_REPLACE_files
    metricStat:
      metric:
        namespace: "AWS/SQS"
        metricName: "ApproximateNumberOfMessagesVisible"
        dimensions:
          - name: QueueName
            value: REPLACE
      period: 60
      stat: Average
      unit: Count
    returnData: true
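To rule out the adapter misreading the metric, the same query can be issued by hand. A sketch of the GetMetricData payload equivalent to the ExternalMetric above (the queue name `REPLACE` is the manifest's placeholder; the boto3 call is commented out because it needs AWS credentials and a region):

```python
import datetime

# Mirrors the adapter's query from the ExternalMetric spec above.
query = {
    "Id": "sqs_REPLACE_files",
    "MetricStat": {
        "Metric": {
            "Namespace": "AWS/SQS",
            "MetricName": "ApproximateNumberOfMessagesVisible",
            "Dimensions": [{"Name": "QueueName", "Value": "REPLACE"}],
        },
        "Period": 60,
        "Stat": "Average",
        "Unit": "Count",
    },
    "ReturnData": True,
}

# Equivalent manual call (requires credentials):
# import boto3
# cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
# end = datetime.datetime.utcnow()
# resp = cloudwatch.get_metric_data(
#     MetricDataQueries=[query],
#     StartTime=end - datetime.timedelta(minutes=5),
#     EndTime=end,
# )
# print(resp["MetricDataResults"][0]["Values"])
```

If the manual call returns sensible values, the problem is in how the HPA interprets the quantity rather than in CloudWatch access.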

Here is cloudwatch adapter manifest:

---
apiVersion: v1
kind: Namespace
metadata:
  name: custom-metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cloudwatch-adapter:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: k8s-cloudwatch-adapter-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: k8s-cloudwatch-adapter
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-cloudwatch-adapter
  template:
    metadata:
      labels:
        app: k8s-cloudwatch-adapter
      name: k8s-cloudwatch-adapter
    spec:
      securityContext:
        fsGroup: 65534
      serviceAccountName: k8s-cloudwatch-adapter
      containers:
      - name: k8s-cloudwatch-adapter
        env:
        - name: AWS_DEFAULT_REGION
          value: REPLACE
        image: chankh/k8s-cloudwatch-adapter:v0.8.0
        imagePullPolicy: "Always"
        args:
        - /adapter
        - --cert-dir=/tmp
        - --secure-port=6443
        - --logtostderr=true
        - --v=10
        ports:
        - containerPort: 6443
          name: https
        - containerPort: 8080
          name: http
        volumeMounts:
        - mountPath: /tmp
          name: temp-vol
      volumes:
      - name: temp-vol
        emptyDir: {}
      - name: token-vol
        projected:
          sources:
          - serviceAccountToken:
              path: token
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cloudwatch-adapter-resource-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8s-cloudwatch-adapter-resource-reader
subjects:
- kind: ServiceAccount
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: k8s-cloudwatch-adapter
  namespace: custom-metrics
spec:
  ports:
  - name: https
    port: 443
    targetPort: 6443
  - name: http
    port: 80
    targetPort: 8080
  selector:
    app: k8s-cloudwatch-adapter
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    name: k8s-cloudwatch-adapter
    namespace: custom-metrics
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8s-cloudwatch-adapter:external-metrics-reader
rules:
- apiGroups:
  - external.metrics.k8s.io
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8s-cloudwatch-adapter-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cloudwatch-adapter:external-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8s-cloudwatch-adapter:external-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: externalmetrics.metrics.aws
spec:
  group: metrics.aws
  version: v1alpha1
  names:
    kind: ExternalMetric
    plural: externalmetrics
    singular: externalmetric
  scope: Namespaced
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8s-cloudwatch-adapter:crd-metrics-reader
  labels:
    app: k8s-cloudwatch-adapter
rules:
- apiGroups:
  - metrics.aws
  resources:
  - "externalmetrics"
  verbs:
  - list
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8s-cloudwatch-adapter:crd-metrics-reader
  labels:
    app: k8s-cloudwatch-adapter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8s-cloudwatch-adapter:crd-metrics-reader
subjects:
  - name: k8s-cloudwatch-adapter
    namespace: "custom-metrics"
    kind: ServiceAccount

Service account definition with eksctl:

    - metadata:
        name: k8s-cloudwatch-adapter
        namespace: custom-metrics
        labels: {aws-usage: "cluster-ops"}
      attachPolicy:
        Version: "2012-10-17"
        Statement:
        - Effect: Allow
          Action:
          - "cloudwatch:GetMetricData"
          - "cloudwatch:GetMetricStatistics"
          - "cloudwatch:ListMetrics"
          Resource: '*'
mcblair commented 4 years ago

I would like to add that the HPA does schedule more replicas and scale up, but then it stops and actually scales down, even while pods are still processing queue items that are in flight. This causes the cluster autoscaler to scale in, ungracefully terminating the pods and leaving items in flight.

chankh commented 4 years ago

You can refer to the HPA docs for details about how scaling works. It looks like you have a lot of messages in your SQS queue, which is why the HPA is scheduling more replicas.
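The HPA's core algorithm (from the Kubernetes docs) can be sketched directly; plugging in the values from this issue shows why it scales down:

```python
import math

def desired_replicas(current_replicas: int, current_avg: float, target_avg: float) -> int:
    """HPA formula: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_avg / target_avg))

# Values from this issue: 2 replicas, ~18.556 messages per pod, target 40.
# The ratio is below 1, so the HPA wants fewer replicas (clamped to minReplicas: 2).
print(desired_replicas(2, 18.556, 40))  # 1
```

So the reported scale-down is consistent with the numbers: the average queue depth per pod is under half the target.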

Your application needs to handle the SIGTERM signal, and you can use a preStop hook to perform actions before your application pod is terminated, e.g. stop consuming new messages and finish handling in-flight ones. For more information, please check out container lifecycle hooks.
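As an illustration only (the /drain endpoint, sleep, and grace period below are assumptions, not part of the adapter or this issue), a worker deployment could drain before termination like this:

```yaml
# Hypothetical pod spec fragment for graceful SQS-consumer shutdown.
spec:
  terminationGracePeriodSeconds: 300   # allow time to finish in-flight items
  containers:
  - name: worker
    lifecycle:
      preStop:
        exec:
          # Assumed app endpoint that stops polling the queue; the sleep
          # gives in-flight messages time to complete before SIGTERM.
          command: ["/bin/sh", "-c", "curl -s localhost:8080/drain && sleep 30"]
```

Combined with a SIGTERM handler in the consumer itself, this avoids leaving messages in flight when the cluster autoscaler removes a node.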