Dabz / ccloudexporter

Prometheus exporter for Confluent Cloud API metric
https://docs.confluent.io/current/cloud/metrics-api.html

Feature request: expose additional endpoints for self health check (liveness and readiness) #86

Closed tringuyen-yw closed 2 years ago

tringuyen-yw commented 3 years ago

Description

As of 2021-09-02, ccloud-exporter exposes only one endpoint: localhost:2112/metrics. When an HTTP request is made to this /metrics endpoint, ccloud-exporter makes outgoing requests to the Confluent Cloud Metrics API, which is the normal and expected behaviour.

In the context of Kubernetes, ccloud-exporter typically runs within a pod configured with a livenessProbe and a readinessProbe. As /metrics is the only endpoint exposed by ccloud-exporter, we might be tempted to use it to probe the readiness status of the ccloud-exporter container.

As a result, each probe of the /metrics endpoint triggers a batch of requests to the Confluent Cloud Metrics API. When the probe frequency is high (every 5 seconds in this example), the rapid repeated probing of /metrics exhausts the CCloud Metrics API rate limit of 50 requests / minute.

{
  "Endpoint": "https://api.telemetry.confluent.cloud//v2/metrics/cloud/query",
  "StatusCode": 429,
  "body": "",
  "level": "error",
  "msg": "Received invalid response",
  "time": "2021-09-02T14:36:40Z"
}
{
  "error": "Received status code 429 instead of 200 for POST on https://api.telemetry.confluent.cloud//v2/metrics/cloud/query ()",
  "level": "error",
  "msg": "Query did not succeed",
  ... etc...
}

In this example, the API rate limit error (status 429) occurs within 15 seconds. ccloud-exporter is then stuck in an endless loop of "StatusCode": 429, because Kubernetes keeps probing the /metrics endpoint to check the health of the pod.

Suggestion

Add a separate endpoint for self health checks, for example localhost:2113/selfcheck, which returns OK if ccloud-exporter is in good shape. This helps Kubernetes manage the life cycle of the container, for example by restarting it if it is stuck in a non-functional state.

To reproduce the "StatusCode": 429 (with the livenessProbe and readinessProbe sections below uncommented)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ccloud-exporter
  namespace: monitoring
  labels:
    app: ccloud-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ccloud-exporter
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: ccloud-exporter
    spec:
      containers:
        - name: ccloud-exporter
          image: dabz/ccloudexporter:latest
          imagePullPolicy: IfNotPresent
          env:
            - name: CCLOUD_API_KEY
              value: CloudAPIKey?????
            - name: CCLOUD_API_SECRET
              value: CloudAPISecret?????
            - name: CCLOUD_CLUSTER
              value: lkc-?????
          ports:
            - name: metrics
              containerPort: 2112
              protocol: TCP
#         livenessProbe:
#           httpGet:
#             path: /metrics
#             port: metrics
#             scheme: HTTP
#           initialDelaySeconds: 30
#           timeoutSeconds: 30
#           periodSeconds: 15
#           successThreshold: 1
#           failureThreshold: 3
#         readinessProbe:
#           httpGet:
#             path: /metrics
#             port: metrics
#             scheme: HTTP
#           initialDelaySeconds: 30
#           timeoutSeconds: 30
#           periodSeconds: 5
#           successThreshold: 1
#           failureThreshold: 3
          resources:
            requests:
              cpu: "250m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: ccloud-exporter-service
  namespace: monitoring
  labels:
    app: ccloud-exporter
spec:
  ports:
    - name: metrics
      protocol: TCP
      port: 2112
      targetPort: 2112
  selector:
    app: ccloud-exporter