Description
As of 2021-09-02, ccloud-exporter exposes only one endpoint, localhost:2112/metrics. When an HTTP request is made to this /metrics endpoint, ccloud-exporter makes outgoing requests to the Confluent Cloud Metrics API, which is the normal and expected behaviour.
In the context of Kubernetes, ccloud-exporter typically runs within a pod configured with a livenessProbe and a readinessProbe. As /metrics is the only endpoint exposed by ccloud-exporter, we might be tempted to use this endpoint to probe the readiness status of the ccloud-exporter container.
As a result, each probe of the /metrics endpoint triggers a collection of requests to the Confluent Cloud Metrics API. When the probe frequency is high (every 5 seconds in this example), the rapid, repeated probing of the /metrics endpoint exhausts the CCloud Metrics API rate limit of 50 requests / minute.
{
"Endpoint": "https://api.telemetry.confluent.cloud//v2/metrics/cloud/query",
"StatusCode": 429,
"body": "",
"level": "error",
"msg": "Received invalid response",
"time": "2021-09-02T14:36:40Z"
}
{
"error": "Received status code 429 instead of 200 for POST on https://api.telemetry.confluent.cloud//v2/metrics/cloud/query ()",
"level": "error",
"msg": "Query did not succeed",
... etc...
}
In this example, the API rate limit error (status 429) occurs within 15 seconds.
ccloud-exporter is then stuck in an endless loop of "StatusCode": 429 responses, because Kubernetes keeps probing the /metrics endpoint to check the health of the pod.
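To make the arithmetic concrete, here is a rough Go sketch that estimates how quickly probing can exceed the limit. The 5-second period and the limit of 50 requests / minute come from this report. The number of probes hitting /metrics (2, liveness plus readiness) and the number of Metrics API queries per scrape (10) are assumptions chosen for illustration, not measured values, and the function name is hypothetical. The model also simplifies the rate limit to a single one-minute window.

```go
package main

import "fmt"

// secondsUntilLimit returns the elapsed time (in seconds) at which the
// cumulative number of Metrics API requests first exceeds limit, assuming
// every probe fires at t = 0 and then once every periodSeconds.
// It returns -1 if the limit is not exceeded within the first 60 seconds
// or if periodSeconds is not positive.
func secondsUntilLimit(periodSeconds, probes, queriesPerScrape, limit int) int {
	if periodSeconds <= 0 {
		return -1
	}
	requests := 0
	for t := 0; t < 60; t += periodSeconds {
		// Each probe scrapes /metrics once, and each scrape fans out
		// into queriesPerScrape requests to the Metrics API.
		requests += probes * queriesPerScrape
		if requests > limit {
			return t
		}
	}
	return -1
}

func main() {
	// Period and limit are taken from the report; 2 probes and
	// 10 queries per scrape are assumed values.
	fmt.Println(secondsUntilLimit(5, 2, 10, 50))
}
```

Under these assumed values the limit is exceeded on the third round of probes, which is in line with the observation that the 429 errors appear within 15 seconds.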
Suggestion
Add a separate endpoint for self health checks, for example localhost:2113/selfcheck, which returns OK if ccloud-exporter is in good shape. This helps Kubernetes manage the life cycle of the container, for example by restarting the container if it is stuck in a non-functional state.
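A minimal sketch of what such an endpoint could look like in Go, using only the standard library. This is not ccloud-exporter's actual code: the healthState type and its methods are hypothetical names, and how the exporter would decide that it is "in good shape" is left open. The key property is that the handler never calls the Confluent Cloud Metrics API, so probing it does not consume any rate limit.

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
)

// healthState (hypothetical) records whether the exporter currently
// considers itself functional. Other parts of the program would call
// healthy.Store(false) when they detect an unrecoverable condition.
type healthState struct {
	healthy atomic.Bool
}

// status maps the current state to an HTTP status code and body.
func (s *healthState) status() (int, string) {
	if s.healthy.Load() {
		return http.StatusOK, "OK\n"
	}
	return http.StatusServiceUnavailable, "unhealthy\n"
}

// ServeHTTP answers probe requests without making any outgoing request.
func (s *healthState) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	code, body := s.status()
	w.WriteHeader(code)
	_, _ = w.Write([]byte(body))
}

func main() {
	state := &healthState{}
	state.healthy.Store(true)

	mux := http.NewServeMux()
	mux.Handle("/selfcheck", state)

	// Listen on all interfaces rather than localhost only: kubelet
	// httpGet probes are sent to the pod IP by default.
	log.Fatal(http.ListenAndServe(":2113", mux))
}
```

With an endpoint like this, the livenessProbe and readinessProbe could point at port 2113 and path /selfcheck, leaving /metrics to be scraped only by Prometheus at its own interval.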
To reproduce the "StatusCode": 429
Uncomment the livenessProbe and readinessProbe sections in the manifest below and deploy it on your Kubernetes cluster.
Configure the value for the environment variables CCLOUD_...