Closed takanabe closed 3 years ago
In what way would this be different than GET /metrics
?
when systems need to check availability of the exporter
Just to say a bit more: Prometheus is usually the only consumer of an exporter, and it already monitors healthiness implicitly via scraping, and makes that information available under the up{job="fastly_exporter"}
metric.
In what way would this be different than GET /metrics?
Size of the response is different.
There are some cases when we want to confirm the health of Prometheus exporters from systems without communicating Prometheus(or Prometheus API).
Let's say we use the consul to decide the health of Prometheus exporters. The raft.db
grows rapidly because the DB stores response from the healthcheck reply. (e.g. https://github.com/hashicorp/consul/issues/5181#issuecomment-451281308).
So, it would be nice to have health check endpoints returning response quickly and without huge payloads to support various systems
Size of the response is different.
HEAD /metrics?
HEAD /metrics?
Yes, if we can specify the HTTP method. But, if software implementation only accepts a request path and the health check let us use GET
, there are no ways to use HEAD /metrics
on our end. If introduction of the health check endpoint does not make sense for the design of this exporter, I'm fine with the conclusion 😉
If introduction of the health check endpoint does not make sense for the design of this exporter, I'm fine with the conclusion.
So the only thing a Prometheus exporter ever does is host a single HTTP endpoint, /metrics, serving basically plain text, as quickly as possible. If it works, the exporter is healthy, and if it doesn't, the exporter is unhealthy — /metrics is the health check endpoint, a separate /healthz just doesn't make sense. And not only for this exporter, but for any exporter.
The fact that the response body served by GET /metrics can be long shouldn't be an issue, because there's no reason a health checking system should be inspecting anything other than the response code. The fact that Consul both reads and persists the response body is confusing to me, I can't think of any reason it should do that, and I couldn't find any justification when reading the relevant docs and issues. But even then, the response body can be elided by making a HEAD instead of a GET — and any health checking system that can make HTTP requests must support methods other than GET, it would be broken otherwise :)
Thank you for your comments!
Background
I'm not sure the design of this exporter but
GET /healthz
or any health check is useful when systems need to check availability of the exporter. So, this PR suggests introducing the health check endpoint in addition to the/metrics
endpoint.