Add /healthz endpoint for health check

takanabe commented 3 years ago

Background

I'm not sure the design of this exporter but GET /healthz or any health check is useful when systems need to check availability of the exporter. So, this PR suggests introducing the health check endpoint in addition to the /metrics endpoint.

peterbourgon commented 3 years ago

In what way would this be different than GET /metrics?

peterbourgon commented 3 years ago

when systems need to check availability of the exporter

Just to say a bit more: Prometheus is usually the only consumer of an exporter, and it already monitors healthiness implicitly via scraping, and makes that information available under the up{job="fastly_exporter"} metric.

takanabe commented 3 years ago

In what way would this be different than GET /metrics?

Size of the response is different.

There are some cases when we want to confirm the health of Prometheus exporters from systems without communicating Prometheus(or Prometheus API).

Let's say we use the consul to decide the health of Prometheus exporters. The raft.db grows rapidly because the DB stores response from the healthcheck reply. (e.g. https://github.com/hashicorp/consul/issues/5181#issuecomment-451281308).

So, it would be nice to have health check endpoints returning response quickly and without huge payloads to support various systems

peterbourgon commented 3 years ago

Size of the response is different.

HEAD /metrics?

takanabe commented 3 years ago

HEAD /metrics?

Yes, if we can specify the HTTP method. But, if software implementation only accepts a request path and the health check let us use GET, there are no ways to use HEAD /metrics on our end. If introduction of the health check endpoint does not make sense for the design of this exporter, I'm fine with the conclusion 😉

peterbourgon commented 3 years ago

If introduction of the health check endpoint does not make sense for the design of this exporter, I'm fine with the conclusion.

So the only thing a Prometheus exporter ever does is host a single HTTP endpoint, /metrics, serving basically plain text, as quickly as possible. If it works, the exporter is healthy, and if it doesn't, the exporter is unhealthy — /metrics is the health check endpoint, a separate /healthz just doesn't make sense. And not only for this exporter, but for any exporter.

The fact that the response body served by GET /metrics can be long shouldn't be an issue, because there's no reason a health checking system should be inspecting anything other than the response code. The fact that Consul both reads and persists the response body is confusing to me, I can't think of any reason it should do that, and I couldn't find any justification when reading the relevant docs and issues. But even then, the response body can be elided by making a HEAD instead of a GET — and any health checking system that can make HTTP requests must support methods other than GET, it would be broken otherwise :)

takanabe commented 3 years ago

Thank you for your comments!

fastly / fastly-exporter

Add /healthz endpoint for health check #61

Background