canonical / pebble

Pebble is a lightweight Linux service manager with layered configuration and an HTTP API.
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0
144 stars 54 forks source link

Add a prometheus-compatible metrics endpoint format to pebble check URL #118

Open mthaddon opened 2 years ago

mthaddon commented 2 years ago

As documented in https://juju.is/docs/sdk/pebble#heading--check-health-endpoint-and-probes:

""" As of Juju version 2.9.26, Pebble includes an HTTP /v1/health endpoint that allows a user to query the health of configured checks, optionally filtered by check level with the query string ?level=<level>. This endpoint returns an HTTP 200 status if the checks are healthy, HTTP 502 otherwise. """

It would be great if there was an option to expose this in a format that prometheus could ingest. If so, we could easily integrate charms with COS by configuring them to hit this pebble endpoint and ingest the data exposed there.

benhoyt commented 2 years ago

@mthaddon I don't know much (anything :-) about Prometheus endpoint formats: can you briefly describe what it would look like for this endpoint to be Prometheus compatible?

jnsgruk commented 2 years ago

This feels like something @simskij might have opinions on...

simskij commented 2 years ago

This would be really useful. I've not had a look at what is already there yet, so I'll refrain from having an opinion on the current implementation for now.

However, the Prometheus Exposition Format is a very basic and extremely stable specification. In its simplest form, exposed metric consists of a metric name, optionally a set of labels and then a numeric value. The parts are separated eith a single space and entries are separated with a newline.

The spec for the exposition format is available here.

A basic one could look like:

pebble_status{instance="something", check="some-check"} 1

Where 1 is a boolean value

benhoyt commented 2 years ago

Okay, so we'd expose an HTTP endpoint, /v1/metrics or similar, with some metrics about health checks (and potentially other Pebble things) in that format. One consideration is security: we intentionally reduced the /v1/health endpoint down to just the "healthy: true" result to avoid exposing any additional information over this unauthenticated HTTP endpoint. Specifically, we wanted to avoid exposing the details of all the health checks. Any thoughts about whether that's a real issue?

simskij commented 2 years ago

Any thoughts about whether that's a real issue?

Yeah, this would open up for potential fingerprinting of the services. This could be solved by adding basic auth to the metrics endpoint and allowing the charm to access these credentials somehow. We currently lack the support for scrape job authentication in the Prometheus charm (or at least it hasn't been thoroughly tested), but that would be fairly trivial to support.

One considering is security: we intentionally reduced the /v1/health endpoint down to just the "healthy: true"

I wouldn't alter it. Adding /v1/metrics sounds like a better game plan in that case as it closer matches the data it would expose.

shipperizer commented 1 year ago

guys have you got any updates on this issue? any help that can be given on this?

our use case is that we wanna monitor the status of our app(s) across restarts

Happy to lend a hand on it

benhoyt commented 1 year ago

I think given that Pebble is used across a bunch of projects now, we'd need a spec for this (and auth sounds like it might take a bit of thinking about). I like the idea, but don't have bandwidth right this cycle to spec and implement this. @shipperizer Would you be willing to create a spec for it? Proposal for /v1/metrics HTTP endpoint, what's it returns, discussion of auth, and so on.

shipperizer commented 1 year ago

@benhoyt can give it a crack, hopefully

a couple of considerations are:

if those points above are good enough I will cut some time to work on it next pulse

benhoyt commented 1 year ago

If basic auth is what Prometheus supports, then yeah, we'll need to use that (the tricky design thing will be how to configure/get credentials).

I'd be thinking something more along the lines of pebble_service{name="srv1"} 1 and pebble_check{name="chk1"} 0 would be more like it. But I'm not very familiar with how these types of things are done in Prometheus metrics endpoints normally. @simskij will have ideas (he's posted one above too).

Given the design considerations above, I still think we need a formal spec and spec review meeting with a few invested players (Simon, Gustavo, me at least). So probably don't start on implementation until you have a spec semi-approved.

shipperizer commented 1 year ago

@benhoyt agreed, will try to get the specs in the next pulse

shipperizer commented 1 year ago

specs shared internally

benhoyt commented 6 months ago

@shipperizer I forget -- where did we get to with this?

shipperizer commented 4 months ago

@shipperizer anywhere unfortunately, i lost track of it, spec is somewhere in my drive, can dig it out and we can have another chat about it