Open mthaddon opened 2 years ago
@mthaddon I don't know much (anything :-) about Prometheus endpoint formats: can you briefly describe what it would look like for this endpoint to be Prometheus compatible?
This feels like something @simskij might have opinions on...
This would be really useful. I've not had a look at what is already there yet, so I'll refrain from having an opinion on the current implementation for now.
However, the Prometheus Exposition Format is a very basic and extremely stable specification. In its simplest form, exposed metric consists of a metric name, optionally a set of labels and then a numeric value. The parts are separated eith a single space and entries are separated with a newline.
The spec for the exposition format is available here.
A basic one could look like:
pebble_status{instance="something", check="some-check"} 1
Where 1 is a boolean value
Okay, so we'd expose an HTTP endpoint, /v1/metrics
or similar, with some metrics about health checks (and potentially other Pebble things) in that format. One consideration is security: we intentionally reduced the /v1/health
endpoint down to just the "healthy: true" result to avoid exposing any additional information over this unauthenticated HTTP endpoint. Specifically, we wanted to avoid exposing the details of all the health checks. Any thoughts about whether that's a real issue?
Any thoughts about whether that's a real issue?
Yeah, this would open up for potential fingerprinting of the services. This could be solved by adding basic auth to the metrics endpoint and allowing the charm to access these credentials somehow. We currently lack the support for scrape job authentication in the Prometheus charm (or at least it hasn't been thoroughly tested), but that would be fairly trivial to support.
One considering is security: we intentionally reduced the /v1/health endpoint down to just the "healthy: true"
I wouldn't alter it. Adding /v1/metrics
sounds like a better game plan in that case as it closer matches the data it would expose.
guys have you got any updates on this issue? any help that can be given on this?
our use case is that we wanna monitor the status of our app(s) across restarts
Happy to lend a hand on it
I think given that Pebble is used across a bunch of projects now, we'd need a spec for this (and auth sounds like it might take a bit of thinking about). I like the idea, but don't have bandwidth right this cycle to spec and implement this. @shipperizer Would you be willing to create a spec for it? Proposal for /v1/metrics
HTTP endpoint, what's it returns, discussion of auth, and so on.
@benhoyt can give it a crack, hopefully
a couple of considerations are:
if those points above are good enough I will cut some time to work on it next pulse
If basic auth is what Prometheus supports, then yeah, we'll need to use that (the tricky design thing will be how to configure/get credentials).
I'd be thinking something more along the lines of pebble_service{name="srv1"} 1
and pebble_check{name="chk1"} 0
would be more like it. But I'm not very familiar with how these types of things are done in Prometheus metrics endpoints normally. @simskij will have ideas (he's posted one above too).
Given the design considerations above, I still think we need a formal spec and spec review meeting with a few invested players (Simon, Gustavo, me at least). So probably don't start on implementation until you have a spec semi-approved.
@benhoyt agreed, will try to get the specs in the next pulse
specs shared internally
@shipperizer I forget -- where did we get to with this?
@shipperizer anywhere unfortunately, i lost track of it, spec is somewhere in my drive, can dig it out and we can have another chat about it
As documented in https://juju.is/docs/sdk/pebble#heading--check-health-endpoint-and-probes:
""" As of Juju version 2.9.26, Pebble includes an HTTP
/v1/health
endpoint that allows a user to query the health of configured checks, optionally filtered by check level with the query string?level=<level>
. This endpoint returns an HTTP 200 status if the checks are healthy, HTTP 502 otherwise. """It would be great if there was an option to expose this in a format that prometheus could ingest. If so, we could easily integrate charms with COS by configuring them to hit this pebble endpoint and ingest the data exposed there.