cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.93k stars 3.78k forks source link

ui: prefix metrics to identify them as cockroachdb #44136

Open awoods187 opened 4 years ago

awoods187 commented 4 years ago

A slack user requested:

Prometheus metrics really should be prefixed with something to mark them as Cockroach-specific, like node exporter stats are all prefixed with node_ - this will keep them all in a group and also make it clear that are relevant.

Jira issue: CRDB-5250

ajwerner commented 4 years ago

Doesn’t Prometheus already add a label to indicate the service from which a metric was scraped?

I’d love to better understand the surrounding context. I don’t know if such prefixes are actually common or standard. It also seems like a non-starter to change our metric names at this point. Perhaps we could offer a flag to optionally prefix the metric names. Unfortunately it would break all of our published dashboards.

akshayjshah commented 4 years ago

In my limited experience, these prefixes are near-universal - many client libraries call them "namespaces" and require them. See https://prometheus.io/docs/practices/naming/.

Doesn’t Prometheus already add a label to indicate the service from which a metric was scraped?

I haven't run a Prometheus server of my own in a while, but my recollection is that job labels are part of the scraping configuration - it's the operator's responsibility to assign meaningful job names to pools of scraped instances. As more cloud-hosted, Prometheus-compatible systems emerge, it might be safer not to rely exclusively on careful operator configuration.

It also seems like a non-starter to change our metric names at this point. Perhaps we could offer a flag to optionally prefix the metric names. Unfortunately it would break all of our published dashboards.

Changing metric names will break backward-compatibility, but it's certainly possible for CockroachDB to add a service: cockroach tag (or something similar) to all exported metrics.

SpokeyWheeler commented 4 years ago

Screenshot 2020-02-08 at 15 43 25 Screenshot 2020-02-08 at 15 43 39 Screenshot 2020-02-08 at 15 32 52 Screenshot 2020-02-08 at 15 32 16 Screenshot 2020-02-08 at 15 42 16 That's all fine (original requester here!) but in the Prometheus "UI", your metrics are all over the place and very hard to identify. Adding a tag isn't going to fix that, because you'd have to inspect every metric to see if it has the tag.

You can see from the attached screenshots how Prometheus groups its own metrics (prometheus and node) whereas Cockroach metrics are first, last and everywhere in-between.

In fact, it occurs to me that some of the metrics in the node and prometheus groups could actually be Cockroach metrics and I'd never know!

SpokeyWheeler commented 4 years ago

I’d love to better understand the surrounding context. I don’t know if such prefixes are actually common or standard. It also seems like a non-starter to change our metric names at this point. Perhaps we could offer a flag to optionally prefix the metric names. Unfortunately it would break all of our published dashboards.

Grafana Dashboards or what? That's just JSON, isn't it? Quick bit of sed and Bob's your aunty's live-in lover.

As someone who was part of an (un)civil war about naming conventions, I can assure you that they are standard and, it turns out, quite useful.

ajwerner commented 4 years ago

Grafana Dashboards or what? That's just JSON, isn't it? Quick bit of sed and Bob's your aunty's live-in lover.

I’m not concerned about the dashboards for new deployments, I’m concerned about breaking dashboards and alerting for pre-existing clusters when they upgrade. We’ll need to be careful about the migration.

My proposal is that we add a flag and environment variable which will allow users to add a prefix to all metrics. That way we won’t break existing deployments but we’ll have a viable workaround for new deployments. Thoughts?

SpokeyWheeler commented 4 years ago

Sounds good to me.

github-actions[bot] commented 3 years ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

ajwerner commented 3 years ago

This remains a thing.