discourse / prometheus_exporter

A framework for collecting and aggregating prometheus metrics
MIT License
525 stars 153 forks source link

Validate naming #310

Open PericlesTheo opened 3 months ago

PericlesTheo commented 3 months ago

👋 everyone,

Related to https://github.com/discourse/prometheus_exporter/issues/292 I've opened an MR that validates the naming expectations of Prometheus before sending the event.

I'm returning false for now but I am open to ideas as to the best way to handle this.

SamSaffron commented 2 months ago

this is a massive behavior change that is breaking, maybe we make this optional default off?

PericlesTheo commented 1 month ago

Hey @SamSaffron, how/where do you suggest we do this?

Would an optional argument in lib/prometheus_exporter/client.rb work?

module PrometheusExporter
  class Client
    def initialize(..., validate_metric_naming: false)
    end
  end
end

If we go with an approach like that, should we raise an error when the naming is invalid instead of returning false as the PR currently suggests since it's now an explicit decision from the developer?

An alternative, that might be an okay middle-ground, is to have a custom metric such as "invalid_naming" or something similar, with the label value of the failed naming. This can remove the decision making from the gem on how to proceed in the cases of errors. Developers can then have an alert on the "invalid_naming" metric and act upon.

Personally, it also satisfies the two concerns I raised here about raising an error (also returning false has the inverse concerns)

About the error, I'm not sure 🤔 My initial thoughts were:

  • Let's say you are using something like Sentry. You can flood your account with error events quite easily.
  • Is it better to have an app crashing in the middle because of metrics?

The first one is obvious as no errors will be raised. As for the second, I doubt the cardinality of the label value (invalid name) is going to be an issue.

PericlesTheo commented 1 month ago

Hey @SamSaffron any chance you had the time to look into my last message?

SamSaffron commented 1 month ago

I worry about breaking changes, I guess validate_metric_naming: true could be documented and use "metric_name_invalid" or something like that on invalid metrics.