beam-telemetry / telemetry_metrics_prometheus_core

Core Prometheus Telemetry.Metrics Reporter package for telemetry_metrics_prometheus
Apache License 2.0
35 stars 30 forks source link

Prometheus metric name via reporter_options ? #31

Closed chulkilee closed 3 years ago

chulkilee commented 3 years ago

Instead of using custom metric name and overriding event_name and measurement.. what about accepting name in reporter_options?

From:

Metrics.distribution("prometheus_metrics.scrape.duration.milliseconds",
  event_name: [:prometheus_metrics, :plug, :stop],
  measurement: :duration,
  unit: {:native, :second},
  reporter_options: [
    buckets: [0.05, 0.1, 0.2, 0.5, 1],
  ]
)

To:

Metrics.distribution("prometheus_metrics.plug.stop",
  unit: {:native, :second},
  reporter_options: [
    name: "prometheus_metrics.scrape.duration.milliseconds"
    buckets: [0.05, 0.1, 0.2, 0.5, 1],
  ]
)
bryannaegele commented 3 years ago

Hey @chulkilee. If I'm reading this correctly, the example you proposed is actually a change to how Telemetry.Metrics works. What you have declared as :name would be trying to override the name that is in the metric definition.

You can check out how what the resulting struct looks like if you try that in an iex session. What you'd end up with is a definition that is listening for a [:prometheus_metrics, :plug] event with the measurement of :stop (it's hidden in that function, but trust me it's there 😉 ). The reporters just make use of what the Metrics definition struct is composed of and must adhere to what it defines.

%Telemetry.Metrics.Distribution{
  description: nil,
  event_name: [:prometheus_metrics, :plug],
  keep: nil,
  measurement: #Function<3.70924598/1 in Telemetry.Metrics.maybe_convert_measurement/2>,
  name: [:prometheus_metrics, :plug, :stop],
  reporter_options: [
    name: "prometheus_metrics.scrape.duration.milliseconds",
    buckets: [0.05, 0.1, 0.2, 0.5, 1]
  ],
  tag_values: #Function<0.70924598/1 in Telemetry.Metrics.default_metric_options/0>,
  tags: [],
  unit: :second
}

That's in contrast to the first example.

%Telemetry.Metrics.Distribution{
  description: nil,
  event_name: [:prometheus_metrics, :plug, :stop],
  keep: nil,
  measurement: #Function<3.70924598/1 in Telemetry.Metrics.maybe_convert_measurement/2>,
  name: [:prometheus_metrics, :scrape, :duration, :milliseconds],
  reporter_options: [buckets: [0.05, 0.1, 0.2, 0.5, 1]],
  tag_values: #Function<0.70924598/1 in Telemetry.Metrics.default_metric_options/0>,
  tags: [],
  unit: :second
}
chulkilee commented 3 years ago

If I'm reading this correctly, the example you proposed is actually a change to how Telemetry.Metrics works.

No, Telemetry.Metrics work as-is, but I'm wondering if TelemetryMetricsPrometheus reporter can override the metric name.

Currently

As a result, if I want to override the Prometheus metric name.. I have to put the prometheus metric name in Telemetry metric name, which loses "auto-detection" - so I have to specify them manually. This is what I want to avoid.

Right now, I'm writing my own wrapper like this:

defp gauge(prometheus_name, event_name, measurement, opts \\ []) do
  last_value(prometheus_name, [event_name: event_name, measurement: measurement] ++ opts)
end

# then use it like this
gauge("erlang_vm_atom_bytes", [:vm, :memory], :atom, unit: :byte)

Instead of adding (prometheus) metric name in reporter_options - we may add such "helper" method - I think this is actually clearer and more explicit (e.g. user can search prometheus metric name from code, not knowing about the name conversion inside TelemetryMetricsPrometheus).

I don't know it's common to share a list of metrics between different reporters - if that should be supported, then it should be in reporter_options.

bryannaegele commented 3 years ago

Currently

* Telemetry.Metrics detects event_name and measurement automatically from metric name

* TelemetryMetricsPrometheus uses the metric name and converts it for prometheus metric name

This isn't entirely accurate. Telemetry Metrics can auto-magically derive the event name and measurement if the event name and measurement are omitted. This library is no different and it doesn't convert anything about the name, it's just using what is provided in the definition.

If you want to be explicit about the name you want to report the metric as, then you have to supply those arguments in the create function of the the metric. You aren't required to conform to Prometheus' conventions.

Are you using multiple reporters with the same definition? Can you explain your use case? I haven't heard of it in practice, which is why I ask. I'm curious how adding an override for the name in reporter options for one reporter would solve the issue you raise when using the same definition to multiple reporters. Hypothetically, if this option were implemented here and in the statsd reporter, you'd still end up in the same predicament, correct? How would we handle this conflict across reporters?

Reporter options were originally added simply because statsd doesn't make use of buckets and they were a required argument to distribution at the time. The decision was then made that if there were things that were reporter-specific then they could be passed this way. I don't think we ever considered how these options would be distinguished across multiple reporters, however. Should something like this be namespaced, like :prometheus_name?

chulkilee commented 3 years ago

This library is no different and it doesn't convert anything about the name, it's just using what is provided in the definition.

No, per doc

The exporter sanitizes names to Prometheus' requirements (Metric Naming) and joins the event name parts with an underscore.

And see https://github.com/beam-telemetry/telemetry_metrics_prometheus_core/blob/v0.4.1/lib/core/exporter.ex#L95


My use case - I'd like to expose all metrics in prometheus recommended names - which requires passing prometheus metric name in Telemetric metrics, and pass event_name and measurement for ALL metrics!

Example (with only 3)

      # only 3
      last_value("erlang_vm_atom_bytes",
        event_name: [:vm, :memory],
        measurement: :atom,
        unit: :byte
      ),
      last_value("erlang_vm_binary_bytes",
        event_name: [:vm, :memory],
        measurement: :binary,
        unit: :byte
      ),
      last_value("erlang_vm_code_bytes",
        event_name: [:vm, :memory],
        measurement: :code,
        unit: :byte
      ),

With reporter_option: a little bit better

      # only 3
      last_value("vm.memory.atom", unit: :byte, reporter_options: [name: "erlang_vm_atom_bytes"]),
      last_value("vm.memory.binary",
        unit: :byte,
        reporter_options: [name: "erlang_vm_binary_bytes"]
      ),
      last_value("vm.memory.code", unit: :byte, reporter_options: [name: "erlang_vm_code_bytes"]),

Or shortter with the above helper func.

      gauge("erlang_vm_atom_bytes", [:vm, :memory], :atom, unit: :byte),
      gauge("erlang_vm_binary_bytes", [:vm, :memory], :binary, unit: :byte),
      gauge("erlang_vm_code_bytes", [:vm, :memory], :code, unit: :byte),
      gauge("erlang_vm_ets_bytes", [:vm, :memory], :ets, unit: :byte),
      gauge("erlang_vm_processes_bytes", [:vm, :memory], :processes, unit: :byte),
      gauge("erlang_vm_system_bytes", [:vm, :memory], :system, unit: :byte),
      gauge("erlang_vm_total_bytes", [:vm, :memory], :total, unit: :byte),
      gauge("erlang_vm_atom_total", [:vm, :system_counts], :atom_count),
      gauge("erlang_vm_port_total", [:vm, :system_counts], :port_count),
      gauge("erlang_vm_process_total", [:vm, :system_counts], :process_count),
      gauge("erlang_vm_run_queue_length.cpu", [:vm, :total_run_queue_lengths], :cpu),
      gauge("erlang_vm_run_queue_length.io", [:vm, :total_run_queue_lengths], :io),
      gauge("erlang_vm_run_queue_length.total", [:vm, :total_run_queue_lengths], :total),

Or.. probably it's better to have a list of map and then transform and call function based on it.

[
  {:gauge, "erlang_vm_atom_bytes", "vm.memory.atom", unit: :byte},
  ...
]
|> to_prometheus_metrics()

Unfortunately converting name to event name and measurement is private func - https://github.com/beam-telemetry/telemetry_metrics/blob/v0.5.0/lib/telemetry_metrics.ex#L548

Probably better to have (optional) helper for prometheus?

bryannaegele commented 3 years ago

If you want to open a PR with putting an override name in reporter options under :exported_name then I think that would meet your needs? For internals, I think you can just override the stored name.

chulkilee commented 3 years ago

I just found https://github.com/beam-telemetry/telemetry_metrics/issues/78 - which is kind of related to this. I think this should be addressed in Telemetry.Metrics so I left a comment there.

Closing this for now. I'll keep using my own thin wrapper for now..

@bryannaegele Thanks a lot for the follow up! ❤️