beam-telemetry / telemetry_metrics

Collect and aggregate Telemetry events over time
https://hexdocs.pm/telemetry_metrics
Apache License 2.0

Confusion around Counter #110

Open yordis opened 2 weeks ago

yordis commented 2 weeks ago

Hey there, I currently have the following module:

defmodule MyAppWeb.SafeResolverMiddleware do
  # ...
  @impl Absinthe.Middleware
  def call(resolution, resolver) do
    op = Enum.find(resolution.path, &current_operation?/1)

    span_nometa(
      [:safe_resolver_middleware],
      %{operation_name: operation_name(op), operation_type: operation_type(op)},
      fn -> Resolution.call(resolution, resolver) end
    )
  rescue
    exception ->
      # ....
  end

  # ... in another module
  def span_nometa(event, start_and_stop_metadata, fun) do
    # more or less what we have today
    :telemetry.span([:myapp] ++ event, start_and_stop_metadata, fn ->
      {fun.(), start_and_stop_metadata}
    end)
  end
end

I would like to have a counter for the exception event:

counter(
  "myapp.safe_resolver_middleware.exception.duration",
  tags: [:operation_name, :operation_type]
)

Reading the following, https://github.com/beam-telemetry/telemetry_metrics/blob/ab8616480cc78bd2377c390f07261da6f37a7401/lib/telemetry_metrics.ex#L15-L18

You could define a counter metric, which counts how many HTTP requests were completed:

Telemetry.Metrics.counter("http.request.stop.duration")

The docs say the metric counts how many requests were completed, yet it uses the duration measurement, which is confusing to me. Counting "duration" sounds odd.

I tried changing the measurement to .count, but that key doesn't exist in the event's measurements, so it fails.

measurement = %{monotonic_time: -576460706732624847, duration: 4314}
metadata = %{reason: %RuntimeError{message: "hello, world"}, stacktrace: [], kind: :error, operation_name: "listUsers", operation_type: :query, telemetry_span_context: #Reference<0.3696372234.2707161091.20455>}

I understand that I can skip the naming convention from https://github.com/beam-telemetry/telemetry_metrics/blob/ab8616480cc78bd2377c390f07261da6f37a7401/lib/telemetry_metrics.ex#L47-L48 and get more control over the metrics.

I could also pass :count as a measurement by using :telemetry.execute/3, but :telemetry.span/3 does not let me pass the measurement, and from what I have seen :telemetry.span/3 is probably more commonly used than :telemetry.execute/3.

So I am confused about how to simply count +1 per event.

josevalim commented 2 weeks ago

Because measurements inside a metric may be missing, you always count a measurement. So the docs are correct, you are counting how many metrics are emitted with the duration key.
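
The counting rule described above can be sketched in plain Elixir. This is a minimal illustration, not the library's code: `CounterSketch` and `handle_event/3` are hypothetical names, and the sample maps reuse the measurements posted in this thread. It assumes a counter ignores the measurement's value and only checks whether the key is present.

```elixir
# Hypothetical stand-in for a reporter's counter logic; not library code.
defmodule CounterSketch do
  # Returns the updated count after seeing one event's measurements.
  # The measurement's value is ignored; only its presence matters.
  def handle_event(count, measurements, measurement_key) do
    case Map.fetch(measurements, measurement_key) do
      {:ok, _value} -> count + 1
      :error -> count
    end
  end
end

# Measurements copied from the two events shown in this thread.
events = [
  %{monotonic_time: -576_460_706_732_624_847, duration: 4314},
  %{monotonic_time: -576_460_704_105_113_741, duration: 5968}
]

# Enum.reduce/3 passes (element, accumulator), so &1 is the measurements map.
count_duration = Enum.reduce(events, 0, &CounterSketch.handle_event(&2, &1, :duration))
count_exception = Enum.reduce(events, 0, &CounterSketch.handle_event(&2, &1, :exception))

IO.inspect(count_duration, label: "events with :duration")
IO.inspect(count_exception, label: "events with :exception")
```

With these two sample events, the `:duration` counter ends at 2, while a counter keyed on `:exception` stays at 0 because no event carries that key.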

yordis commented 2 weeks ago

you are counting how many metrics are emitted with the duration key.

Ooooh, I get it. That would imply that my intent is counter("myapp.safe_resolver_middleware.exception") then?

Except that I felt discouraged by the following line: https://github.com/beam-telemetry/telemetry_metrics/blob/ab8616480cc78bd2377c390f07261da6f37a7401/lib/telemetry_metrics.ex#L556

It would require passing event_name, which is not a big deal, but still.

My biggest concern is what happens if I do just that:

counter(
  "myapp.safe_resolver_middleware.exception",
  event_name: "myapp.safe_resolver_middleware.exception"
)

Then it will skip the metric, as follows:

[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: myapp.safe_resolver_middleware.exception
All measurements: %{monotonic_time: -576460704105113741, duration: 5968}
All metadata: %{reason: %RuntimeError{message: "..."}, stacktrace: [], kind: :error, operation_name: "getAuctions", operation_type: :query, telemetry_span_context: #Reference<>}

Metric measurement: :exception (counter)
Measurement value missing (metric skipped)

Indeed, there is no exception measurement, but I am not trying to count a measurement inside the event. I just want to count the events themselves.

I kept trying to fix it until I decided I might be doing something wrong.
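
One possible way around the skipped metric, sketched under assumptions: keep the event name explicit and point the `:measurement` option at a key the event actually carries, here `:duration`. The `.count` suffix in the metric name is a hypothetical choice made for readability. `:event_name`, `:measurement`, and `:tags` are existing options of `Telemetry.Metrics.counter/2`, but how a given reporter names or exports the resulting metric should be checked against that reporter.

```elixir
# Assumes the :telemetry_metrics dependency is available (e.g. in a Mix project).
import Telemetry.Metrics

exception_counter =
  counter(
    # Hypothetical metric name; the ".count" segment is only a label here,
    # because :event_name and :measurement are both given explicitly below.
    "myapp.safe_resolver_middleware.exception.count",
    event_name: [:myapp, :safe_resolver_middleware, :exception],
    # :duration is present on every exception event emitted by :telemetry.span/3,
    # so the counter is not skipped for a missing measurement.
    measurement: :duration,
    tags: [:operation_name, :operation_type]
  )

IO.inspect(exception_counter.event_name, label: "event name")
IO.inspect(exception_counter.measurement, label: "measurement")
```

The design choice is to decouple the metric's name from the measurement key, so the name can say what is being counted while the measurement only serves as proof that the event arrived.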