beam-telemetry / telemetry_metrics_prometheus_core

Core Prometheus Telemetry.Metrics Reporter package for telemetry_metrics_prometheus
Apache License 2.0

Tag filtering optimises for unhappy path instead of happy path #66

Open hauleth opened 4 months ago

hauleth commented 4 months ago

I am working on Supavisor performance, and our benchmarking shows that TelemetryMetricsPrometheusCore is the main culprit of slowness. It is called in a very tight loop (via :telemetry, obviously), and the profile shows that tag filtering is a substantial part of handling each event.

[profiler screenshot]

I ran a simple Benchee script to check whether it would be possible to speed it up:

```elixir
defmodule Filtering do
  # Current approach: first scan the tag list for missing keys with
  # Enum.reject/2, then do a second pass with Map.take/2 on success.
  def old_filter(tags, tag_values) do
    case Enum.reject(tags, &match?(%{^&1 => _}, tag_values)) do
      [] -> :ok
      missing_tags -> {:tags_missing, missing_tags}
    end
  end

  def old(tags, tag_values) do
    with :ok <- old_filter(tags, tag_values) do
      Map.take(tag_values, tags)
    end
  end

  # Single Map.take/2 pass; compute the missing tags only on failure.
  def new(tags, tag_values) do
    case Map.take(tag_values, tags) do
      vals when map_size(vals) == length(tags) -> vals
      vals -> {:tags_missing, tags -- Map.keys(vals)}
    end
  end

  # Single manual pass, accumulating found pairs and missing keys as we go.
  def new_2(tags, tag_values) do
    Enum.reduce(tags, {[], []}, fn key, {f, n} ->
      case tag_values do
        %{^key => v} -> {[{key, v} | f], n}
        _ -> {f, [key | n]}
      end
    end)
    |> case do
      {vals, []} -> vals
      {_, missing} -> {:tags_missing, missing}
    end
  end
end
```

```elixir
tags = Map.new(1..100, &{&1, &1})
keys = Enum.take_random(1..100, 20)

Benchee.run(%{
  "matching old" => fn ->
    Filtering.old(keys, tags)
  end,
  "matching new" => fn ->
    Filtering.new(keys, tags)
  end,
  "matching new_2" => fn ->
    Filtering.new_2(keys, tags)
  end
})

Benchee.run(%{
  "missing old" => fn ->
    Filtering.old([-1 | keys], tags)
  end,
  "missing new" => fn ->
    Filtering.new([-1 | keys], tags)
  end,
  "missing new_2" => fn ->
    Filtering.new_2([-1 | keys], tags)
  end
})
```

The results show that the Enum.reject/2 approach disfavours the happy path:

```
Operating System: macOS
CPU Information: Apple M3 Pro
Number of Available Cores: 12
Available memory: 36 GB
Elixir 1.17.1
Erlang 26.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s

Benchmarking matching new ...
Benchmarking matching new_2 ...
Benchmarking matching old ...

Name                     ips        average  deviation         median         99th %
matching new_2        4.57 M      218.79 ns  ±9601.91%         166 ns         833 ns
matching new          3.53 M      283.58 ns  ±6234.59%         250 ns         375 ns
matching old          2.41 M      414.93 ns  ±4529.68%         375 ns         500 ns

Comparison:
matching new_2        4.57 M
matching new          3.53 M - 1.30x slower +64.78 ns
matching old          2.41 M - 1.90x slower +196.14 ns
```

```
Benchmarking missing new ...
Benchmarking missing new_2 ...
Benchmarking missing old ...

Name                    ips        average  deviation         median         99th %
missing old          5.74 M      174.27 ns  ±9073.23%         166 ns         291 ns
missing new_2        1.31 M      760.70 ns  ±2264.45%         709 ns         917 ns
missing new          1.29 M      772.91 ns  ±2281.40%         709 ns         958 ns

Comparison:
missing old          5.74 M
missing new_2        1.31 M - 4.36x slower +586.43 ns
missing new          1.29 M - 4.44x slower +598.64 ns
```

This is rather odd, as a metrics library should optimise for the happy path: we can assume that most of the time (ideally, always) all of the metadata will be available. There are also further optimisations possible; for example, length(tags) could be cached so that it is not recomputed each time we compute metrics. Maybe there are other optimisations that I haven't thought about, but as shown above, we can do quite a bit better on the happy path.
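To sketch the length(tags) caching idea (module and function names here are hypothetical, not the library's actual API): the tag count could be computed once when the metric is registered and stored alongside the tag list, so the hot path does only the Map.take/2 plus a map_size/1 comparison:

```elixir
defmodule CachedFiltering do
  # Hypothetical: run once at metric-registration time, not per event.
  def precompute(tags), do: {tags, length(tags)}

  # Hot path: a single Map.take/2 pass and a map_size/1 guard;
  # no length/1 traversal of the tag list per event.
  def filter({tags, tag_count}, tag_values) do
    case Map.take(tag_values, tags) do
      vals when map_size(vals) == tag_count -> vals
      vals -> {:tags_missing, tags -- Map.keys(vals)}
    end
  end
end

prepared = CachedFiltering.precompute([:a, :b])
CachedFiltering.filter(prepared, %{a: 1, b: 2, c: 3})
#=> %{a: 1, b: 2}
```

The missing-tags list is still computed lazily, only on the unhappy path, so the happy path pays nothing for it.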