I am working on Supavisor performance, and our benchmarking shows that TelemetryMetricsPrometheusCore is the main culprit of slowness. It is called in a very tight loop (via `:telemetry`, obviously), and profiling shows that tags filtering is a substantial part of handling an event.

I have run a simple Benchee script to check whether it would be possible to speed it up:
```elixir
defmodule Filtering do
  # Existing approach: validate that all tags are present first,
  # then extract the values in a second pass.
  def old_filter(tags, tag_values) do
    case Enum.reject(tags, &match?(%{^&1 => _}, tag_values)) do
      [] -> :ok
      missing_tags -> {:tags_missing, missing_tags}
    end
  end

  def old(tags, tag_values) do
    with :ok <- old_filter(tags, tag_values) do
      Map.take(tag_values, tags)
    end
  end

  # Single Map.take/2, then compare sizes to detect missing tags.
  def new(tags, tag_values) do
    case Map.take(tag_values, tags) do
      vals when map_size(vals) == length(tags) -> vals
      vals -> {:tags_missing, tags -- Map.keys(vals)}
    end
  end

  # Single pass over tags, accumulating found and missing keys.
  def new_2(tags, tag_values) do
    Enum.reduce(tags, {[], []}, fn key, {f, n} ->
      case tag_values do
        %{^key => v} -> {[{key, v} | f], n}
        _ -> {f, [key | n]}
      end
    end)
    |> case do
      {vals, []} -> vals
      {_, missing} -> {:tags_missing, missing}
    end
  end
end

tags = Map.new(1..100, &{&1, &1})
keys = Enum.take_random(1..100, 20)

Benchee.run(%{
  "matching old" => fn -> Filtering.old(keys, tags) end,
  "matching new" => fn -> Filtering.new(keys, tags) end,
  "matching new_2" => fn -> Filtering.new_2(keys, tags) end
})

Benchee.run(%{
  "missing old" => fn -> Filtering.old([-1 | keys], tags) end,
  "missing new" => fn -> Filtering.new([-1 | keys], tags) end,
  "missing new_2" => fn -> Filtering.new_2([-1 | keys], tags) end
})
```
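One detail worth noting when comparing the variants (a quick standalone check, not part of the benchmark above): `new/2` returns a map while `new_2/2` returns a keyword list in reverse tag order, which matters if callers pattern-match on the result shape. The relevant functions are reproduced here so the snippet runs on its own:

```elixir
defmodule FilteringCheck do
  # Same implementations as new/2 and new_2/2 above.
  def new(tags, tag_values) do
    case Map.take(tag_values, tags) do
      vals when map_size(vals) == length(tags) -> vals
      vals -> {:tags_missing, tags -- Map.keys(vals)}
    end
  end

  def new_2(tags, tag_values) do
    Enum.reduce(tags, {[], []}, fn key, {f, n} ->
      case tag_values do
        %{^key => v} -> {[{key, v} | f], n}
        _ -> {f, [key | n]}
      end
    end)
    |> case do
      {vals, []} -> vals
      {_, missing} -> {:tags_missing, missing}
    end
  end
end

IO.inspect(FilteringCheck.new([:a, :b], %{a: 1, b: 2, c: 3}))
# => %{a: 1, b: 2}
IO.inspect(FilteringCheck.new_2([:a, :b], %{a: 1, b: 2, c: 3}))
# => [b: 2, a: 1]
IO.inspect(FilteringCheck.new([:a, :x], %{a: 1}))
# => {:tags_missing, [:x]}
```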
Results show that the `Enum.reject/2` approach disfavours the happy path:
```
Operating System: macOS
CPU Information: Apple M3 Pro
Number of Available Cores: 12
Available memory: 36 GB
Elixir 1.17.1
Erlang 26.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s

Benchmarking matching new ...
Benchmarking matching new_2 ...
Benchmarking matching old ...

Name                     ips        average  deviation         median         99th %
matching new_2        4.57 M      218.79 ns  ±9601.91%         166 ns         833 ns
matching new          3.53 M      283.58 ns  ±6234.59%         250 ns         375 ns
matching old          2.41 M      414.93 ns  ±4529.68%         375 ns         500 ns

Comparison:
matching new_2        4.57 M
matching new          3.53 M - 1.30x slower +64.78 ns
matching old          2.41 M - 1.90x slower +196.14 ns
```
```
Operating System: macOS
CPU Information: Apple M3 Pro
Number of Available Cores: 12
Available memory: 36 GB
Elixir 1.17.1
Erlang 26.2.5

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s

Benchmarking missing new ...
Benchmarking missing new_2 ...
Benchmarking missing old ...

Name                    ips        average  deviation         median         99th %
missing old          5.74 M      174.27 ns  ±9073.23%         166 ns         291 ns
missing new_2        1.31 M      760.70 ns  ±2264.45%         709 ns         917 ns
missing new          1.29 M      772.91 ns  ±2281.40%         709 ns         958 ns

Comparison:
missing old          5.74 M
missing new_2        1.31 M - 4.36x slower +586.43 ns
missing new          1.29 M - 4.44x slower +598.64 ns
```
This is rather odd, as a metrics library should optimise for the happy path: we can assume that most of the time (ideally, always) all metadata will be available. There are also further possible optimisations. For example, `length(tags)` could be cached so it is not recomputed each time we compute metrics. Maybe there are other optimisations that I haven't thought of, but as shown above, we can do quite a bit better on the happy path.
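To illustrate the length-caching idea (a hypothetical sketch, not the library's actual API; `prepare/1` and `fetch_tags/2` are made-up names): if the tag count were computed once at metric-registration time, the hot path would be a single `Map.take/2` plus a constant-time `map_size/1` guard, with no per-event `length/1` traversal:

```elixir
defmodule CachedFilter do
  # Hypothetical helper: call once when the metric is registered,
  # pairing the tag list with its precomputed length.
  def prepare(tags), do: {tags, length(tags)}

  # Hot path: no length/1 call per event; map_size/1 is O(1).
  def fetch_tags({tags, count}, tag_values) do
    case Map.take(tag_values, tags) do
      vals when map_size(vals) == count -> vals
      vals -> {:tags_missing, tags -- Map.keys(vals)}
    end
  end
end

prepared = CachedFilter.prepare([:a, :b])
IO.inspect(CachedFilter.fetch_tags(prepared, %{a: 1, b: 2, c: 3}))
# => %{a: 1, b: 2}
IO.inspect(CachedFilter.fetch_tags(prepared, %{a: 1}))
# => {:tags_missing, [:b]}
```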