Closed RoadRunnr closed 3 years ago
Hi, thank you for the PRs. Do you have benchmark code at hand? It's ok if these are the follow ups for real world observations, no problem. I want to have something scraping-related in the benchmarks folder for some time already.
I also wonder if it makes sense to do just file:write instead of constructing binary and then doing write. OTOH escaped labels/help strings could be just cached on first encounter.
Hi, thank you for the PRs. Do you have benchmark code at hand? It's ok if these are the follow ups for real world observations, no problem. I want to have something scraping-related in the benchmarks folder for some time already.
I don't have a real benchmark, just a few lines that create a configurable amount of metrics and then I try to read them. For this PR I read them through a cowboy handler, for #120 is use timer:tc
, e.g.:
Callback = fun (Registry, Collector) -> {Time, _} = timer:tc(fun() -> prometheus_collector:collect_mf(Registry, Collector, fun({T, _}) -> io:format("T: ~p~n", [T]), ok; (_) -> ok end) end), io:format("~p, ~p, ~p~n", [Time, Registry, Collector]) end.
timer:tc(fun() -> prometheus_registry:collect(default, Callback) end).
Hey, maybe you can share those lines too? it's totally possible to timer:tc the format call.
I have pushed the bits and pieces that I used into RoadRunnr/prom
thanks, I will play a bit and this will be merged soon.
I have two further speed ups in this branch https://github.com/RoadRunnr/prometheus.erl/commits/experimental/speedup.
For 200k histograms with 10 data points each, I get a ~60% speedup in the text formatting on top of the other improvements (from ~13 sec down to ~5 sec).
Shall I open a new PR with the combined changes or wait till you have merge this PR and #120 ?
I think having "replace io_lib:format with faster helpers" commit in this MR makes sense.
Running regex (especially three times in a row) is much more expensive that a binary comprehension. This makes little difference for small data sets, for anything beyond a few hundred items it becomes measurable. Data sets beyond a few thousand items are completely unusable without this change.