PragTob closed this issue 6 years ago
@PragTob I thought I'd tackle this one, since it's pretty much just a couple of other percentile numbers. How do you envision this working? Do we want to always calculate those quantiles, or only when asked to? Do you want to add a full config option for statistics, like:
%Config{
  parallel: 1,
  statistics: %{
    mode: true,
    median: true,
    percentiles: [25, 50, 75, 99],
    # ...
  },
  # ...
}
Where the defaults are set appropriately? Or maybe just add statistics.percentiles for now, and add the other options later? Maybe this is another issue: "Add config option for statistics"
:wave:
As statistics for drawing a boxplot diagram are quite the edge case (tm), I think I'd rather just implement a function on Statistics or Statistics.BoxPlot, and then the HTML formatter can call that. I don't see a slew of other formatters/whatever needing the data, so I don't think it should be calculated every time, and I don't think we should burden ourselves with yet another config option :)
OK. I was thinking we would save a sort of the data by running all the percentiles at once. It turns out that even though we save a sort and the calculations are almost twice as fast, the difference is still only ~30ms on 100k samples. So I agree!
# samples/benchee_percentile.exs
list_10k = 1..10_000 |> Enum.to_list() |> Enum.shuffle()
list_100k = 1..100_000 |> Enum.to_list() |> Enum.shuffle()

Benchee.run(%{
  # One call, so the samples are sorted once for all four percentiles.
  "single sort" => fn list ->
    Benchee.Statistics.Percentile.percentiles(list, [25, 50, 75, 99])
  end,
  # Two calls, so the samples are sorted twice; the two result maps are then merged.
  "double sort" => fn list ->
    standard = Benchee.Statistics.Percentile.percentiles(list, [50, 99])
    extended = Benchee.Statistics.Percentile.percentiles(list, [25, 75])
    [standard, extended]
    |> Enum.reduce(%{}, fn percentiles, acc -> Enum.into(percentiles, acc) end)
  end
}, inputs: %{
  "10k" => list_10k,
  "100k" => list_100k,
})
$ mix run samples/benchee_percentile.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-4288U CPU @ 2.60GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.5.2
Erlang 20.1
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
parallel: 1
inputs: 100k, 10k
Estimated total run time: 56 s
Benchmarking double sort with input 100k...
Benchmarking double sort with input 10k...
Benchmarking single sort with input 100k...
Benchmarking single sort with input 10k...
##### With input 100k #####
Name                  ips        average  deviation         median         99th %
single sort         27.37       36.54 ms    ±12.76%       34.66 ms       57.80 ms
double sort         15.70       63.70 ms     ±6.66%       62.20 ms       81.50 ms

Comparison:
single sort         27.37
double sort         15.70 - 1.74x slower
##### With input 10k #####
Name                  ips        average  deviation         median         99th %
single sort        397.94        2.51 ms    ±15.14%        2.43 ms        3.80 ms
double sort        223.18        4.48 ms    ±10.77%        4.35 ms        6.33 ms

Comparison:
single sort        397.94
double sort        223.18 - 1.78x slower
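For readers who want to see where the saved sort comes from, here is a minimal sketch of the "single sort" idea. It is not benchee's implementation: the module name PercentileSketch is hypothetical, and the interpolation method (linear interpolation between the two closest ranks) is an assumption that may differ from what Benchee.Statistics.Percentile actually uses.

```elixir
defmodule PercentileSketch do
  # Hypothetical helper, not part of benchee.
  # Sorts the samples once, then looks up every requested percentile
  # in that one sorted copy.
  def percentiles(samples, ranks) when samples != [] do
    # A tuple gives constant-time element access for the lookups below.
    sorted = samples |> Enum.sort() |> List.to_tuple()
    Map.new(ranks, fn rank -> {rank, percentile(sorted, rank)} end)
  end

  # Assumed method: linear interpolation between the two closest ranks,
  # with the 0th percentile at the minimum and the 100th at the maximum.
  defp percentile(sorted, rank) do
    last_index = tuple_size(sorted) - 1
    position = rank / 100 * last_index
    lower = trunc(position)
    upper = min(lower + 1, last_index)
    fraction = position - lower
    elem(sorted, lower) + fraction * (elem(sorted, upper) - elem(sorted, lower))
  end
end

samples = Enum.shuffle(1..101)
result = PercentileSketch.percentiles(samples, [25, 50, 75, 99])
IO.inspect(result)
```

Asking for [50, 99] and [25, 75] in two separate calls would run Enum.sort/1 twice on the same data, which is roughly the overhead the "double sort" case measures.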
Added in #157
I think we still need the interquartile range, don't we? @wasnotrice ah, but maybe at first benchee_html computes that... or we need another, more abstract method. Not sure
@PragTob not sure. I was going by the instructions you linked to in the original... according to this comment, it seems like we only need [min, q1, median, q3, max] (all of which we now have), but I haven't looked into it more than that :)
I think the next step would be to try generating box plots using the pre-computed values in benchee_html, and open an issue here if we are still missing values
Sounds great, thanks - sorry, I thought the interquartile range was part of that, but sure, they can compute it once they have q3 and q1 :)
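To illustrate that last point, a small sketch of deriving the interquartile range from q1 and q3. The module name BoxPlotSketch and the map keys are hypothetical, and the 1.5 * IQR whisker fences are a common box plot convention that is assumed here, not something this thread says benchee_html needs.

```elixir
defmodule BoxPlotSketch do
  # Hypothetical helper, not part of benchee or benchee_html.
  # Derives the interquartile range from already-computed quartiles,
  # plus the conventional 1.5 * IQR whisker fences (an assumption).
  def iqr_stats(%{q1: q1, q3: q3}) do
    iqr = q3 - q1
    %{iqr: iqr, lower_fence: q1 - 1.5 * iqr, upper_fence: q3 + 1.5 * iqr}
  end
end

# Example values only, not taken from a real benchmark run.
summary = %{min: 1, q1: 26.0, median: 51.0, q3: 76.0, max: 101}
IO.inspect(BoxPlotSketch.iqr_stats(summary))
```

Since this is just a subtraction on values that are already available, doing it on the formatter side costs essentially nothing.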
Drawing boxplot diagrams eats a lot of resources on the browser side in github.com/PragTob/benchee_html
So it'd be nice to provide the statistics needed to draw them right away (like this): the median we already have, plus quartile 1/quartile 3, the interquartile range and others :)