bencheeorg / benchee

Easy and extensible benchmarking in Elixir providing you with lots of statistics!
MIT License

Support per input times / limit sample size / support loading disjoint inputs #326

Open baseballlover723 opened 4 years ago

baseballlover723 commented 4 years ago

So I was using benchee and came across this problem. I've got a bunch of tests with a few different inputs of greatly varying sizes. If I use time: 30, I generate too many samples for my small inputs and run into https://github.com/bencheeorg/benchee_html/issues/3, as well as huge report files (several hundred MiB) that take forever to view in the browser. For my larger inputs, though, I don't get that many samples (sometimes single digits).

The end result I'm looking for is to be able to generate a single report with slightly different configs (adjusting the time for particular inputs in my case). I tried using save and load, since that seemed to be the best way to do it, but I found that it ignores inputs that aren't in the final benchmark (see https://gist.github.com/baseballlover723/be1ce508c809548d39f6d5906321a197 for an example). Of course I could restrict my input range, but that seems like a poor workaround.

The way I see it, there are a few things that could be done to solve these types of issues.

  1. Support overriding the time as part of an input, something like inputs: {[name: "small", value: Enum.to_list(1..10), time: 2], [name: "very large", value: Enum.to_list(1..1_000_000), time: 30]}. This would allow users to specify more granular config settings for particular inputs. Other config options that might be nice to specify per input are warmup, memory_time, parallel, and probably some others as well.

  2. Allow a maximum sample count: essentially a config value to limit the number of samples (it doesn't have to be super accurate for my case). This way I could specify that I don't want more than about 500_000 samples, which would give good enough performance and sufficient accuracy.

  3. Support running with multiple configs and generating a single report at the end. This would be solved by using save and load, but I found that those only give reports for previous tests if that input also showed up in the last test (see https://gist.github.com/baseballlover723/be1ce508c809548d39f6d5906321a197 for an example). If there were an option to load all of the previous scenarios and inputs, I could run benchee with my small input and a low time, save it, run benchee again with a longer time and other inputs, and then load the previous scenarios and inputs and append those to the report.

  4. Support appending to report files. This is probably a mess and is likely better solved by 3.

Anyway, I've really enjoyed using benchee and this is the only issue I've had with it. Thanks for making it.

PragTob commented 4 years ago

Hey there,

Thanks for your feedback and the issues you're raising! 💚

It's eerily similar in some aspects to #325.

I like the idea of setting a maximum_sample_size in general. It's the easiest option to avoid a lot of the problems we're seeing from people who just benchmark a ton of things. We could also maybe give the option not to do statistics in parallel due to, well, memory consumption. We're also looking at finally implementing "benchmark until confidence reached".
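To make the sample cap idea concrete, here is a minimal sketch in plain Elixir of a measurement loop that stops at whichever comes first, the time budget or the sample cap. It is not benchee's implementation: the SampleCapSketch module, its collect/3 function, and the argument names are all made up for illustration, and it leans only on System.monotonic_time/1 and :timer.tc/1.

```elixir
defmodule SampleCapSketch do
  @moduledoc """
  Hypothetical illustration only; none of this is part of benchee.
  """

  # Runs `fun` repeatedly and records each run time in microseconds.
  # Stops at whichever limit is hit first: the time budget (in ms)
  # or the maximum number of samples.
  def collect(fun, time_budget_ms, max_samples) when max_samples > 0 do
    deadline = System.monotonic_time(:millisecond) + time_budget_ms
    do_collect(fun, deadline, max_samples, [])
  end

  # Sample cap reached: return the samples in the order they were taken.
  defp do_collect(_fun, _deadline, 0, samples), do: Enum.reverse(samples)

  defp do_collect(fun, deadline, remaining, samples) do
    if System.monotonic_time(:millisecond) >= deadline do
      # Time budget used up before the cap was reached.
      Enum.reverse(samples)
    else
      {microseconds, _result} = :timer.tc(fun)
      do_collect(fun, deadline, remaining - 1, [microseconds | samples])
    end
  end
end

small = Enum.to_list(1..10)

# At most 500 samples, even though the 1 second budget would allow far more.
samples = SampleCapSketch.collect(fn -> Enum.sum(small) end, 1_000, 500)
IO.puts("collected #{length(samples)} samples")
```

With a cap like this, small inputs stop early instead of filling the report with millions of samples, while large inputs are still bounded by the time budget.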

Per-input times sound enticing but more complicated. We might add that in the future. It has the inherent problem that whatever structure you use to carry the extra values can't be used as a "normal" input value anymore. But I guess if we created a special benchee struct for it, it'd be alright, and then the struct could contain itself if you wanted to do benchmarking with it.
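A rough sketch of what such a wrapper struct could look like, in plain Elixir. Everything here is an assumption for illustration: benchee does not define InputSpecSketch, the overrides field, or resolve/2. The point is only that a dedicated struct lets per-input overrides be told apart from ordinary input values such as lists or keyword lists.

```elixir
defmodule InputSpecSketch do
  # Hypothetical wrapper struct; benchee does not define this.
  # `value` is the actual benchmark input, `overrides` is a keyword list
  # of config options that apply to this input only.
  defstruct [:value, overrides: []]

  # Wrapped input: merge its overrides over the global config.
  def resolve(%__MODULE__{value: value, overrides: overrides}, global) do
    {value, Keyword.merge(global, overrides)}
  end

  # Plain input: use the global config unchanged.
  def resolve(value, global), do: {value, global}
end

global = [time: 5, warmup: 2]

inputs = %{
  "small" => %InputSpecSketch{value: Enum.to_list(1..10), overrides: [time: 2]},
  "very large" => Enum.to_list(1..1_000_000)
}

# Map each input name to {value, effective_config}.
resolved =
  Map.new(inputs, fn {name, input} ->
    {name, InputSpecSketch.resolve(input, global)}
  end)

{_value, small_config} = resolved["small"]
IO.inspect(Keyword.get(small_config, :time), label: "time for small")
```

Because only the wrapper struct is treated specially, a keyword list or map passed as an input is still benchmarked as a plain value, and an InputSpecSketch that should itself be the benchmark input can be wrapped in another one.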

Point 3 (save and load ignoring inputs that aren't in the final benchmark) kinda surprised me, to be honest. I might double-check and open a separate bug ticket for it. From what I understand of how it's supposed to behave, that should work.

Cheers