Bodigrim / tasty-bench

Featherlight benchmark framework, drop-in replacement for criterion and gauge.
https://hackage.haskell.org/package/tasty-bench
MIT License

Run an IO action in the bench without timing it #12

Closed · kozross closed this issue 3 years ago

kozross commented 3 years ago

I want to benchmark a search algorithm implementation. To that end, I'd like to generate a random needle for each iteration of the benchmark. However, I don't want to measure the PRNG as well. Is there a way to do this currently?

Bodigrim commented 3 years ago

It's a bit unclear to me what design you have in mind. What would your benchmark look like if you agreed to measure the PRNG overhead?

kozross commented 3 years ago

It would be something like:

bench "Foo" . nfAppIO (\h -> doTheSearch <$> genRandomNeedle <*> pure h) $ haystack

However, this will end up benchmarking both genRandomNeedle and the search itself, whereas I only want to benchmark the search. Furthermore, I only need IO for the random generation, not for the search. The idea is that, since the search is run many times during benchmarking, randomizing the needle gives a good picture of 'average' performance.

Bodigrim commented 3 years ago

Unfortunately, I don't think there is a reasonable workaround for this particular scenario. Both the generation of an individual needle and the search itself take well below the granularity of the system timer, so we cannot start and stop it around them.

FWIW I think that genRandomNeedle implemented via http://hackage.haskell.org/package/random-1.2.0/docs/System-Random-Stateful.html#g:11 should be blazingly fast and thus would not affect the overall measurements in a significant way.
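For illustration, a minimal sketch of such a generator, assuming 8-byte ByteString needles (the needle length, the seed, and the choice of IOGenM are arbitrary, not prescribed by this thread):

import Control.Monad (replicateM)
import qualified Data.ByteString as BS
import System.Random (StdGen, mkStdGen)
import System.Random.Stateful (IOGenM, newIOGenM, uniformRM)

-- Draw 8 random bytes from a mutable generator held in an IORef;
-- each uniformRM call is cheap compared to a non-trivial search.
genRandomNeedle :: IOGenM StdGen -> IO BS.ByteString
genRandomNeedle g = BS.pack <$> replicateM 8 (uniformRM (0, 255) g)

The generator itself would be created once, outside the measured action, e.g. g <- newIOGenM (mkStdGen 2021).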

In general I'm inclined to advise against such a design: to achieve reliable results with a predictable deviation, it's crucial to benchmark exactly the same data over and over again. Maybe generate several thousand needles at the top level and iterate over them all in a single benchmark, as in the sketch below?
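A self-contained sketch of that design; doTheSearch, the haystack, and the needle set are placeholders standing in for the real code from this thread:

import qualified Data.ByteString as BS
import Test.Tasty.Bench

-- Stand-in for the search under discussion.
doTheSearch :: BS.ByteString -> BS.ByteString -> Bool
doTheSearch = BS.isInfixOf

haystack :: BS.ByteString
haystack = BS.replicate 1000000 0

-- A fixed set of needles, defined at the top level so that every
-- run measures exactly the same work.
needles :: [BS.ByteString]
needles = [BS.pack [fromIntegral i, fromIntegral (i * 7)] | i <- [1 .. 1000 :: Int]]

main :: IO ()
main = defaultMain
  [ bench "search, 1000 needles" $ nf (map (`doTheSearch` haystack)) needles ]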

kozross commented 3 years ago

Yeah, I think you're right on all counts. However, if I generate all the needles at the top level and then run them all in one bench, I'll get a result for a thousand needles, not the average for one needle. Unless I'm missing something?

Bodigrim commented 3 years ago

Well, you can divide by a thousand to approximate the average. I agree that it's not that convenient, but absolute numbers are of limited value anyway.

kozross commented 3 years ago

Fair enough. Thanks for the advice!

soupi commented 3 years ago

A different data point: what if I want to benchmark an external system that needs restarts, preparation, or warmup/cache clearing before benchmarking? This is not something one can do outside of the benchmarking loop. It's also something that, for example, hyperfine provides.

Bodigrim commented 3 years ago

Unless I'm missing something, this is what env is for.
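For reference, a minimal sketch of env in tasty-bench; loadTestData is a hypothetical setup action standing in for whatever expensive preparation is needed:

import Test.Tasty.Bench

-- Runs once, outside the timing loop; env forces the result to
-- normal form before any measurement starts.
loadTestData :: IO [Int]
loadTestData = pure [1 .. 1000000]

main :: IO ()
main = defaultMain
  [ env loadTestData $ \xs -> bench "sum" $ nf sum xs ]

Note that env requires an NFData instance for the environment, so an opaque handle to an external system may need a wrapper.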

soupi commented 3 years ago

Does env run for each benchmark or for each timing run?

Bodigrim commented 3 years ago

env runs once, before measurements of a particular routine take place.
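For the restart/teardown part of the question, tasty-bench also provides envWithCleanup, which runs a setup action once before the measurements of a routine and a cleanup action once after. A small sketch, where a temporary file stands in for an external resource (the file name and contents are arbitrary):

import System.Directory (removeFile)
import Test.Tasty.Bench

-- Setup runs once per use of envWithCleanup, not once per timing
-- iteration; removeFile below is the matching cleanup.
setup :: IO FilePath
setup = do
  writeFile "bench.tmp" (unlines (map show [1 .. 10000 :: Int]))
  pure "bench.tmp"

main :: IO ()
main = defaultMain
  [ envWithCleanup setup removeFile $ \path ->
      bench "read+sum" $
        nfAppIO (\p -> sum . map read . lines <$> readFile p :: IO Int) path
  ]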