Bodigrim / tasty-bench

Featherlight benchmark framework, drop-in replacement for criterion and gauge.
https://hackage.haskell.org/package/tasty-bench
MIT License

averaging over several random inputs #61

Open MangoIV opened 10 hours ago

MangoIV commented 10 hours ago

Hi! Say I have an algorithm that I expect to perform differently on a bunch of different inputs, and say I have a sufficiently good Gen instance ~ why would it be a bad idea to generate a bunch of inputs with Gen and then average the benchmarks over them to get a good picture of how the algorithm behaves on different inputs?

I feel like this would be kind of an obvious thing to do, not necessarily via Gen but maybe with an interface like this:

benchRandom 
  :: (NFData a, NFData b)
  => IO a 
  -- ^ obtain an input randomly and evaluate it to normal form
  -> (a -> b) 
  -- ^ a function to benchmark
  -> Benchmarkable
benchRandom = undefined -- implementation to be decided

and then maybe

benchArbitrary 
  :: (NFData a, Arbitrary a, NFData b) 
  => (a -> b) 
  -- ^ a function to generate inputs for 
  -> Benchmarkable 
benchArbitrary = benchRandom (generate arbitrary)

What do you think?

MangoIV commented 9 hours ago

Benchmarkable doesn't really allow for this though atm.

Bodigrim commented 1 hour ago
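
-- Sketch for discussion; assumes the 'Benchmarkable' constructor
-- (Test.Tasty.Bench), 'SPEC' (GHC.Exts), 'evaluate' (Control.Exception),
-- and 'Word64' (Data.Word) are in scope.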
funcToRandomBench
  :: forall a b c.
     (b -> c) -- ^ How to evaluate results, usually 'id' or 'rnf'.
  -> (a -> b) -- ^ What to benchmark
  -> IO a     -- ^ Generator of inputs
  -> Benchmarkable
funcToRandomBench frc = (Benchmarkable .) . funcToRandomBenchLoop SPEC
  where
    funcToRandomBenchLoop :: SPEC -> (a -> b) -> IO a -> Word64 -> IO ()
    funcToRandomBenchLoop !_ f mx n
      | n == 0    = pure ()
      | otherwise = do
        x <- mx
        _ <- evaluate (frc (f x))
        funcToRandomBenchLoop SPEC f mx (n - 1)
{-# INLINE funcToRandomBench #-}

Would this be enough? It's up to the supplied IO a whether to generate a fresh random input each time or to pregenerate a list of inputs and iterate over it repeatedly.
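
For illustration, a minimal usage sketch, assuming funcToRandomBench from the snippet above is in scope (bench, defaultMain, rnf and the QuickCheck combinators are the existing APIs). Note that the call to generate runs inside the measured loop, so its cost is timed together with sort itself:

import Control.DeepSeq (rnf)
import Data.List (sort)
import Test.QuickCheck (Gen, arbitrary, generate, vectorOf)
import Test.Tasty.Bench (bench, defaultMain)

-- a fresh random input on every iteration
randomList :: Gen [Int]
randomList = vectorOf 1000 arbitrary

main :: IO ()
main = defaultMain
  [ bench "sort, fresh random input per iteration" $
      funcToRandomBench rnf sort (generate randomList)
  ]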

MangoIV commented 1 hour ago

You would probably still benchmark a single random input multiple times to get good confidence and then average over multiple random inputs. I tried to use perRunEnv in criterion with this kind of random generator and it gave me wild results. It might be similar here (though it might also be just fine?). I think the function you propose is pretty good for random inputs that are expected to have very similar runtimes.

Bodigrim commented 1 hour ago

> You would probably still benchmark a single random input multiple times to get good confidence and then average over multiple random inputs.

If you limit IO a to iterate over a finite list of random inputs, this would happen automatically in funcToRandomBench, right?
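
Something like the following sketch, for instance (cycledInputs is a hypothetical helper, not part of tasty-bench or QuickCheck): pregenerate a finite pool of inputs once and hand funcToRandomBench an IO a that keeps cycling through it, so each input in the pool is measured many times and the reported figure averages over the whole pool.

import Data.IORef (atomicModifyIORef', newIORef)
import Test.QuickCheck (Gen, generate, vectorOf)

-- Pregenerate a pool of n inputs and return an action that cycles through it.
cycledInputs :: Int -> Gen a -> IO (IO a)
cycledInputs n gen = do
  pool <- generate (vectorOf n gen)  -- generated once, before any measurement
  ref  <- newIORef (cycle pool)
  pure $ atomicModifyIORef' ref (\(x : xs) -> (xs, x))

-- Usage with the earlier example:
--   mx <- cycledInputs 100 randomList
--   defaultMain [bench "sort, pool of 100 random inputs" (funcToRandomBench rnf sort mx)]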

MangoIV commented 58 minutes ago

I mean sure but I feel like this should ultimately be part of the interface, no?

Btw ~ really cool library; I'm currently only using it to output stats in the form of a CSV, but that's already much easier and quicker than with criterion ❤️