bencheeorg / benchee

Easy and extensible benchmarking in Elixir providing you with lots of statistics!

Property based benchmarking #249

Open PragTob opened 6 years ago

PragTob commented 6 years ago

This has been on my mind for a long time, but I figure it's worth getting out there:

We already have inputs. Property based testing is great, and support in the Elixir ecosystem is growing a lot. Why not just pass a generator in as an input and let it generate the inputs? The statistics will become more interesting!

As generators usually generate more complex data as they go on, it might be a bit weird: the first recorded run times will be fast while the later recorded run times (or memory measurements) will be higher. We might even need a different type of generator - only time will tell.
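If the growing size turns out to be a problem, StreamData can already pin the generation size with resize/2 (at least if I read the docs right) - a rough sketch:

# resize/2 fixes the generation size, so inputs stay comparably
# complex over the whole run instead of growing as we measure
fixed_size = StreamData.resize(StreamData.list_of(StreamData.integer()), 50)
Enum.take(fixed_size, 3)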

We need to save and show the seed to ensure reproducibility.

Once we have that there are a lot of optional things we could do:

This is definitely a >= 1.x or 2.x feature even ;)

devonestes commented 6 years ago

I’m curious - what do you think this would tell us about our code? Will we be able to clearly see from the results if our function is O(n) vs O(n^2) or something like that?

The only use I could see for this is asserting that a function is O(1) in time or O(1) in memory, since that's the only property that should be measurable for any random input of any size. But you could also assert that with two inputs that differ by an order of magnitude, without the need for stream_data.
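For example, something like this (the lookup function and key counts are made up, just to illustrate):

# Two inputs an order of magnitude apart - if the lookup is O(1),
# both should show roughly the same run time
Benchee.run(%{
  "map lookup" => fn map -> Map.get(map, 1) end
}, inputs: %{
  "1k keys"  => Map.new(1..1_000, fn i -> {i, i} end),
  "10k keys" => Map.new(1..10_000, fn i -> {i, i} end)
})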

PragTob commented 6 years ago

I haven't looked at it from the perspective of big-O notation. That might be doable, though.

I've looked at it more from the perspective of:

PragTob commented 6 years ago

Also, in case anything hits a cache on any layer (even CPU level...), more varied inputs ensure more accurate results :)

devonestes commented 6 years ago

Ok, so now I see how this might work - or at least I have some idea of how I think this could work. I was just totally missing it at first.

We could run each randomly generated input a certain number of times (configurable by the user), and after that number of runs we'd move on to the next input. Then, after the total benchmark time is up (also configurable by the user as it is now), we'd show the n fastest and slowest inputs and the stats for those inputs, and maybe the median as well? Is that along the lines of what you're thinking?

I just totally wasn't thinking about it in those terms at first, so the whole concept escaped me! I can certainly see value in that, although it does sound like a somewhat significant amount of work.
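To make the first scheme concrete, something like this - the options are completely made up, nothing like them exists in benchee today:

# Hypothetical API sketch - neither :generator nor :runs_per_input exist,
# this is just the shape of the idea
Benchee.run(%{
  "some important business code" => fn input -> Business.make_money(input) end
}, time: 10,
   generator: StreamData.integer(),
   runs_per_input: 100)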

The second thing though - getting a performance profile across a wide array of inputs - can be done now. Take the following example:

input_stream = StreamData.integer()
# StreamData generators are enumerable, so we can pull values one at a time
get_num = fn -> input_stream |> Enum.take(1) |> hd() end

Benchee.run(%{
  "some important business code" => fn -> Business.make_money(get_num.()) end
}, time: 10)

One of the awesome composability benefits of having benchee's API be "just a function" :tada:

PragTob commented 6 years ago

Yeah, I was thinking of both ways :) And I'm not quite sure which one I like better.

E.g. cases:

  1. run each input a certain number of times (gives you detailed insight into the behaviour of different inputs)
  2. wide array of inputs (some nice average numbers)

Your example could be improved a bit I think - right now get_num has an impact on the measured run time. We could stick it into a before_each hook and feed its result to the function for maximum benefit. That's sort of what I imagine the implementation would look like (probably as a separate step, though).
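Roughly like this, reusing your Business.make_money example (the dummy input is just a placeholder so the benchmarking function takes an argument):

input_stream = StreamData.integer()

Benchee.run(%{
  "some important business code" => fn num -> Business.make_money(num) end
}, time: 10,
   inputs: %{"random integers" => :ignored},
   # before_each runs outside the measured time and its return value
   # is what the benchmarking function receives as its argument
   before_each: fn _ -> input_stream |> Enum.take(1) |> hd() end)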

PragTob commented 6 years ago

(Note to self: "property based benchmarking" is probably a horrible name, as we're not checking properties of the benchmarks - it's more like StreamData-based benchmarking, random-data benchmarking, or fuzzing benchmarking or something.)