evanphx / benchmark-ips

Provides iteration per second benchmarking for Ruby
MIT License

When there is variable setup time #91

Closed. ioquatix closed this issue 4 years ago.

ioquatix commented 5 years ago

I'm interested in comparing two (or more) implementations of the same thing.

There is some "one off" setup cost - establishing the connection, and then the repeated times cost, i.e. what I'm interested in.

For one benchmark, the setup cost might be N and for the other, 2N. The iteration cost is largely the same. However, what ends up happening is that the benchmark with N setup overhead gets a much larger `times` than the benchmark with 2N overhead. This makes the effect of the setup even more pronounced, because benchmark-ips will set `times` to, say, 80 for the case of 2N setup cost and 500 for the case of N setup cost.

Ultimately, it makes the second case look much better even though the difference is mostly in the setup overhead.

Is there some way to take this bias into account? My initial thoughts were to use the upper bound for times (or perhaps the average) so that each benchmark would be largely running with the same number, proportionally, of setup overhead to number of times.
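To make the bias concrete, here is a small sketch with purely illustrative numbers (setup costs N = 10 and 2N = 20 time units, identical per-iteration cost, and the 500 vs. 80 repeat counts from the example above):

```ruby
# Illustrative numbers only: the apparent per-iteration time is the total
# (setup + iterations) amortised over the repeat count chosen by the tool.
def apparent_time_per_iteration(setup_cost, iteration_cost, repeats)
  (setup_cost + iteration_cost * repeats) / repeats.to_f
end

fast_setup = apparent_time_per_iteration(10, 1, 500) # setup N, many repeats
slow_setup = apparent_time_per_iteration(20, 1, 80)  # setup 2N, few repeats

# The true per-iteration cost is identical (1 unit), yet the measured
# numbers differ because the setup is amortised over different counts.
puts fast_setup # => 1.02
puts slow_setup # => 1.25
```

So even with identical iteration cost, the cheaper-setup benchmark looks roughly 20% faster here, purely because its setup is spread over more repeats.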

ioquatix commented 5 years ago

Another interesting idea would be to run the benchmark with times=0 to compute the overhead and subtract that from subsequent test runs.
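A rough sketch of that idea, using the monotonic clock from the Ruby stdlib. The `setup` and `work` lambdas below are stand-ins for the real `connect_to_server` and `client.get(...)`; nothing here is benchmark-ips API:

```ruby
# Hypothetical sketch: time a "times = 0" run to estimate the fixed setup
# overhead, then subtract it from a full run.
def timed_run(repeats, setup:, work:)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  setup.call
  repeats.times { work.call }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

setup = -> { sleep 0.01 } # stand-in for connect_to_server
work  = -> { 1 + 1 }      # stand-in for client.get(...)

overhead = timed_run(0, setup: setup, work: work)    # setup cost only
total    = timed_run(1000, setup: setup, work: work) # setup + iterations
per_iteration = (total - overhead) / 1000.0
```

In practice the zero-iteration run would need to be repeated and averaged, since a single sample of the setup cost is noisy.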

kbrock commented 5 years ago

@ioquatix I tend to set up the connection outside my benchmarks, so there's as little setup as possible inside my loop.

Is it possible for you to do something like that?

ioquatix commented 5 years ago

@kbrock thanks for your reply.

This was my ultimate conclusion - trying to do setup work throws the system off too much. I think we can close this issue.

That being said, for further discussion:

Setup is still part of the overall cost.

It also makes the comparisons less isolated because setup is done outside of the benchmark. In some ways, it makes sense if you just want to compare the operational overhead, but if you want to compare the full cost of two different implementations, it's tricky.

I did try using plain benchmark with fixed repetitions, and maybe that makes more sense for my use case, since it measures setup as well as per-operation overhead. But it lacks the iterations/second figure, which is pretty useful, and the nicer formatting, warmup phase, etc.
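For reference, the plain-benchmark approach with fixed repetitions looks roughly like this (stdlib `benchmark`; the lambdas are stand-ins for the real `connect_to_server` and `client.get(...)`):

```ruby
require "benchmark"

REPEATS = 1_000 # fixed repetition count, identical for every variant

# Stand-ins for two implementations with different setup costs.
fast_setup = -> { Array.new(10)  { |i| i * i } }
slow_setup = -> { Array.new(100) { |i| i * i } }
work       = -> { 1 + 1 }

report = Benchmark.bm(12) do |x|
  x.report("impl a") { fast_setup.call; REPEATS.times { work.call } }
  x.report("impl b") { slow_setup.call; REPEATS.times { work.call } }
end

# Each report's total includes its setup cost exactly once, so the
# comparison is not skewed by differing repeat counts.
```

Because both reports run the same number of iterations, the totals are directly comparable, at the price of losing the ips readout and warmup handling.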

kbrock commented 5 years ago

@ioquatix yeah. I've seen some nice benchmarks that show the numbers before and after warmup, so you can get an idea whether a cache or connection pooling is used. Very nice stuff.

This does provide the warmup phase, but isn't set up to record/graph the warmup.

ioquatix commented 5 years ago

One way you could do it is inject into the benchmark some "timer" object, e.g.

x.benchmark do |repeats, measure|
    client = connect_to_server
    measure.setup! # measure the time it took to set up
    index = 0
    while index < repeats
        client.get(...)
        measure.warmup
        index += 1
    end
end

There are other ways to do this, maybe with a better interface. But that's roughly how it could work.

ioquatix commented 5 years ago

Another proposal/idea:

x.benchmark(timer: true) do |timer|
    client = connect_to_server

    timer.capture do |repeats|
        index = 0
        while index < repeats
            client.get(...)
            index += 1
        end
    end
end

You could capture multiple blocks, e.g.

x.benchmark(timer: true) do |timer|
    client = connect_to_server

    timer.capture("get") do |repeats|
        index = 0
        while index < repeats
            client.get(...)
            index += 1
        end
    end

    timer.capture("post") do |repeats|
        index = 0
        while index < repeats
            client.post(...)
            index += 1
        end
    end
end

but maybe that's getting too complicated.

kbrock commented 5 years ago
x.benchmark(timer: true) do |timer|
    client = connect_to_server

    timer.capture do |repeats|
        index = 0
        while index < repeats
            client.get(...)
            index += 1
        end
    end
end

is basically

client = connect_to_server
x.benchmark(timer: true) do |timer|
    timer.capture do |repeats|
        index = 0
        while index < repeats
            client.get(...)
            index += 1
        end
    end
end

and while that is not capturing the setup (connect_to_server) it is basically what I tend to do. And yes, it does not capture the amount of time for warmup.

But this is called "iterations per second", so it does lose the nuance of warming up and getting up to speed.

ioquatix commented 5 years ago

I agree with you, and I realise my proposals add more complexity without a huge gain in functionality. The main point is to keep all the benchmark code within one block, to avoid leaking state. It gets a bit ugly when you have a ton of setup code, especially when different benchmarks share similar concepts, because it's not clear where one benchmark ends and the next one starts. It all gets mixed up, and variable names are used to untangle it at the top-level scope.

kbrock commented 5 years ago

@ioquatix Maybe you misunderstood. The timer block just provides a reference to the capture method. Moving a method into that block or keeping it out is fine; it does the same thing. The code in that block is there to set up the benchmark. Any code you put in there will be run before the benchmarks actually begin.

ioquatix commented 5 years ago

I get it. It's just a shared namespace for all benchmarks, so if I have two very similar things with different implementations I have to call them e.g. client_implementation_x and client_implementation_y so they don't clash, and it just gets a bit messy.

ioquatix commented 4 years ago

@eregon this is why the warmup cycles need to be more than one.

eregon commented 4 years ago

I think it's good enough to take the setup outside of the report block here. So I'd suggest closing this issue: moving the setup outside the report is the recommended approach.

If you want to isolate local variables, you can always use an extra lambda as a scope:

Benchmark.ips do |x|
  -> {
    client = connect_to_server # local to the lambda, invisible to other reports
    x.report "mybench" do
      client.get(...)
    end
  }.call
end