evanphx / benchmark-ips

Provides iteration per second benchmarking for Ruby
MIT License

Add support for hold! and independent invocations. #56

Closed: chrisseaton closed this 8 years ago

chrisseaton commented 8 years ago

benchmark-ips is often used to compare different ways to implement the same functionality, but running two benchmarks in the same invocation of Ruby is not going to tell you whether A or B is faster. It's going to tell you whether A or B-given-that-A-has-already-run is faster.

For the theory on why this is not ideal, see research such as Kalibera and Jones [1].

For a practical example of why this is not ideal, consider this benchmark:

require 'benchmark/ips'

class Foo
  def method_under_test
    10
  end
end

class Bar
  def method_under_test
    20
  end
end

def call_method_under_test(x)
  x.method_under_test
end

Benchmark.ips do |x|

  foo = Foo.new

  x.report("test_a") do
    call_method_under_test foo
  end

  bar = Bar.new

  x.report("test_b") do
    call_method_under_test bar
  end

  # x.hold! - I'll explain this later on 
  x.compare!

end

The problem is that an optimising implementation of Ruby will compile the call x.method_under_test as monomorphic for the first benchmark, and as bimorphic for the second. The second benchmark is disadvantaged, and the results we get for the two benchmarks depend on their order.
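To make the terminology concrete, here is a toy model of a call site that records which receiver classes it has seen. It is only a sketch: ToyCallSite is a hypothetical name, and a real optimising VM caches compiled dispatch targets rather than a plain list of classes. It shows that the same call site is monomorphic after the first benchmark's receiver and bimorphic once the second benchmark's receiver arrives.

```ruby
# Toy model of a call-site inline cache. Hypothetical: real VMs cache
# compiled dispatch targets, not just a list of receiver classes.
class ToyCallSite
  attr_reader :seen_classes

  def initialize(method_name)
    @method_name = method_name
    @seen_classes = []
  end

  # Dispatch the call and remember each new receiver class.
  def call(receiver)
    klass = receiver.class
    @seen_classes << klass unless @seen_classes.include?(klass)
    receiver.public_send(@method_name)
  end

  # Name the cache state by how many receiver classes were seen.
  def state
    case @seen_classes.size
    when 0 then :uninitialized
    when 1 then :monomorphic
    when 2 then :bimorphic
    else :polymorphic
    end
  end
end

class Foo
  def method_under_test
    10
  end
end

class Bar
  def method_under_test
    20
  end
end

site = ToyCallSite.new(:method_under_test)

site.call(Foo.new)
state_after_test_a = site.state # only Foo has been seen at this site

site.call(Bar.new)
state_after_test_b = site.state # Foo and Bar have both been seen
```

Whichever benchmark runs second inherits the wider cache state, which is why swapping the order of the two reports can change which one looks slower.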

I can demonstrate this with JRuby+Truffle. I run the benchmark, asking it to compile only the method call_method_under_test and turning off splitting and OSR to reduce the noise (otherwise it is hard to understand what is being compiled and why).

$ JAVACMD=../../graal/GraalVM-0.9/jre/bin/javao bin/jruby -X+T -J-server -J-G:+TraceTruffleCompilation -J-G:TruffleCompileOnly=call_method_under_test -J-G:-TruffleOSR -J-G:-TruffleSplitting -I ../benchmark-ips/lib dependence.rb

We see call_method_under_test compiled twice, once for each benchmark. It has to be recompiled because the compiler baked in the assumption that the call would always be monomorphic; the first benchmark runs on that monomorphic copy, and only the second gets the bimorphic one. The first copy is 164 bytes of code, the second is 188. The code for the second benchmark is larger and slower because it came second and so is polymorphic.

[truffle] opt done         call_method_under_test:dependence.rb:15 <opt>               |ASTSize      14/   18 |Time   287( 259+28  )ms |DirectCallNodes I    1/D    0 |GraalNodes    35/   28 |CodeSize          164 |Source dependence.rb:15 
[truffle] opt done         call_method_under_test:dependence.rb:15 <opt>               |ASTSize      14/   22 |Time   155( 145+10  )ms |DirectCallNodes I    2/D    0 |GraalNodes    40/   25 |CodeSize          188 |Source dependence.rb:15 

My solution to this is to run each benchmark in an independent VM. I added a method hold! (like compare!) that runs one benchmark, saves the results, and prompts you to run Ruby again for the next benchmark.

$ ... bin/jruby -X+T ...
...
[truffle] opt done         call_method_under_test:dependence.rb:15 <opt>               |ASTSize      14/   18 |Time   252( 214+39  )ms |DirectCallNodes I    1/D    0 |GraalNodes    35/   28 |CodeSize          164 |Source dependence.rb:15
...
Pausing here -- run Ruby again to measure the next benchmark...
$ ... bin/jruby -X+T ...
...
[truffle] opt done         call_method_under_test:dependence.rb:15 <opt>               |ASTSize      14/   18 |Time   252( 214+39  )ms |DirectCallNodes I    1/D    0 |GraalNodes    35/   28 |CodeSize          164 |Source dependence.rb:15 
...

Now I get two methods, both 164 bytes, and so I've made the benchmarks independent (or at least a bit more independent).

This is optional - you only need to use it if you want to. It saves data in a little file that is checksummed so it's invalidated if your benchmark changes.
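The hold-and-resume idea can be sketched in plain Ruby as follows. This is not the code in this pull request: run_next_held, its parameters, and the JSON file layout are all assumptions made for illustration, and the timing loop is far cruder than what benchmark-ips does. Each call runs at most one pending benchmark, saves the results alongside a checksum of the benchmark source, and reports whether another invocation of Ruby is still needed.

```ruby
require 'json'
require 'digest'

# Hypothetical helper, not benchmark-ips code: runs at most one pending
# benchmark per call and stores results in `path` between invocations.
def run_next_held(benchmarks, source:, path:, seconds: 0.05)
  checksum = Digest::SHA256.hexdigest(source)

  # Reuse saved results only if they belong to the same benchmark source.
  saved = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  results = saved['checksum'] == checksum ? saved['results'] : {}

  # Pick the first benchmark that has no saved result yet.
  name, block = benchmarks.find { |n, _| !results.key?(n) }
  if name
    # Crude iterations-per-second measurement over a fixed time window.
    iterations = 0
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    deadline = start + seconds
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      block.call
      iterations += 1
    end
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    results[name] = iterations / elapsed
    File.write(path, JSON.generate('checksum' => checksum, 'results' => results))
  end

  # pending is true while some benchmark still needs a fresh Ruby process.
  pending = benchmarks.keys.any? { |n| !results.key?(n) }
  [results, pending]
end
```

A script using this sketch might pass source: File.read(__FILE__), print a "run Ruby again" message and exit while pending is true, and compare the results once it is false. Changing the script changes the checksum, so stale results are discarded rather than compared against a different benchmark.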

[1] https://kar.kent.ac.uk/33611/7/paper.pdf

evanphx commented 8 years ago

I don't think Proc#source_location is available in 1.8, so this would break 1.8.

I understand the desire, but instead of magically inferring a name, I think I'd prefer that a name have to be passed to #hold!, giving the path to store the held data in.

chrisseaton commented 8 years ago

Updated to use a file specified as hold! 'file.data'. That simplified the implementation a bit as well, actually.