bestiejs / benchmark.js

A benchmarking library. As used on jsPerf.com.
https://benchmarkjs.com/

Problem with getting reliable results #88

Closed paldepind closed 8 years ago

paldepind commented 9 years ago

I'm sorry for being so noisy on this issue tracker, but I'm getting results I don't quite understand.

I want to compare the performance of two versions of my library. For that purpose I want to override the import of my library differently for different benchmarks, something like: var mylib = global.mylibNewVersion;. However, as soon as I do that I start seeing inconsistent results.
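Concretely, the pattern looks roughly like this (the two objects below are stand-ins for the two require()'d builds; the names are placeholders):

```javascript
// Stand-ins for two builds of the library; in the real setup these
// would be the current and the new require()'d versions.
global.flyd = { stream: function () { return 'current'; } };
global.flydNewVersion = { stream: function () { return 'new'; } };

// In a benchmark meant to exercise the new build, shadow the global
// with a local of the same name so the benchmarked code picks it up:
var flyd = global.flydNewVersion;
var stream = flyd.stream;
stream(); // 'new'
```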

In the first suite below I do a dummy overwrite of the functions my library exposes and in the second I don't. The actual benchmarks in each suite are identical so the suites should report no significant difference between their benchmarks.

suites.push(Benchmark.Suite('Benchmark').add('First', {
  setup: function() {
    var flyd = global.flyd; // Shadow/override globals
    var stream = global.flyd.stream;
    // My init code
  },
  fn: function() {
    // Code to benchmark
  },
}).add('Second', {
  setup: function() {
    var flyd = global.flyd; // Shadow/override globals
    var stream = global.flyd.stream;
    function f(x) { return x; }
    // My init code
  },
  fn: function() {
    // Code to benchmark
  },
}));

suites.push(Benchmark.Suite('Other benchmark').add('First', {
  setup: function() {
    // My init code
  },
  fn: function() {
    // Code to benchmark
  },
}).add('Second', {
  setup: function() {
    // My init code
  },
  fn: function() {
    // Code to benchmark
  },
}));

However, the suite that overwrites the globals consistently reports one of its benchmarks as being faster, while the other suite shows only small differences. The output below is an example.

Benchmark:
  First x 141,861 ops/sec ±0.78% (80 runs sampled)
  Second x 132,949 ops/sec ±1.54% (83 runs sampled)
  First is 8% faster.

Other benchmark:
  First x 171,566 ops/sec ±0.87% (72 runs sampled)
  Second x 173,777 ops/sec ±0.74% (83 runs sampled)
  Second is 1% faster.

Am I doing something wrong here?

jdalton commented 8 years ago

Things like this can pop up when micro-benchmarking. It comes with the territory. I don't have an answer as to the root cause.
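For what it's worth, one plausible contributor (a simplified sketch of the mechanism, not a diagnosis): benchmark.js decompiles setup and fn via Function.prototype.toString and inlines both bodies into a single generated timing function, roughly along these lines:

```javascript
// Extract a function's source body between its outermost braces.
function getBody(f) {
  var source = f.toString();
  return source.slice(source.indexOf('{') + 1, source.lastIndexOf('}'));
}

// Very rough sketch of how benchmark.js assembles its timed function:
// the setup body runs once, then the fn body runs inside the loop.
function compile(setup, fn) {
  var body = getBody(setup) + '\nwhile (count--) {' + getBody(fn) + '}';
  return new Function('count', body);
}

globalThis.calls = 0;
var timed = compile(
  function () { globalThis.calls = 0; },
  function () { globalThis.calls++; }
);
timed(1000);
// globalThis.calls is now 1000
```

Because the setup source ends up inside the timed function's own body, two benchmarks whose setups differ compile to different generated functions, which the JIT may optimize differently even when fn is identical.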