You need to make sure your CPU is running at full clock speed before starting the test. Otherwise, part of what you end up measuring is the CPU scaling its frequency up, which isn't fair.
If you run the test 20 times for each application and record all the results, what is the variance of the measurements? If you use something like Haskell's criterion or my own eministat to analyze the results, are they stable against outlier variance?
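As a minimal sketch of the kind of analysis such tools perform, here's the basic idea in Python using only the standard library: summarize the runs and flag outliers with the usual 1.5×IQR fence. The timing numbers are made up for illustration (the 160 ms run stands in for a measurement taken before the CPU reached full clock speed); real tools like criterion and eministat do considerably more (bootstrapping, confidence intervals, significance tests).

```python
import statistics

def summarize(samples):
    """Return mean, sample variance, and 1.5*IQR outliers for a list of timings."""
    mean = statistics.mean(samples)
    var = statistics.variance(samples)  # sample variance (n - 1 denominator)
    q1, _, q3 = statistics.quantiles(samples, n=4)  # quartiles
    iqr = q3 - q1
    # Tukey's rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
    outliers = [x for x in samples
                if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    return mean, var, outliers

# 20 hypothetical timings in ms; the last run hit CPU frequency scaling.
runs = [102, 99, 101, 100, 98, 103, 100, 99, 101, 100,
        102, 98, 100, 101, 99, 100, 103, 97, 100, 160]

mean, var, outliers = summarize(runs)
print(f"mean={mean} ms, variance={var:.2f}, outliers={outliers}")
```

If the outlier list is non-empty, or the variance is large relative to the mean, the benchmark setup needs fixing before any comparison between applications means anything.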