Add useful summary statistics to the webapp

GoogleCodeExporter commented 9 years ago

There are two issues with variance computation in Caliper.

1. Incorrect normalization. This is a sample variance, so the sum should be
divided by (n-1) instead of (n). When n is small, such as 3, this makes a huge
difference (dividing by n understates the variance by a third)!
This may in turn lead to incorrect short circuits!

2. It suffers from catastrophic cancellation in the computation. The formula

(sumOfSquaresOfLastN / size()) - squared(mean())

will return incorrect results when the mean is significantly different from 0 
and the standard deviation is significantly smaller than the mean (which should 
be the case for reasonable benchmarks!).

In the worst case, you may even see a negative variance.
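
To make both points concrete, here is a small standalone sketch (not Caliper code; the class name, method names and sample data are made up for illustration). The four values have a mean of about 1e9 and a spread of only a few units, so the exact sample variance is 30 (22.5 when dividing by n). The two-pass method recovers 30.0, while the one-pass formula subtracts two numbers near 1e18 and cannot resolve the difference:

```java
// Illustrative sketch only; names and data are hypothetical, not taken from Caliper.
final class VarianceCancellationDemo {

  // One-pass formula E(X*X) - E(X)^2, divided by n (the form criticized above).
  static double naiveVariance(double[] xs) {
    double sum = 0;
    double sumOfSquares = 0;
    for (double x : xs) {
      sum += x;
      sumOfSquares += x * x;
    }
    double mean = sum / xs.length;
    return sumOfSquares / xs.length - mean * mean;
  }

  // Two-pass sample variance: compute the mean first, then sum the squared
  // deviations from it, and divide by (n-1).
  static double twoPassSampleVariance(double[] xs) {
    double sum = 0;
    for (double x : xs) {
      sum += x;
    }
    double mean = sum / xs.length;
    double sumOfSquaredDeviations = 0;
    for (double x : xs) {
      double d = x - mean;
      sumOfSquaredDeviations += d * d;
    }
    return sumOfSquaredDeviations / (xs.length - 1);
  }

  public static void main(String[] args) {
    // Large mean, tiny spread: the situation described in point 2.
    double[] xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
    System.out.println("naive (divide by n):      " + naiveVariance(xs));
    System.out.println("two-pass (divide by n-1): " + twoPassSampleVariance(xs));
  }
}
```

With doubles, squares near 1e18 are only representable in steps of 128, so the naive result here cannot land anywhere near 22.5.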

See this page for details, or Knuth if you prefer:

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

Since "n" is small (say, less than 1000, fits in memory easily), I actually 
recommend just doing the two-pass approach!

Here's the modified code:

---
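  // Arithmetic mean of the stored measurements.
  public double mean() {
    double sum = 0;
    for (int i = size() - 1; i >= 0; i--) {
      sum += lastN[i];
    }
    return sum / size();
  }

  // Sample variance via the two-pass approach: compute the mean first,
  // then sum the squared deviations from it.
  public double variance() {
    final double m = mean();
    double sum = 0;
    // Note: this is numerically more stable than E(X*X)-E(X)*E(X)!
    for (int i = size() - 1; i >= 0; i--) {
      sum += squared(lastN[i] - m);
    }
    // Divide by (n-1), not n, because this is a sample variance.
    return sum / (size() - 1);
  }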

---

Maybe also consider using the median instead of the mean.
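
A minimal sketch of what a median could look like, assuming the measurements are available as a plain double[] (the class and method names are hypothetical, not part of Caliper). Unlike the mean, the median is barely affected by a single outlier trial:

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical, not taken from Caliper.
final class MedianSketch {

  // Median of the given values: the middle element of the sorted data, or the
  // average of the two middle elements when the count is even.
  static double median(double[] xs) {
    if (xs.length == 0) {
      throw new IllegalArgumentException("median of empty array");
    }
    double[] sorted = xs.clone(); // leave the caller's array untouched
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    if (sorted.length % 2 == 1) {
      return sorted[mid];
    }
    return (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  public static void main(String[] args) {
    // One outlier (150.0) pulls the mean up to 103.2, but the median stays at 92.0.
    double[] xs = {90.0, 91.0, 92.0, 93.0, 150.0};
    System.out.println("median: " + median(xs));
  }
}
```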

Here is an example run before:

 0% Scenario{vm=java, trial=0, ...} 97520,99 ns; σ=2484,44 ns @ 10 trials
20% Scenario{vm=java, trial=1, ...} 93689,45 ns; σ=775,55 ns @ 3 trials
40% Scenario{vm=java, trial=2, ...} 91320,40 ns; σ=1171,48 ns @ 10 trials
60% Scenario{vm=java, trial=3, ...} 90497,03 ns; σ=507,98 ns @ 3 trials
80% Scenario{vm=java, trial=4, ...} 91625,80 ns; σ=2577,81 ns @ 10 trials

Note A) the large differences in standard deviation, and B) that the number of 
trials is always either 3 or 10.

Afterwards:

 0% Scenario{vm=java, trial=0, ...} 91669,36 ns; σ=502,03 ns @ 3 trials
20% Scenario{vm=java, trial=1, ...} 95936,66 ns; σ=398,25 ns @ 3 trials
40% Scenario{vm=java, trial=2, ...} 90981,60 ns; σ=834,11 ns @ 4 trials
60% Scenario{vm=java, trial=3, ...} 91487,95 ns; σ=362,09 ns @ 3 trials
80% Scenario{vm=java, trial=4, ...} 94000,82 ns; σ=920,91 ns @ 6 trials

Note that the standard deviations are now much more stable, and that the 
short-circuit rule was also able to kick in at trial counts other than 3 and 10.

Original issue reported on code.google.com by erich.sc...@gmail.com on 16 Jan 2013 at 3:27

GoogleCodeExporter commented 9 years ago

I suggest dropping both variance and standard deviation and using absolute 
deviation instead: http://en.wikipedia.org/wiki/Absolute_deviation

Revisiting a 90-year-old debate: the advantages of the mean deviation
http://www.leeds.ac.uk/educol/documents/00003759.htm

We Don’t Quite Know What We Are Talking About When We Talk About Volatility
http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Context/Volatility/ConfusedVolatility.pdf
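
For reference, a rough sketch of the mean absolute deviation around the mean, assuming the measurements are in a plain double[] (class and method names are hypothetical, not Caliper API). It averages |x - mean| instead of squaring the deviations, so no subtraction of large squares is involved:

```java
// Illustrative sketch only; names are hypothetical, not taken from Caliper.
final class MeanAbsoluteDeviationSketch {

  // Average absolute distance of the values from their arithmetic mean.
  static double meanAbsoluteDeviation(double[] xs) {
    if (xs.length == 0) {
      throw new IllegalArgumentException("no data");
    }
    double sum = 0;
    for (double x : xs) {
      sum += x;
    }
    double mean = sum / xs.length;

    double sumOfAbsoluteDeviations = 0;
    for (double x : xs) {
      sumOfAbsoluteDeviations += Math.abs(x - mean);
    }
    return sumOfAbsoluteDeviations / xs.length;
  }

  public static void main(String[] args) {
    // Mean is 5; absolute deviations are 3,1,1,1,0,0,2,4, which average to 1.5.
    double[] xs = {2, 4, 4, 4, 5, 5, 7, 9};
    System.out.println("mean absolute deviation: " + meanAbsoluteDeviation(xs));
  }
}
```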

Original comment by Andrei.Pozolotin on 16 Jan 2013 at 4:34

GoogleCodeExporter commented 9 years ago

Thanks very much, both!  We want to get to the point that all our math is on a 
very solid footing.  I'm surprised this code you see lasted as long as it has.

Part of the reason is that we've been working on the full rewrite for the last 
2 years. That at least is pushed out to head now so you can look at it (for 
now, all the code that lives in various *subpackages* of com.google.caliper is 
the rewrite code, and code in com.google.caliper directly is for the chopping 
block. See https://code.google.com/p/caliper/wiki/UnderstandingTheCodebase for 
a little more).

In the rewrite codebase, I'm not sure we are even using variance/stddev of any 
kind in any way, except for a badly coded straw-man short-circuiting feature 
that I want to rip out and possibly rethink from the beginning (or just not 
have at all).  We don't show it in the results anymore; it was unclear what 
value there really was in that... visually inspecting the box plots seems so 
much better.

Interested in all your thoughts.

Original comment by kevinb@google.com on 17 Jan 2013 at 3:52

GoogleCodeExporter commented 9 years ago

Why don't you kill the obsolete code now and push a clean snapshot out?

Original comment by Andrei.Pozolotin on 17 Jan 2013 at 11:34

GoogleCodeExporter commented 9 years ago

We have many users within Google and migrating them takes time.  Please be 
patient.

Original comment by gak@google.com on 17 Jan 2013 at 11:45

GoogleCodeExporter commented 9 years ago

Now that 1.0-beta-1 has been cut (still pending the maven push, but we're 
working on it) and the new webapp has been deployed, the web UI will be the 
focus of this type of analysis.  Since these types of details are hidden by 
default, I'm happy to add pretty much anything that makes sense.

Original comment by gak@google.com on 11 Apr 2013 at 10:40

Changed title: Add useful summary statistics to the webapp
Changed state: Accepted
Added labels: Component-WebUI, Type-Enhancement
Removed labels: Type-Defect
