Open ORBAT opened 9 years ago
I wasn't familiar with Bessel's correction, though reading the Wikipedia article I saw:
How do you suggest we handle this? By adding additional methods for subsets, or perhaps by creating a subset-only version of this library?
Yeah, you only need to apply the correction if you're dealing with a sample out of a larger population and you don't know the mean.
One of the caveats is that Bessel's correction will give you an unbiased variance when you have samples, but it won't give you an unbiased standard deviation: there is no general method for calculating an unbiased sd in the first place. It does, however, correct some of the bias. There's also the question of which correction factor to use, but n-1 is good enough for most cases (and if someone needs something more sophisticated, it'll probably fall out of scope for stats-lite anyhow.)
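To make the variance vs. standard deviation caveat concrete, here is a small sketch that enumerates every ordered sample of size 2 drawn with replacement from a toy population. The population [0, 1] and the helper names (mean, varianceBy) are assumptions picked for illustration, not stats-lite code. Averaged over all samples, the n-1 variance matches the population variance, while the square root of it still undershoots the population standard deviation (though less than the uncorrected version does).

```javascript
// Toy population chosen for illustration only.
var population = [0, 1]

function mean(vals) {
  var sum = 0
  for (var i = 0; i < vals.length; i++) sum += vals[i]
  return sum / vals.length
}

// Sum of squared deviations from the mean, divided by `divisor` (n or n - 1).
function varianceBy(vals, divisor) {
  var avg = mean(vals)
  var sumSq = 0
  for (var i = 0; i < vals.length; i++) sumSq += Math.pow(vals[i] - avg, 2)
  return sumSq / divisor
}

// Every ordered sample of size 2, drawn with replacement.
var samples = []
for (var a = 0; a < population.length; a++) {
  for (var b = 0; b < population.length; b++) {
    samples.push([population[a], population[b]])
  }
}

var trueVariance = varianceBy(population, population.length)
var trueStdev = Math.sqrt(trueVariance)

// Average each estimator over all possible samples.
var avgUncorrectedVar = mean(samples.map(function (s) { return varianceBy(s, s.length) }))
var avgCorrectedVar = mean(samples.map(function (s) { return varianceBy(s, s.length - 1) }))
var avgUncorrectedSd = mean(samples.map(function (s) { return Math.sqrt(varianceBy(s, s.length)) }))
var avgCorrectedSd = mean(samples.map(function (s) { return Math.sqrt(varianceBy(s, s.length - 1)) }))

console.log('variance:', trueVariance, avgUncorrectedVar, avgCorrectedVar)
console.log('stdev:', trueStdev, avgUncorrectedSd, avgCorrectedSd)
```

With this population the corrected variance averages out to exactly the true value, and the corrected standard deviation lands between the uncorrected one and the true one.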
A simple, backwards-compatible way of implementing this could be to have variance and stdev take an optional parameter sample (or bessel or whatever):
// Variance = average squared deviation from mean.
// If sample is true, vals represents a sample of a population, so Bessel's correction will be applied
function variance(vals, sample) {
  vals = numbers(vals)
  var avg = mean(vals)
  var diffs = []
  for (var i = 0; i < vals.length; i++) {
    diffs.push(Math.pow((vals[i] - avg), 2))
  }
  var res = mean(diffs);
  if(sample) {
    res *= vals.length / (vals.length - 1);
  }
  return res;
}

// Standard Deviation = sqrt of variance.
// If sample is true, vals represents a sample of a population, so Bessel's correction will be applied
function stdev(vals, sample) {
  return Math.sqrt(variance(vals, sample))
}
I'm usually not a huge fan of polymorphic functions in Node where optimization matters, because of the way V8 deoptimizes them.
That said, I don't know how much of a concern it is in this case, because the same application would have to call it both as variance(vals) and as variance(vals, true) to cause a deopt. I don't know how likely that is to happen, and even then the user could avoid the penalty by calling variance(vals, false) in the first case...
Will think about it.
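For comparison, the other option raised earlier (additional methods rather than an optional flag) might look roughly like the sketch below. The names populationVariance, sampleVariance, sampleStdev and sumOfSquaredDeviations are hypothetical and not part of stats-lite; the point is only that each function is always called with the same argument shape. Whether that makes a measurable difference in V8 is not something this sketch demonstrates.

```javascript
// Hypothetical helper: sum of squared deviations from the mean.
function sumOfSquaredDeviations(vals) {
  var sum = 0
  for (var i = 0; i < vals.length; i++) sum += vals[i]
  var avg = sum / vals.length
  var sumSq = 0
  for (var j = 0; j < vals.length; j++) sumSq += Math.pow(vals[j] - avg, 2)
  return sumSq
}

// vals is the whole population: divide by n.
function populationVariance(vals) {
  return sumOfSquaredDeviations(vals) / vals.length
}

// vals is a sample: divide by n - 1 (Bessel's correction).
// Fewer than two values gives no usable estimate, so return NaN.
function sampleVariance(vals) {
  if (vals.length < 2) return NaN
  return sumOfSquaredDeviations(vals) / (vals.length - 1)
}

function sampleStdev(vals) {
  return Math.sqrt(sampleVariance(vals))
}

console.log(populationVariance([1, 2, 3, 4]), sampleVariance([1, 2, 3, 4]))
```

The trade-off is a larger API surface in exchange for not overloading the existing variance and stdev signatures.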
In other news I just published v2.0.0 of this module with support for multi-modal mode distributions, but at the same time made it Node.js v4.0.0+ (for ES6 Sets) so that might impact your ability to immediately use a modified variance function.
The current method of calculating variance (and, by extension, standard deviation) is intended for sets that form the whole population. When dealing with a sample, i.e. you pick n elements out of k and you don't know the mean of the whole population, you need to apply Bessel's correction and divide by n-1 instead of n when taking the mean.
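As a quick numeric illustration of the difference, here is a minimal sketch. The helper varianceWithDivisor and the example values are made up for this comment and are not stats-lite code.

```javascript
// Hypothetical helper: the divisor is n for a whole population and
// n - 1 (Bessel's correction) for a sample.
function varianceWithDivisor(vals, divisor) {
  var avg = vals.reduce(function (a, b) { return a + b }, 0) / vals.length
  var sumSq = vals.reduce(function (acc, v) { return acc + Math.pow(v - avg, 2) }, 0)
  return sumSq / divisor
}

var vals = [2, 4, 4, 4, 5, 5, 7, 9]
var populationVar = varianceWithDivisor(vals, vals.length)  // divide by n
var sampleVar = varianceWithDivisor(vals, vals.length - 1)  // divide by n - 1

console.log(populationVar, sampleVar)
```

For these eight values the mean is 5 and the squared deviations sum to 32, so dividing by n gives 4 while dividing by n-1 gives 32/7 (about 4.57). The gap shrinks as n grows.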