Open rupertmillard opened 5 years ago
Doesn't this (slightly) increase the risk of an overflow error? (The tradeoff may still be worth it; I'm not sure.)
No, not at all.
When the variance was stored, the algorithm for including another sample first computed q = variance * size, and at the end computed variance = q / size. Now q itself is stored, and it is only divided by size when you actually ask for the variance.
Ah, I missed the calculation of prevq. Objection withdrawn!
(Hi - this is the first time I've used git or GitHub!) Instead of storing variance, I suggest storing Q = variance * size. This improves the speed of add() by ~2% and merge() by ~12%, without affecting the numerical stability of this implementation. Of course, it slightly reduces the speed of variance(), but even if you calculate the variance after every insertion, we still save some multiplies over the previous implementation.
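The change being discussed can be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual code: the class and method names (RunningStats, add, merge, variance) are assumptions. It stores q (the sum of squared deviations, often called M2 in Welford's algorithm) rather than the variance, so add() and merge() never need to multiply the variance back up by size, and the division by size happens only inside variance().

```python
class RunningStats:
    """Welford-style running statistics storing q = variance * size."""

    def __init__(self):
        self.size = 0
        self.mean = 0.0
        self.q = 0.0  # sum of squared deviations from the mean

    def add(self, x):
        # Welford update: q is carried directly, so there is no
        # q = variance * size at the start or variance = q / size at the end.
        self.size += 1
        delta = x - self.mean
        self.mean += delta / self.size
        self.q += delta * (x - self.mean)

    def merge(self, other):
        # Parallel/merge update (Chan et al.), again operating on q directly.
        if other.size == 0:
            return
        total = self.size + other.size
        delta = other.mean - self.mean
        self.q += other.q + delta * delta * self.size * other.size / total
        self.mean = (self.size * self.mean + other.size * other.mean) / total
        self.size = total

    def variance(self):
        # The only place q is divided by size (population variance here).
        return self.q / self.size if self.size else 0.0
```

For example, adding 1, 2, 3, 4 gives a mean of 2.5 and a population variance of 1.25, and merging two accumulators built from [1, 2] and [3, 4] gives the same result as accumulating all four values in one pass.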