Open rupertmillard opened 5 years ago
Doesn't this (slightly) increase the risk of an overflow error? (The tradeoff may still be worth it; I'm not sure.)
No, not at all.
When the variance was stored, the algorithm for including another sample first computed q = variance * size, and at the end computed variance = q / size. Now q itself is stored, and it is only divided by size when you actually ask for the variance.
Ah, I missed the calculation of prevq. Objection withdrawn!
(Hi - this is the first time I've used git or GitHub!) Instead of storing variance, I suggest storing Q = variance * size. This improves the speed of add() by ~2% and merge() by ~12%, without affecting the numerical stability of this implementation. Of course, it slightly reduces the speed of variance(), but even if you calculate the variance after every insertion, we still save some multiplies over the previous implementation.
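The change being discussed can be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual code: the class and method names (RunningStats, add, merge, variance) are assumptions. It stores q (the sum of squared deviations, often called M2 in Welford's algorithm) rather than the variance, so add() and merge() never need to multiply the variance back up by size, and the division by size happens only inside variance().

```python
class RunningStats:
    """Welford-style running statistics storing q = variance * size."""

    def __init__(self):
        self.size = 0
        self.mean = 0.0
        self.q = 0.0  # sum of squared deviations from the mean

    def add(self, x):
        # Welford update: q is carried directly, so there is no
        # q = variance * size at the start or variance = q / size at the end.
        self.size += 1
        delta = x - self.mean
        self.mean += delta / self.size
        self.q += delta * (x - self.mean)

    def merge(self, other):
        # Parallel/merge update (Chan et al.), again operating on q directly.
        if other.size == 0:
            return
        total = self.size + other.size
        delta = other.mean - self.mean
        self.q += other.q + delta * delta * self.size * other.size / total
        self.mean = (self.size * self.mean + other.size * other.mean) / total
        self.size = total

    def variance(self):
        # The only place q is divided by size (population variance here).
        return self.q / self.size if self.size else 0.0
```

For example, adding 1, 2, 3, 4 gives a mean of 2.5 and a population variance of 1.25, and merging two accumulators built from [1, 2] and [3, 4] gives the same result as accumulating all four values in one pass.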