JuliaStats / Statistics.jl

The Statistics stdlib that ships with Julia.
https://juliastats.org/Statistics.jl/dev/

Pairwise Summation/Reduction for `var` #103

Open ParadaCarleton opened 2 years ago

ParadaCarleton commented 2 years ago

At the moment, `var` does a naive sum by adding up the squared deviations from the mean. However, when `var` is called on a collection, we can speed it up and also reduce the floating-point error significantly by using pairwise summation with a recursive algorithm -- roughly:

mean([var(first_half), var(second_half)]) + var([mean(first_half), mean(second_half)])

(Note that this would require implementing fused statistics like `mean_and_var` from StatsBase, or else we would have to do more than one pass -- one for the mean and one for the variance.)
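To make the "roughly" concrete: the decomposition is exact when the two halves have equal length and every variance is the uncorrected one (`corrected=false`). With the default `corrected=true` it only holds approximately. A small check, using made-up data chosen purely for illustration:

```julia
using Statistics

# Illustrative data (not from the issue); eight values split into equal halves.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 1.0, 8.0]
first_half, second_half = x[1:4], x[5:8]

# Average of the within-half (population) variances.
within = mean([var(first_half; corrected=false), var(second_half; corrected=false)])

# Population variance of the two half means.
between = var([mean(first_half), mean(second_half)]; corrected=false)

# Law of total variance for equal-sized halves.
println(within + between ≈ var(x; corrected=false))
```

For unequal halves, the two terms would need to be weighted by the half lengths rather than averaged.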

nalimilan commented 2 years ago

Interesting. Do you have references for this? One tricky part would be to compute the variance of the means without storing them in an intermediate array, or the performance benefit would probably be lost.
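One way the intermediate array might be avoided is to have each recursive call return a small tuple `(n, mean, M2)`, where `M2` is the sum of squared deviations, and merge two tuples with the pairwise update usually attributed to Chan, Golub and LeVeque. That this is the intended algorithm is an assumption; the names `pairwise_moments`, `pairwise_var` and the `blocksize` cutoff below are hypothetical and not part of Statistics.jl. A minimal sketch, assuming a non-empty vector of reals:

```julia
using Statistics

# Hypothetical helper: returns (n, mean, M2) for x[lo:hi], where M2 is the
# sum of squared deviations from the mean of that range.
function pairwise_moments(x::AbstractVector{<:Real}, lo::Int, hi::Int; blocksize::Int=64)
    n = hi - lo + 1
    if n <= blocksize
        # Base case: plain two-pass computation on a small block.
        block = @view x[lo:hi]
        m = sum(block) / n
        M2 = sum(v -> abs2(v - m), block)
        return n, m, M2
    end
    mid = (lo + hi) >>> 1
    n1, m1, s1 = pairwise_moments(x, lo, mid; blocksize=blocksize)
    n2, m2, s2 = pairwise_moments(x, mid + 1, hi; blocksize=blocksize)
    # Merge step: combine the two halves without storing per-block means.
    delta = m2 - m1
    m = m1 + delta * n2 / n
    M2 = s1 + s2 + delta^2 * n1 * n2 / n
    return n, m, M2
end

# Hypothetical wrapper mirroring the `corrected` keyword of `var`.
function pairwise_var(x::AbstractVector{<:Real}; corrected::Bool=true)
    n, _, M2 = pairwise_moments(x, firstindex(x), lastindex(x))
    return M2 / (n - (corrected ? 1 : 0))
end

x = collect(1.0:1000.0)
println(isapprox(pairwise_var(x), var(x); rtol=1e-12))
```

Only three scalars travel up the recursion, so no array of means is allocated. Whether this actually beats the current implementation would need benchmarking.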