Open carstenbauer opened 5 years ago
var(Iterators.flatten([x1, x2])) already works, so I'm not sure we need to add anything. One of Julia's strengths is that features can easily be combined that way.
Your approach is much slower, though:
julia> x1 = rand(1_000_000);
julia> x2 = rand(1_000_000);
julia> @btime var(Iterators.flatten([$x1, $x2]));
12.272 ms (2 allocations: 112 bytes)
julia> @btime combined_mean_and_var($x1, $x2);
2.066 ms (0 allocations: 0 bytes)
More importantly, using combined_mean_var(ns, μs, vars) one can calculate the combined variance from the lengths, means, and variances alone (one doesn't need the full time series). I don't see a simple replacement for that.
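To make the summary-statistics idea concrete, here is a minimal sketch of how such a combination could work. The name pooled_mean_var_sketch is hypothetical and this is not the implementation proposed in this issue; it assumes the variances are corrected (the default of var) and that every sample has at least two elements.

```julia
using Statistics

# Hypothetical helper (not the function proposed in this issue).
# Combines per-sample lengths, means, and corrected variances into the
# mean and corrected variance of the concatenated sample.
function pooled_mean_var_sketch(ns, μs, vars)
    N = sum(ns)
    # Overall mean: the length-weighted average of the sample means.
    μ = sum(n * m for (n, m) in zip(ns, μs)) / N
    # Sum of squares within each sample, recovered from its corrected variance.
    within = sum((n - 1) * v for (n, v) in zip(ns, vars))
    # Sum of squares between each sample mean and the overall mean.
    between = sum(n * (m - μ)^2 for (n, m) in zip(ns, μs))
    return μ, (within + between) / (N - 1)
end

# Usage: only the summary statistics are passed in, never the data itself.
x1, x2 = rand(100), rand(250)
μ, v = pooled_mean_var_sketch((length(x1), length(x2)),
                              (mean(x1), mean(x2)),
                              (var(x1), var(x2)))
v ≈ var(vcat(x1, x2))  # agrees up to floating-point error
```

The within/between split is the usual sum-of-squares decomposition, so the result should match var(vcat(x1, x2)) up to rounding.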
If we really care about performance we could provide a custom method for var(::Iterators.Flatten). There's also CatViews.jl. (Maybe flatten could get faster too.)
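As a rough illustration of what a specialized code path might look like, the sketch below computes the variance over several arrays in two passes without building the concatenated vector. The name var_of_chunks_sketch is hypothetical; it is written as a standalone function rather than a method of var, and it has not been benchmarked against the timings quoted above.

```julia
using Statistics

# Hypothetical helper: corrected variance of several arrays treated as one
# sample, without allocating the concatenated vector.
function var_of_chunks_sketch(chunks)
    N = sum(length, chunks)
    # First pass: overall mean from the per-chunk sums.
    μ = sum(sum, chunks) / N
    # Second pass: squared deviations from the overall mean, chunk by chunk.
    ss = sum(c -> sum(x -> abs2(x - μ), c), chunks)
    return ss / (N - 1)
end

# Usage: accepts any collection of arrays, e.g. a tuple or a vector of vectors.
x1, x2 = rand(1_000), rand(1_000)
var_of_chunks_sketch((x1, x2)) ≈ var(Iterators.flatten([x1, x2]))
```

Each inner sum runs over a plain array, which is the kind of loop a dedicated method for flattened iterators could exploit.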
The method taking only the summary statistics is a completely different beast. I'm not sure about that since we don't include any function like that currently AFAICT. Are there other examples of this kind of thing in stats? Do other programs support this (and how)?
FYI, I added an implementation of covariance matrix pooling in HypothesisTests, as it's used for a couple of multivariate tests. See https://github.com/JuliaStats/HypothesisTests.jl/blob/master/src/common.jl#L61-L67.
I couldn't find a function that calculates the combined/pooled variance of two (or more) datasets. I think it would be great to offer this.
Given two samples x1, x2, the combined variance is the variance of the concatenated sample vcat(x1, x2). I came up with an implementation and some tests.
Do you guys think it'd be worth adding something like this?
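For reference, the definition in this post (the variance of vcat(x1, x2)) has a closed form for two samples. The symbols below are assumptions introduced for illustration: n_1, n_2 are the sample lengths, \bar{x}_1, \bar{x}_2 the sample means, and s_1^2, s_2^2 the corrected sample variances, as returned by var by default.

```latex
\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2},
\qquad
s^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2
      + \dfrac{n_1 n_2}{n_1 + n_2}\,(\bar{x}_1 - \bar{x}_2)^2}{n_1 + n_2 - 1}
```

The first two terms in the numerator are the within-sample sums of squares, and the last term accounts for the difference between the two sample means.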