JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

:+ on summary statistics #922

Open cscherrer opened 5 years ago

cscherrer commented 5 years ago

It can be convenient to define addition on summary statistics. Here's an example for MvNormal:

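using Distributions: MvNormalStats

# Combine the summary statistics of two disjoint data sets.
function Base.:+(ss1::MvNormalStats, ss2::MvNormalStats)
    tw = ss1.tw + ss2.tw       # total weight
    s = ss1.s .+ ss2.s         # sum of the observations
    m = s .* inv(tw)           # mean of the combined data
    s2 = ss1.s2 + ss2.s2       # scatter matrices added as-is
    MvNormalStats(s, m, s2, tw)
end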

The interpretation is that we fit ss1 and ss2 on disjoint data sets, and would like to fit the union of the two sets.
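One thing worth checking before a PR: whether s2 is additive depends on how it is defined. The sketch below does not use Distributions at all. It defines a hypothetical stand-in type, ScatterStats, and assumes s2 holds the scatter about each set's own mean (I believe that is how suffstats computes it for MvNormal, but that should be verified against the source). Under that assumption, merging needs an extra between-group term for the result to match the statistics of the union:

```julia
using LinearAlgebra  # for ≈ on arrays

# Hypothetical stand-in for MvNormalStats; the field meanings are assumptions.
struct ScatterStats
    s::Vector{Float64}   # sum of the observations
    m::Vector{Float64}   # mean of the observations
    s2::Matrix{Float64}  # scatter about the mean: sum of (x - m)(x - m)'
    tw::Float64          # total weight (here, the number of observations)
end

# Compute the statistics from a d×n matrix whose columns are observations.
function scatterstats(x::AbstractMatrix{<:Real})
    n = size(x, 2)
    s = vec(sum(x, dims=2))
    m = s ./ n
    z = x .- m
    ScatterStats(s, m, z * z', Float64(n))
end

# Merge the statistics of two disjoint data sets.
function merge_stats(a::ScatterStats, b::ScatterStats)
    tw = a.tw + b.tw
    s = a.s .+ b.s
    m = s ./ tw
    delta = a.m .- b.m
    # Between-group term: needed because each s2 is centered on its own mean.
    s2 = a.s2 .+ b.s2 .+ (a.tw * b.tw / tw) .* (delta * delta')
    ScatterStats(s, m, s2, tw)
end

x1 = [1.0 2.0 4.0; 0.0 1.0 5.0]
x2 = [3.0 7.0; 2.0 2.0]
merged = merge_stats(scatterstats(x1), scatterstats(x2))
direct = scatterstats(hcat(x1, x2))
merged.s2 ≈ direct.s2  # merged statistics agree with a single pass over the union
```

If s2 were instead the uncentered sum of x * x', plain addition would already be exact and the extra term would not be needed.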

I can add this one, and maybe we can gradually fill in gaps for others. I just want to be sure this approach makes sense before messing with a PR.

matbesancon commented 5 years ago

+ could be ambiguous here (I always find it tricky in a distribution context to know what we are adding). Would a merge_stats function work?

cscherrer commented 5 years ago

I agree that + is a mess for distributions - should it give a convolution or a mixture? It's especially weird since most distribution families aren't closed under either of these.
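To make the two readings concrete, here is a small sketch in plain Julia, with no package calls. The two normal distributions and their parameters are made up for illustration, and each is represented only by its mean and variance:

```julia
# Two normal distributions, described by (mean, variance). Values are illustrative.
p = (mean = 0.0, var = 1.0)
q = (mean = 4.0, var = 1.0)

# Reading 1: convolution, the distribution of X + Y for independent X ~ p, Y ~ q.
conv = (mean = p.mean + q.mean, var = p.var + q.var)

# Reading 2: equal-weight mixture, drawing from p or q with probability 1/2 each.
mix_mean = (p.mean + q.mean) / 2
# Mixture variance: average second moment minus the squared mixture mean.
mix_var = (p.var + p.mean^2 + q.var + q.mean^2) / 2 - mix_mean^2
mix = (mean = mix_mean, var = mix_var)
```

The two readings give different answers (mean 4, variance 2 versus mean 2, variance 5). The convolution of two normals happens to be normal again, while this mixture is bimodal and not normal at all.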

But for summary stats, what else could it possibly mean? Most summary stats are literally a sum, right?
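As a quick illustration of the "literally a sum" point, here is a sketch for scalar data. The helper names contrib and stats are hypothetical, and the statistic is the triple (count, sum of x, sum of x^2):

```julia
# Per-observation contribution to the summary statistic: (1, x, x^2).
contrib(x) = [1.0, x, x^2]

# The summary statistic of a data set is the sum of the contributions.
stats(xs) = sum(contrib, xs)

a = [1.0, 2.0]
b = [3.0, 4.0, 5.0]

combined = stats(a) + stats(b)  # add the statistics of the two disjoint sets
direct = stats(vcat(a, b))      # statistics of the union, computed in one pass
combined == direct
```

Because the statistic is a sum over observations, adding the statistics of disjoint sets is the only thing + could reasonably mean here.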

If there's really ambiguity, I agree merge_stats is better. But maybe + is fine anyway. Subtraction (-) also makes sense in most cases and opens up some nice optimizations, as in HLearn, but it's probably unusual enough to deserve a separate package. I'll add that to my to-do list ;)
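A rough sketch of the kind of optimization subtraction allows, using a hypothetical PowerSums type (not part of Distributions.jl, and not taken from HLearn): compute the statistics of all the data once, then get "everything except one fold" by subtracting that fold's statistics.

```julia
# Hypothetical type: raw power sums of scalar data.
struct PowerSums
    n::Float64    # number of observations
    sx::Float64   # sum of x
    sxx::Float64  # sum of x^2
end

powersums(xs) = PowerSums(length(xs), sum(xs), sum(abs2, xs))

# Both operators act field by field.
Base.:+(a::PowerSums, b::PowerSums) = PowerSums(a.n + b.n, a.sx + b.sx, a.sxx + b.sxx)
Base.:-(a::PowerSums, b::PowerSums) = PowerSums(a.n - b.n, a.sx - b.sx, a.sxx - b.sxx)

folds = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Statistics of all the data, accumulated fold by fold.
total = sum(powersums, folds)

# Statistics of "all folds except fold k", without revisiting the other folds.
heldout = [total - powersums(f) for f in folds]
```

One caveat: floating-point subtraction can lose precision through cancellation, so the result may not match a fresh pass over the remaining data bit for bit on real-valued data.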

matbesancon commented 5 years ago

Nice link, I'd never seen that before. That's a fair point, plus it avoids exporting yet another function. PR welcome :)