JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.
Other
1.1k stars 413 forks source link

quantile() function for multivariate normal distribution #569

Open mikea opened 7 years ago

mikea commented 7 years ago

I'd like to discuss API addition before I send the PR.

Quantile surfaces for multivariate normal distributions are ellipsoids. I suggest to add following definition:

type Ellipsoid
    center::Vector{Float64}
    axis::Matrix{Float64}
end

and define a quantile(d::MvNormal, q) function that will return Ellipsoid instance.

mikea commented 7 years ago

Here's the draft quantile implementation:

function Base.quantile(d::MvNormal, q)
    evals, evectors = eig(cov(d))
    c = quantile(Chisq(length(d)), q)
    for i in 1:length(d)
        evectors[:,i] .*= sqrt(c * evals[i]) / norm(evectors[:,i])
    end

    Ellipsoid(mean(d), evectors)
end
andreasnoack commented 7 years ago

Which methods would you define for Ellipsoid?

mikea commented 7 years ago

At a minimum - member function that tests if a point is inside. The other useful one is surface parametrization: given t from [0,1)^(n-1) compute the surface point.

ararslan commented 7 years ago

Surface parametrization sounds like it's getting a bit outside of the scope of this package.

mikea commented 7 years ago

I personally would be content without adding any methods for Ellipsoids all. Let geometry packages worry about it. (Are there any?)

johnmyleswhite commented 7 years ago

I think I'm a little confused by the idea that quantiles are ellipses. Are these equiprobability contours that enclose a region in which draws occur with the requested probability? That seems like it's analogous to returning a two-element tuple for quantiles for univariate distributions. Is that the idea?

mikea commented 7 years ago

I think I'm a little confused by the idea that quantiles are ellipses. Are these equiprobability contours that enclose a region in which draws occur with the requested probability?

Yes. Wikipedia also calls them intervals and confidence regions. I think this is a natural generalization of univariate quantiles. Would you prefer a different name?

That seems like it's analogous to returning a two-element tuple for quantiles for univariate distributions.

I haven't seen these. https://distributionsjl.readthedocs.io/en/latest/univariate.html doesn't mention anything like it. How does it work?

oxinabox commented 5 years ago

I wanted this today. for talking about probability coverage intervals.

mschauer commented 5 years ago

I have the bivariate case in GaussianDistributions.jl

oxinabox commented 5 years ago

I all I actually wanted was the ability to calculate picp. (The potion of true points that fall within a estimated distributions coverage interval)

and that can be done via:

function picp(α::Float64, dist::AbstractMvNormal, y_trues)
    # need to calculate how many lay in the ellipoidal Confedence region
    # see https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Interval

    bound = quantile(Chisq(length(dist)), α)

    centroid = mean(dist)
    proj = inv(cov(dist))
    return mean(y_trues) do y
        offset = y .- centroid
        d = offset' * proj * offset
        d <= bound
    end
end
mschauer commented 5 years ago

Ah, good to have.