TuringLang / EllipticalSliceSampling.jl

Julia implementation of elliptical slice sampling.
https://turinglang.org/EllipticalSliceSampling.jl/
MIT License

Generalizing to Non-Gaussian Elliptical Distributions #22

Closed ParadaCarleton closed 2 years ago

ParadaCarleton commented 2 years ago

As far as I can tell, there's nothing in this algorithm that makes it impossible to use, say, a multivariate t or multivariate logistic prior; could the code be generalized to handle this?

devmotion commented 2 years ago

It's based on an invariance property of Gaussian distributions which, as far as I know, does not hold for general elliptical distributions (I haven't checked; eq. 2 in the paper below). However, the algorithm can be generalized to more general target distributions without a Gaussian prior by a construction that involves infinite mixtures of Gaussians, i.e., multivariate t distributions (it introduces an additional auxiliary variable that is marginalized out, i.e., dropped from the samples): https://jmlr.org/papers/volume15/nishihara14a/nishihara14a.pdf
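As a rough illustration of that scale-mixture idea (a hedged sketch only, not the package's API and not the full algorithm from the Nishihara et al. paper; all function names below are made up for the example): a multivariate t prior with ν degrees of freedom can be written as x | s ~ N(0, s·I) with s ~ InverseGamma(ν/2, ν/2), so one can alternate a standard ESS update of x given s with a Gibbs update of the auxiliary scale s given x. The sketch assumes ν is an integer so the inverse-gamma draw can be built from sums of squared normals:

```julia
using Random, LinearAlgebra

# One elliptical slice sampling step for the Gaussian prior N(0, s*I)
# and log-likelihood ℓ (Murray, Adams & MacKay 2010).
function ess_step(rng, x, s, ℓ)
    v = sqrt(s) .* randn(rng, length(x))  # auxiliary draw from N(0, s*I)
    logy = ℓ(x) + log(rand(rng))          # log slice threshold
    θ = 2π * rand(rng)
    θmin, θmax = θ - 2π, θ
    while true
        x′ = x .* cos(θ) .+ v .* sin(θ)
        ℓ(x′) > logy && return x′
        θ < 0 ? (θmin = θ) : (θmax = θ)   # shrink the angle bracket towards 0
        θ = θmin + (θmax - θmin) * rand(rng)
    end
end

# Gibbs update of the auxiliary scale: s | x ~ InverseGamma((ν + d)/2, (ν + x'x)/2),
# sampled as (ν + x'x) / χ²_{ν+d} (requires ν + d to be an integer here).
gibbs_scale(rng, x, ν) = (ν + dot(x, x)) / sum(abs2, randn(rng, ν + length(x)))

# Sample from a target with a multivariate t(ν) prior and log-likelihood ℓ;
# the scale s is the auxiliary variable that is dropped from the samples.
function sample_t_prior(rng, ℓ, d, ν, niter)
    x, s = zeros(d), 1.0
    samples = Vector{Vector{Float64}}(undef, niter)
    for i in 1:niter
        s = gibbs_scale(rng, x, ν)
        x = ess_step(rng, x, s, ℓ)
        samples[i] = x
    end
    return samples
end
```

For example, with a Gaussian log-likelihood `ℓ(x) = -sum(abs2, x .- 1) / 2` the chain targets the product of that likelihood and the t(ν) prior.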

I'll close this issue since it seems to be mainly a duplicate of https://github.com/TuringLang/EllipticalSliceSampling.jl/issues/12.

ParadaCarleton commented 2 years ago

@devmotion I think that linearity property is a defining property of elliptical distributions, although I might be wrong. Or rather, I think a slightly weaker condition than the one stated in eq. 2 might suffice -- as long as the bivariate distribution is elliptical, it might work?

devmotion commented 2 years ago

This property (eq. 2) is not a defining property of elliptical distributions.

If X ~ e(m, S) is a random variable distributed according to an elliptical distribution with location m and positive definite symmetric matrix S of size d x d, then for every matrix D of size c x d (c <= d) of rank c the random variable Y := DX is distributed according to Y ~ e(Dm, DSD') (see e.g. property 1 in Owen and Rabinovitch's paper). Thus in the general form of equation 2, with X ~ e(m, S) and Nu ~ e(m, S) independently distributed, the law of Y := (X - m) cos(theta) + (Nu - m) sin(theta) + m is equal to the law of Z := A + B + m where A ~ e(0, cos^2(theta) S) and B ~ e(0, sin^2(theta) S) are independent. However, in general elliptical distributions are not closed under convolutions, and hence in general the law of A + B is not e(0, cos^2(theta) S + sin^2(theta) S) = e(0, S) (which would imply Y ~ e(m, S)).
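A quick numerical illustration of this failure (purely illustrative, using the 1-D case where "elliptical" just means symmetric about the location): for a standard Laplace distribution the rotated combination has a visibly different shape than the original, which shows up in the excess kurtosis, whereas for a Gaussian the law would be unchanged.

```julia
using Random, Statistics

rng = MersenneTwister(42)
n = 10^6

# Standard Laplace as the difference of two Exp(1) draws.
laplace(rng, n) = -log.(rand(rng, n)) .+ log.(rand(rng, n))
# Sample excess kurtosis; 0 for a Gaussian.
excess_kurtosis(z) = mean((z .- mean(z)) .^ 4) / var(z)^2 - 3

x = laplace(rng, n)
v = laplace(rng, n)
θ = π / 4
y = x .* cos(θ) .+ v .* sin(θ)

excess_kurtosis(x)  # ≈ 3, the excess kurtosis of the Laplace distribution
excess_kurtosis(y)  # ≈ 1.5: the rotation does not preserve the law
```

The variance of y is unchanged (cos^2 + sin^2 = 1), but the fourth cumulant scales by cos^4(θ) + sin^4(θ) = 1/2 at θ = π/4, so the excess kurtosis halves -- exactly the convolution failure described above.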

More concretely, the property holds for all S and all theta if and only if the characteristic function phi_X(t) = f(t' S t) exp(i t' m) of the random variable X ~ e(m, S) (such a function f exists, as this is a defining property of elliptical distributions) satisfies f(x) f(y) = f(x + y) for all real numbers x and y. For instance, this is the case if X is normally distributed (there we have f(x) = exp(-x/2)). More generally, we know that f(0) = 1 (since phi_X(0) = 1 and exp(i 0' m) = 1), and hence the functional equation f(x) f(y) = f(x + y) implies that f(x) = exp(r x) for some real number r (for normal distributions we have r = -1/2). I.e., the natural generalization of the property in eq. 2 to independent random variables X ~ e(m, S) and Nu ~ e(m, S) holds if and only if phi_X(t) = exp(r t' S t) exp(i t' m) for some real number r.

The property in eq. 2 is the crucial part of the algorithm as you can see in eq. 6 in the original paper. It is mandatory that Y = (X - m) cos(theta) + (Nu - m) sin(theta) + m is distributed according to the desired prior e(m, S).
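For contrast, here is a quick check (illustrative only) that the Gaussian case does satisfy eq. 2: with x and nu independent standard normals, the combination x cos(θ) + nu sin(θ) is again N(0, 1) for every θ, since the variance is cos^2(θ) + sin^2(θ) = 1 and the sum of independent Gaussians is Gaussian.

```julia
using Random, Statistics

rng = MersenneTwister(0)
n = 10^6
x = randn(rng, n)
v = randn(rng, n)

for θ in (0.3, 1.0, 2.5)
    y = x .* cos(θ) .+ v .* sin(θ)
    # y is again standard normal: mean ≈ 0 and variance ≈ 1 for every θ
    println((θ, mean(y), var(y)))
end
```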

ParadaCarleton commented 2 years ago

Got it, thanks! I was confusing it with a different property regarding linear combinations of components from a random sample.