SheffieldML / GPy

Gaussian processes framework in python

Bayesian GPLVM with general Gaussian priors #399

Closed: alexisboukouvalas closed this issue 1 year ago

alexisboukouvalas commented 8 years ago

I believe that general Gaussian priors cannot be specified in Bayesian GPLVM as currently implemented in GPy.

I am including a script below to demonstrate. Note there are two issues:

  1. The code for MultivariateGaussian and DPBayesianGPLVM appears broken and is not included in any tests or examples in the GPy code. See the commented-out code below that uses these classes.
  2. The approach used by the community of putting priors on X in a BayesianGPLVM is, I believe, incorrect, because these priors are not used in the KL term KL(q(X) || p(X)).

I am hoping this will be easily fixed by extending NormalPrior/Posterior in the variational code to take a non-standard multivariate normal. I am not familiar with the GPy code, however, so please advise.
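
For concreteness, here is a minimal numpy sketch of the KL term such an extension would need to compute, assuming a fully factorized q(X) and an independent diagonal Gaussian prior per point (the function name and shapes are illustrative, not existing GPy API):

import numpy as np

def kl_q_p_diag(mu, s2, m, sigma2):
    # KL(q(X) || p(X)) with q = prod_n N(mu_n, s2_n) and p = prod_n N(m_n, sigma2_n);
    # all arguments are arrays of shape (N, Q). With m = 0 and sigma2 = 1 this
    # reduces to the standard-normal KL used in the usual Bayesian GPLVM bound.
    return 0.5 * np.sum(np.log(sigma2 / s2) + (s2 + (mu - m) ** 2) / sigma2 - 1.0)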

@vals you may be interested in the resolution of this issue. @mzwiessele, @zhenwendai your guidance and advice is much appreciated.

# GPLVM with non-identical Gaussian prior
import GPy
import pods
import numpy as np

# Load simple dataset
data = pods.datasets.singlecell()
genes = data['Y']  
Y = data['Y']['Id2'][:,None]
N = Y.shape[0]

# Use cell stage as prior
labels = data['labels']
stageN = np.zeros((N, 1))
for i,l in enumerate(labels):
    stageN[i] = np.log2(int(l[:2])) + 1

priormean = stageN
priorstd = 0.2 * np.ones((N, 1))

np.random.seed(0)

# Initialise from prior
Xinit = np.zeros((N, 1))
for i in range(N):
    Xinit[i, 0] = priormean[i, 0] + priorstd[i, 0] * np.random.randn()

# X_prior = GPy.core.parameterization.priors.MultivariateGaussian(priormean, np.diag(priorstd.flatten()))
# m = GPy.models.DPBayesianGPLVM(Y, 1, X_prior,  kernel=GPy.kern.RBF(1), X=Xinit )
# This results in the error
#   ValueError: too many values to unpack
# because pdinv returns more values than expected.

# Bayesian GPLVM with MAP prior
X_variance = np.random.uniform(0,.1,Xinit.shape)
Z = np.random.permutation(Xinit.copy())[:50]
m = GPy.models.BayesianGPLVM(Y, 1, kernel=GPy.kern.RBF(1), X=Xinit, X_variance=X_variance, num_inducing=50, Z=Z)
print('m likelihood with no prior specified ' + str(m.log_likelihood()))
print('m objective with no prior specified ' + str(m.objective_function()))  # does not change
# Set a Gaussian prior on each variational mean
for i in range(N):
    m.X.mean[i, [0]].set_prior(GPy.priors.Gaussian(priormean[i, 0], priorstd[i, 0]), warning=False)
print('m likelihood with prior ' + str(m.log_likelihood()))  # does not change
print('m objective with prior ' + str(m.objective_function()))  # does change
vals commented 8 years ago

Thanks for letting me know! I'll keep my eyes on this...

zhenwendai commented 8 years ago

The variational posterior means are not supposed to have any prior distributions, as they are variational parameters.

I am not sure I get what you are planning to do. I guess you want to replace the unit Gaussian prior p(x) with a general multivariate Gaussian, i.e. N(m, \Sigma). Will you do a point estimate for the mean and variance of the prior distribution? Will you also extend the variational posterior of X, q(X), with full covariance matrices?

alexisboukouvalas commented 8 years ago

Thank you @zhenwendai for the quick reply! For what @vals and I are doing, fixing the prior p(x) would be reasonable; that is, we do not need to estimate the prior. It is given, just as in the standard Bayesian GPLVM (where it is fixed to N(0, I)).

The immediate use case we have is to put an independent prior on each data point, that is p(X) = \prod_{n=1}^N N(x_n | m_n, S_n), where m_n is Q-dimensional and S_n is Q x Q, with Q the dimension of the latent space.

Would this be as simple as implementing the KL divergence between multivariate normals? http://stats.stackexchange.com/questions/60680/kl-divergence-between-two-multivariate-gaussians
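
If so, a direct numpy transcription of that closed form could look like the sketch below (illustrative only; a robust implementation would work with Cholesky factors rather than explicit inverses):

import numpy as np

def kl_mvn(m0, S0, m1, S1):
    # KL(N(m0, S0) || N(m1, S1)) for k-dimensional Gaussians.
    k = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    trace_term = np.trace(S1_inv @ S0)
    quad_term = diff @ S1_inv @ diff
    logdet_term = np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S0)[1]
    return 0.5 * (trace_term + quad_term - k + logdet_term)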

As for your second point, do we want full covariances in p(X) and q(X), that is, dropping the independence assumption between data points? I guess in a general library like GPy this would make sense, but I don't currently have a use case for it. I am not even sure it makes sense to assume that q(X) does not factorize, so please advise.

adamian commented 7 years ago

Hi @alexisboukouvalas, I see the issue is still open, so let me ask: are you looking for something like this: https://arxiv.org/abs/1107.4985?

alexisboukouvalas commented 7 years ago

Thanks for taking a look, Andreas. Actually, what is implemented in the Delorean paper looks like a simpler version of your GP dynamical systems paper. Specifically, in Delorean equation (5) an informative prior is placed on the one-dimensional latent space. In the context of the standard Bayesian GPLVM, I believe this can be achieved by changing only the KL term in the bound, substituting the standard normal prior p(X) = N(0, 1) with an informative one. This looks related to Equation 14 in your paper. But then you go on to look at what happens when q(X) is not fully factorised, which I haven't considered.
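
For concreteness, my understanding is that the bound decomposes as

F(q) = E_{q(X)}[ \log p(Y | X) ] - KL( q(X) || p(X) )

so only the KL term involves p(X), and substituting an informative Gaussian for the standard normal would leave the expected log-likelihood term untouched.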

Is the dynamical model you describe implemented in GPy or somewhere else? Can we pass in informative priors?

adamian commented 6 years ago

The dynamical model is not implemented in GPy, but it's implemented in MATLAB (https://github.com/SheffieldML/vargplvm). Also check: https://github.com/SheffieldML/GPy/issues/204

ekalosak commented 1 year ago

Stale.