cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Docs] GPLVM class of models (GPs for unsupervised learning) #1412

Open vr308 opened 3 years ago

vr308 commented 3 years ago

📚 Documentation/Examples

There are 3 main classes of models that extend the reach of GPs to unsupervised learning. I think we need some docs on how to use the ApproximateGP class to build the following models.

1) GPLVM (Lawrence, 2005)
2) Bayesian GPLVM (Titsias and Lawrence, 2009) -> there is an open issue here
3) GPLVM with back-constraints (now called encoders) (Candela and Lawrence, 2006)

There is some recent literature on GP-VAEs (but they resemble VAEs more than GPs in the sense that the GP is used as a prior in latent space rather than as the decoder).

Following the example in (2), I've been trying to adapt the code to a standard GPLVM (no inducing points, no priors on X), where X is optimised along with the kernel hyperparameters under the ExactMarginalLogLikelihood. But I don't think 'X' can be a learnable parameter in the ExactGP framework; it has to be a plain Tensor, and that's where I am hitting a wall.

I just need to specify:

Model: ExactGP (a batch of GPs, one per output dimension)

Inference: maximise log p(Y | X, theta) w.r.t. X and theta.
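For reference, the objective here is the standard GPLVM marginal likelihood (Lawrence, 2005); with D independent GP mappings sharing the kernel hyperparameters \theta it factorises over output dimensions:

\log p(Y \mid X, \theta) = \sum_{d=1}^{D} \log \mathcal{N}\big(\mathbf{y}_d \mid \mathbf{0},\; K_{XX} + \sigma^2 I\big)

and is maximised jointly over X and \theta.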

import torch
from gpytorch.models import ExactGP
from gpytorch.means import ConstantMean
from gpytorch.kernels import ScaleKernel, RBFKernel
from gpytorch.distributions import MultivariateNormal
from gpytorch.likelihoods import GaussianLikelihood


class GPLVM(ExactGP):
    def __init__(self, Y, X, latent_dim, kernel=None, likelihood=None):
        """The standard GPLVM model class which learns
        point estimates for the latent X.

        :param Y (torch.Tensor): The unsupervised data set of size N x D.
        :param X (torch.Tensor): Initial latent locations of size N x latent_dim.
        :param latent_dim (int): Dimensionality of the latent space.
        :param kernel (gpytorch.kernels.Kernel): The kernel that governs the GP mappings
                                                 from latent to data space.
        :param likelihood (gpytorch.likelihoods.Likelihood): Gaussian likelihood for continuous
                                                             targets, Bernoulli for classification targets.
        """
        batch_shape = torch.Size([Y.shape[1]])  # one GP per output dimension
        if likelihood is None:
            likelihood = GaussianLikelihood(batch_shape=batch_shape)

        # nn.Module attributes can only be assigned after the parent __init__ has run
        super(GPLVM, self).__init__(X, Y, likelihood)

        self.Y = Y
        self.batch_shape = batch_shape
        self.latent_dim = latent_dim

        # This is where I hit the wall: ExactGP stores its train_inputs as plain tensors,
        # so registering X as a parameter here does not make the training inputs learnable.
        self.X = torch.nn.Parameter(X)
        self.register_parameter(name="X", parameter=self.X)

        self.mean_module = ConstantMean(batch_shape=self.batch_shape)
        if kernel is None:
            self.kernel = ScaleKernel(
                RBFKernel(ard_num_dims=latent_dim, batch_shape=self.batch_shape),
                batch_shape=self.batch_shape,
            )
        else:
            self.kernel = kernel

    def forward(self, X):
        mean_x = self.mean_module(X)
        covar_x = self.kernel(X)
        dist = MultivariateNormal(mean_x, covar_x)
        return dist

I'm sure something is obviously wrong with the specification above, as I can't even get around to initialising the model. Perhaps the right way is to use the ApproximateGP class with a DeltaVariationalDistribution?
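For context, a DeltaVariationalDistribution replaces the usual Gaussian q(u) with a point mass, giving point estimates for the inducing values. A minimal, untested sketch of how it would be declared (here inducing_inputs, batch_shape and model are placeholders for whatever the surrounding ApproximateGP subclass defines):

from gpytorch.variational import DeltaVariationalDistribution, VariationalStrategy

# point-mass q(u): MAP-style estimates for the inducing values instead of a full Gaussian
variational_distribution = DeltaVariationalDistribution(
    inducing_inputs.size(-2), batch_shape=batch_shape
)
variational_strategy = VariationalStrategy(
    model, inducing_inputs, variational_distribution, learn_inducing_locations=True
)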

vr308 commented 3 years ago

So, taking a cue from here, the code below works for:

(a) learning latent X as a model parameter with the SVI ELBO, when latent_prior=None
(b) MAP inference, when a prior is specified over X

But what about learning X variationally using the SVI ELBO? (This is hinted at, but not explicitly shown, in section 3.3 of the paper.)

import torch
import pandas as pd
from matplotlib import pyplot as plt
from tqdm.notebook import tqdm

from gpytorch.models import ApproximateGP
from gpytorch.mlls import VariationalELBO
from gpytorch.means import ConstantMean
from gpytorch.kernels import ScaleKernel, RBFKernel
from gpytorch.distributions import MultivariateNormal
from gpytorch.variational import CholeskyVariationalDistribution, VariationalStrategy
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.priors import NormalPrior

__all__ = ['GPLVM']

class GPLVM(ApproximateGP):
    def __init__(self, Y, latent_dim, inducing_inputs, latent_prior=None, kernel=None, likelihood=None):
        r"""The GPLVM model class for unsupervised learning. The current implementation
        can learn:

        (a) Point estimates for latent X when latent_prior = None
        (b) MAP inference for X when latent_prior = N(\mu, \sigma)

        :param Y (torch.Tensor): The unsupervised data set of size N x D.
        :param latent_dim (int): Dimensionality of latent space.
        :param inducing_inputs (torch.Tensor): Locations Z corresponding to u; they can be randomly
                                               initialised or regularly placed, with shape
                                               (D x n_inducing x latent_dim).
        :param latent_prior: None, or a Gaussian prior over X.
        :param kernel (gpytorch.kernels.Kernel): The kernel that governs the GP mappings from
                                                 latent to data space.
        :param likelihood (gpytorch.likelihoods.Likelihood): Gaussian likelihood for continuous targets,
                                                             Bernoulli for classification targets.
        """
        batch_shape = torch.Size([Y.shape[1]])  # one GP per output dimension

        variational_distribution = CholeskyVariationalDistribution(
            inducing_inputs.size(-2), batch_shape=batch_shape
        )
        variational_strategy = VariationalStrategy(
            self, inducing_inputs, variational_distribution, learn_inducing_locations=True
        )
        super(GPLVM, self).__init__(variational_strategy)

        self.Y = Y
        self.batch_shape = batch_shape
        self.latent_dim = latent_dim
        self.inducing_inputs = inducing_inputs

        # Register X as a parameter -> what about learning X variationally with q(X)?
        self.X = torch.nn.Parameter(torch.zeros(len(Y), latent_dim))
        self.register_parameter(name="X", parameter=self.X)

        if latent_prior is not None:
            self.register_prior('prior_X', latent_prior, 'X')

        self.mean_module = ConstantMean(batch_shape=self.batch_shape)

        if kernel is None:
            self.kernel = ScaleKernel(
                RBFKernel(ard_num_dims=latent_dim, batch_shape=self.batch_shape),
                batch_shape=self.batch_shape,
            )
        else:
            self.kernel = kernel

        if likelihood is None:
            self.likelihood = GaussianLikelihood(batch_shape=self.batch_shape)
        else:
            self.likelihood = likelihood

    def forward(self, X):
        mean_x = self.mean_module(X)
        covar_x = self.kernel(X)
        dist = MultivariateNormal(mean_x, covar_x)
        return dist

Usage:

   URL = "https://raw.githubusercontent.com/sods/ods/master/datasets/guo_qpcr.csv"

    df = pd.read_csv(URL, index_col=0)
    print("Data shape: {}\n{}\n".format(df.shape, "-" * 21))
    print("Data labels: {}\n{}\n".format(df.index.unique().tolist(), "-" * 86))
    print("Show a small subset of the data:")
    df.head()

    data = torch.tensor(df.values, dtype=torch.get_default_dtype())
    # we need to transpose data to correct its shape
    train_y_pyro = data.t()

    # Initialisation and processing 

    capture_time = train_y_pyro.new_tensor([int(cell_name.split(" ")[0]) for cell_name in df.index.values])

    # we scale the time into the interval [0, 1]
    time = capture_time.log2() / 6

    X_prior_mean = torch.zeros(train_y_pyro.size(1), 2)  # shape: 437 x 2
    X_prior_mean[:, 0] = time

    train_y = train_y_pyro
    n_data_dims = train_y.shape[0]
    n_latent_dims = 2

    # Declaring model with initial inducing inputs and latent prior

    inducing_inputs = torch.randn(train_y.shape[0], 50, 2)
    latent_prior = NormalPrior(X_prior_mean, torch.ones_like(X_prior_mean))

    model = GPLVM(Y = train_y.T, inducing_inputs = inducing_inputs, latent_dim = 2, 
                  latent_prior=latent_prior, likelihood=likelihood)

    # Declaring objective to be optimised along with optimiser

    mll = VariationalELBO(likelihood, model, num_data=len(Y))

    optimizer = torch.optim.Adam([
    {'params': model.parameters()},
    {'params': likelihood.parameters()},
    ], lr=0.02)

    # Get the model into train mode
    model.train()

    # Train loop 

    loss_list = []
    iterator = tqdm(range(500))
    for i in iterator:
        optimizer.zero_grad()
        output = model(model.X)
        loss = -mll(output, model.Y.T).sum()
        loss_list.append(loss.item())
        print(str(loss.item()) + ", iter no: " + str(i))
        iterator.set_postfix(loss=loss.item())
        loss.backward(retain_graph=True)
        optimizer.step()

    # Plot result

    plt.figure(figsize=(8, 6))
    colors = plt.get_cmap("tab10").colors[::-1]
    labels = df.index.unique()

    X = model.X.detach().numpy()
    for i, label in enumerate(labels):
        X_i = X[df.index == label]
        plt.scatter(0.5*X_i[:, 0], -X_i[:, 1], c=[colors[i]], label=label)
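Regarding the open question above about learning X variationally: one possible approach (a rough, untested sketch rather than an official recipe) is to register variational parameters for q(X), draw X by the reparameterisation trick inside the training loop, and add the KL between q(X) and the prior to the negative ELBO by hand. The sketch below reuses train_y, X_prior_mean, model and mll from above, and assumes the model was built with latent_prior=None so the point-estimate prior term on model.X is not double counted; q_mu, q_log_sigma and X_sample are just illustrative names.

import torch
from torch.distributions import Normal, kl_divergence

N, latent_dim = train_y.shape[1], 2

# variational parameters of q(X) = N(q_mu, exp(q_log_sigma)^2)
q_mu = torch.nn.Parameter(X_prior_mean.clone())
q_log_sigma = torch.nn.Parameter(torch.zeros(N, latent_dim))

optimizer = torch.optim.Adam(list(model.parameters()) + [q_mu, q_log_sigma], lr=0.02)

prior_x = Normal(X_prior_mean, torch.ones_like(X_prior_mean))

for i in range(500):
    optimizer.zero_grad()
    q_x = Normal(q_mu, q_log_sigma.exp())
    X_sample = q_x.rsample()                   # reparameterised sample of the latents
    output = model(X_sample)
    elbo = mll(output, model.Y.T).sum()        # expected log-likelihood + KL[q(u)||p(u)] term
    kl_x = kl_divergence(q_x, prior_x).sum()   # KL[q(X) || p(X)]
    loss = -elbo + kl_x
    loss.backward()
    optimizer.step()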
gpleiss commented 3 years ago

I think we need some docs on how to use the ApproximateGP class to build the following models: GPLVM (Lawrence, 2005), Bayesian GPLVM (Titsias and Lawrence, 2009), GPLVM with back-constraints (now called encoders) (Candela and Lawrence, 2006)

100% agree. @vr308 I'd be happy to work with you on these docs.

For the standard GPLVM, I think the ExactGP model that you propose seems to be the best interface, and also what makes the most sense. It shouldn't be too difficult to modify the internals of ExactGP so that train_x can be a learnable parameter.
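A speculative sketch of what that user-facing pattern might look like once such a change is in place (this is not the current API; GPLVM here is the ExactGP-based class proposed above, and N / latent_dim stand in for whatever the data dictate):

# speculative: assumes ExactGP internals accept a learnable Parameter as train inputs
X_init = torch.randn(N, latent_dim)
model = GPLVM(Y, X_init, latent_dim)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)

model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # would include model.X and kernel hypers
for _ in range(500):
    optimizer.zero_grad()
    output = model(model.X)
    loss = -mll(output, model.train_targets).sum()
    loss.backward()
    optimizer.step()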

For the Bayesian GPLVM, I think the code example you and #1041 propose is ideal. It would be good to get a tutorial notebook for this.

For back-constraints - I need to read up on this a bit more.

But what about learning X variationally using the SVI ELBO? (this has been hinted at but not explicitly shown in section 3.3 of the paper)

I think the best way to accomplish this would be using the pyro integration. Pyro would be responsible for the inference of X. Once we get a working notebook for the Bayesian GPLVM, it should be pretty straightforward to adapt that notebook to use Pyro as well.

vr308 commented 3 years ago

100% agree. @vr308 I'd be happy to work with you on these docs.

Great, preparing a PR for a tutorial with the GPLVM (treating X as a parameter).

For the standard GPLVM, I think the ExactGP model that you propose seems to be the best interface, and also what makes the most sense. It shouldn't be too difficult to modify the internals of ExactGP so that train_x can be a learnable parameter.

To be honest, I was only considering the standard non-Bayesian, non-sparse model for completeness; in practice it isn't very useful, since it can only be applied to small / toy datasets, so I won't prioritise looking into this for now. MAP inference coupled with inducing points also gives point estimates and is more usable.

For back-constraints - I need to read up on this a bit more.

I have pure Pyro code that does this; now I am trying to do it in pure GPyTorch, but perhaps that's not ideal?

Separate question:

It seems like hierarchical priors don't work in gpytorch. For example:

self.X = torch.nn.Parameter(torch.zeros(Y.shape[1], latent_dim))
self.register_parameter(name="X", parameter=self.X)  # X is a parameter

self.mu = torch.nn.Parameter(torch.Tensor([1.0]))  # mu is a parameter
self.register_parameter('mu', parameter=self.mu)

# mu is a learnable parameter in the prior of X
self.register_prior('prior_X', NormalPrior(loc=self.mu, scale=torch.Tensor([1.0])), 'X')

Throws an error:

ModuleAttributeError: 'NormalPrior' object has no attribute 'loc'

Is there anything obviously wrong with the code?

gpleiss commented 3 years ago

Sorry @vr308 - somehow this issue got totally buried in my notifications!

Regarding back-constraints - I think an example using pyro + gpytorch would make the most sense. Not sure why the hierarchical priors aren't working. I'll take a look now.

gpleiss commented 3 years ago

Yeah - I think GPyTorch doesn't support learnable priors. If self.mu is not a parameter, then everything works fine.
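For example, keeping mu as a plain tensor rather than an nn.Parameter (a minimal sketch of the working variant of the snippet above):

# mu as a fixed tensor, not an nn.Parameter, so the prior has no learnable attributes
mu = torch.Tensor([1.0])
self.register_prior('prior_X', NormalPrior(loc=mu, scale=torch.Tensor([1.0])), 'X')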