cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

[Question] Variational Gaussian Process Dynamical System #1867

Open AndreaBraschi opened 2 years ago

AndreaBraschi commented 2 years ago

Hi all,

I’ve been trying to implement a variational GP dynamical system (GPDS) using GPyTorch.

At this very first stage of my project, I haven't changed much from the GPLVM example provided in the GPyTorch documentation. The only thing I'm changing is the prior over X: in the GPLVM example a Normal prior is used, whereas I'm trying to define a multivariate Normal prior with mean = 0 and a covariance matrix constructed from the time data using, at the moment, an RBF kernel.

The model is defined without problems. My issue comes when I compute the loss as the negation of the variational ELBO between the output of the latent function and the targets. Specifically, I get a NotImplementedError in kl_divergence, which is raised "if the distribution types have not been registered via register_kl". The output of the latent function is a MultivariateNormal distribution and the target is a torch tensor, as required by the function.
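
To illustrate the mechanism the error message refers to (a minimal sketch, unrelated to my actual data): kl_divergence dispatches on the pair of distribution types, and only pairs registered via register_kl are supported.

```python
import torch
from torch.distributions import Normal
from torch.distributions.kl import kl_divergence

# A registered pair (Normal, Normal) works out of the box...
print(kl_divergence(Normal(0.0, 1.0), Normal(1.0, 2.0)))
# ...while any pair of types without a registered KL raises NotImplementedError.
```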

I've tried different things to solve this problem, but I don't want to list them all and make the post excessively long. One thing worth mentioning is that, at the moment, the multivariate Normal prior is defined outside my GPDS class and passed to it as an input. I've also tried defining it inside the class, but I got the same error described above.

Any suggestion as to what I might be doing wrong would be greatly appreciated. I'm happy to share my code as well if the error doesn't ring a bell to anyone.

Cheers,

Andrea Braschi

wjmaddox commented 2 years ago

Could you share the code?

Off the top of my head, I don't think you can compute a KL divergence between two sets of functions, but that may not be what you're doing here.

AndreaBraschi commented 2 years ago

https://gist.github.com/AndreaBraschi/ece83d389d2a33662c0687300676f009#file-analyse_traj-py

See if you manage to access it.

Many thanks

AndreaBraschi commented 2 years ago

The data that I'm working on are time and trajectories. Both are torch tensors, as you'll see in the code.

trajectories are the data in the observed space and are time series: length = 98 and dimensions = 59.

The data for the prior over X is simply time, a one-dimensional tensor of length = 98.


```python
import numpy as np
import pandas as pnd
import torch
import gpytorch

# Data
xlsx_file = pnd.read_excel('_states.xlsx', skiprows=6)

Data = np.array(xlsx_file)
shape = np.array(Data.shape)
trajectories = Data[:, 1:shape[1]:2]        # observed time series, shape (98, 59)
trajectories = torch.Tensor(trajectories)
time = torch.Tensor(Data[:, 0])             # time stamps, shape (98,)

# GPLVM Learning
# --------------
from gpytorch.models.gplvm.latent_variable import *
from gpytorch.models.gplvm.bayesian_gplvm import BayesianGPLVM
from gpytorch.means import ZeroMean
from gpytorch.mlls import VariationalELBO
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.variational import VariationalStrategy
from gpytorch.variational import CholeskyVariationalDistribution
from gpytorch.kernels import ScaleKernel, RBFKernel
from gpytorch.distributions import MultivariateNormal

# Defining the Multivariate Normal prior over X from the time data
mean_t_module = gpytorch.means.ConstantMean()
covar_t_module = ScaleKernel(RBFKernel())
mean_t = mean_t_module(time)
covar_t = covar_t_module(time)
Distribution = MultivariateNormal(mean_t, covar_t)

# Create a class for the Bayesian GPDS
class bGPDS(BayesianGPLVM):
    def __init__(self, n, data_dim, latent_dim, n_inducing, X_prior):
        self.n = n
        self.batch_shape = torch.Size([data_dim])
        self.inducing_inputs = torch.randn(data_dim, n_inducing, latent_dim)

        # Variational distribution over the inducing variables u and variational strategy for f
        q_u = CholeskyVariationalDistribution(n_inducing, batch_shape=self.batch_shape)
        q_f = VariationalStrategy(self, self.inducing_inputs, q_u, learn_inducing_locations=True)

        # Initialise X with PCA or randn
        X_init = torch.nn.Parameter(torch.randn(n, latent_dim))

        X = VariationalLatentVariable(n, data_dim, latent_dim, X_init, X_prior)
        super().__init__(X, q_f)

        # Mean and kernel acting on the latent dimensions
        self.mean_module = ZeroMean(ard_num_dims=latent_dim)
        self.covar_module = ScaleKernel(RBFKernel(ard_num_dims=latent_dim))

    def forward(self, X):
        mean_x = self.mean_module(X)
        covar_x = self.covar_module(X)
        dist = MultivariateNormal(mean_x, covar_x)
        return dist

    def _get_batch_idx(self, batch_size):
        valid_indices = np.arange(self.n)
        batch_indices = np.random.choice(valid_indices, size=batch_size, replace=False)
        return np.sort(batch_indices)

N = len(trajectories)
data_dim = trajectories.shape[1]
latent_dim = data_dim - 1
n_inducing = 25

# Model
Model = bGPDS(N, data_dim, latent_dim, n_inducing, Distribution)

# Likelihood
Likelihood = GaussianLikelihood(batch_shape=Model.batch_shape)

# Declaring the objective to be optimised
mll = VariationalELBO(Likelihood, Model, num_data=len(trajectories))

# Optimiser
optimizer = torch.optim.Adam([
    {'params': Model.parameters()},
    {'params': Likelihood.parameters()}
], lr=0.01)

loss_list = []
training_iter = 100
batch_size = 60

for i in range(training_iter):
    batch_index = Model._get_batch_idx(batch_size)
    optimizer.zero_grad()
    sample = Model.sample_latent_variable()  # a full sample returns latent x across all N
    sample_batch = sample[batch_index]
    output_batch = Model(sample_batch)
    trajectories_batch = trajectories[batch_index].T
    loss = -mll(output_batch, trajectories_batch).sum()  # minibatch ELBO on the sampled batch
    loss_list.append(loss.item())
    loss.backward()
    optimizer.step()
```
wjmaddox commented 2 years ago

So, following up from our conversation on the gist.

It looks like the issue is that VariationalLatentVariable produces a variational distribution that is independent across the latents, so the q_x it generates is a (diagonal) Normal rather than a MultivariateNormal. Hence the KL-not-implemented error: torch has no KL divergence registered between those two distribution types.
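
Concretely, the mismatch can be reproduced with plain torch distributions (a minimal sketch, independent of the GPLVM code):

```python
import torch
from torch.distributions import Normal, MultivariateNormal
from torch.distributions.kl import kl_divergence

# q(X) as built by VariationalLatentVariable: an independent (diagonal) Normal.
q_x = Normal(torch.zeros(5), torch.ones(5))
# The prior over X in this case: a full MultivariateNormal.
p_x = MultivariateNormal(torch.zeros(5), torch.eye(5))

# No KL is registered for the (Normal, MultivariateNormal) pair of types,
# so this raises NotImplementedError.
try:
    kl_divergence(q_x, p_x)
except NotImplementedError:
    print("no KL registered for (Normal, MultivariateNormal)")
```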

I don't know enough about these model types to say whether it's reasonable to simply sub-class VariationalLatentVariable and have its forward method generate a MultivariateNormal instead, so I want to tag people who might know better for your specific problem.

Flagging @vr308 and @gpleiss, who wrote the code for VariationalLatentVariable, since I don't have that much experience here. Is it reasonable to just put up a sub-class of VariationalLatentVariable that returns a full MultivariateNormal instead of the Normal that is currently constructed?

vr308 commented 2 years ago

Hi,

Yes, the right thing to do would be to write a new latent variable class with the necessary correlation structure. I'm including an example below for a latent variable that is correlated across the latent dimensions (d), so each q(x) is a MultivariateNormal.

```python
import numpy as np
import torch
from gpytorch.models.gplvm.latent_variable import LatentVariable, kl_gaussian_loss_term


class VariationalDenseLatentVariable(LatentVariable):

    def __init__(self, X_init, prior_x, data_dim):
        n, latent_dim = X_init.shape
        super().__init__(n, latent_dim)

        self.data_dim = data_dim
        self.prior_x = prior_x

        # Local variational params per latent point: a mean of size latent_dim
        # and a dense latent_dim x latent_dim covariance factor.
        self.q_mu = torch.nn.Parameter(X_init)
        self.q_log_sigma = torch.nn.Parameter(torch.randn(n, latent_dim ** 2))

        # Jitter added to each per-point covariance for numerical stability
        jitter = torch.eye(latent_dim).unsqueeze(0) * 1e-5
        self.jitter = jitter.repeat(n, 1, 1)

        if torch.cuda.is_available():
            self.q_mu = torch.nn.Parameter(self.q_mu.data.cuda())
            self.q_log_sigma = torch.nn.Parameter(self.q_log_sigma.data.cuda())
            self.jitter = self.jitter.cuda()

        # This will add the KL divergence KL(q(X) || p(X)) to the loss
        self.register_added_loss_term("x_kl")

    def sigma(self):
        # Reshape the raw parameters to (n, d, d) and form L L^T per point,
        # so each covariance is positive semi-definite.
        sg = self.q_log_sigma
        sg = sg.reshape(len(sg), self.latent_dim, self.latent_dim)
        sg = torch.einsum('aij,akj->aik', sg, sg)
        sg = sg + self.jitter
        return sg

    def forward(self, batch_idx=None):

        self.q_sigma = self.sigma()

        if batch_idx is None:
            batch_idx = np.arange(self.n)

        q_mu_batch = self.q_mu[batch_idx, ...]
        q_sigma_batch = self.q_sigma[batch_idx, ...]

        q_x = torch.distributions.MultivariateNormal(q_mu_batch, q_sigma_batch)

        # Restrict the prior to the current batch (note: modifies prior_x in place)
        self.prior_x.loc = self.prior_x.loc[:len(batch_idx), ...]
        self.prior_x.scale = self.prior_x.covariance_matrix[:len(batch_idx), ...]

        x_kl = kl_gaussian_loss_term(q_x, self.prior_x, len(batch_idx), self.data_dim)
        self.update_added_loss_term('x_kl', x_kl)
        return q_x.rsample()
```
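
As a usage sketch (assuming the class above is in scope, with a zero-mean prior over X whose covariance comes from an RBF kernel on the time points, as in the script above; the sizes are made up for illustration):

```python
import torch
from gpytorch.kernels import ScaleKernel, RBFKernel
from gpytorch.distributions import MultivariateNormal

n, data_dim, latent_dim = 98, 59, 30   # illustrative sizes matching the data described above

# Zero-mean prior over X with an RBF covariance over the time points
time = torch.linspace(0.0, 1.0, n).unsqueeze(-1)
prior_x = MultivariateNormal(torch.zeros(n), ScaleKernel(RBFKernel())(time))

# Dense variational latent variable, used in place of VariationalLatentVariable
X_init = torch.nn.Parameter(torch.randn(n, latent_dim))
X = VariationalDenseLatentVariable(X_init, prior_x, data_dim)

sample = X()          # one reparameterised sample of the latents
print(sample.shape)   # torch.Size([98, 30])
```

Note that the KL(q(X) || p(X)) term itself is only evaluated later, when the ELBO collects the added loss terms during training.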
AndreaBraschi commented 2 years ago

@wjmaddox many thanks for helping out with this issue of mine.

@vr308 many thanks for providing an example!! Will definitely give it a go and let you know if it works.

AndreaBraschi commented 2 years ago

@vr308

Hi,

I have been trying to implement the example that you provided.

I get an error when the KL divergence is computed, because the size of q_x (the variational distribution over the latent variables) does not match that of p_x (the prior over X). The mismatch is in the sizes of the covariance matrices.

The exact error is the following:

"RuntimeError: The expanded size of the tensor (30) must match the existing size (98) at non-singleton dimension 2. Target sizes: [98, 30, 30]. Tensor sizes: [98, 98]"

May I ask why you constructed q_sigma (the covariance matrix of q_x) so that the resulting tensor has the size mentioned above?

Many thanks in advance for your time.