Open AndreaBraschi opened 2 years ago
Could you share the code?
Immediately, I don't think that you can compute a KL divergence between two sets of functions but that may not be what you're doing here.
https://gist.github.com/AndreaBraschi/ece83d389d2a33662c0687300676f009#file-analyse_traj-py
see if you manage to access it.
Many thanks
The data that I'm working on are time and trajectories. Both are torch tensors as you'll see in the code.
trajectories are the data in the observed space and are time-series. length = 98 and dimensions = 59
The data for the prior over X is simply time. time is a one-dimensional tensor of length = 98
import pandas as pnd
import torch
import gpytorch
# Data
xlsx_file = pnd.read_excel('_states.xlsx', skiprows=6)
Data = np.array(xlsx_file)
shape = np.array(Data.shape)
trajectories = Data[:, 1:shape[1]:2]
trajectories = torch.Tensor(trajectories)
time = torch.Tensor(Data[:, 0])
# GPLVM Learning
# -------
from gpytorch.models.gplvm.latent_variable import *
from gpytorch.models.gplvm.bayesian_gplvm import BayesianGPLVM
from gpytorch.means import ZeroMean
from gpytorch.mlls import VariationalELBO
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.variational import VariationalStrategy
from gpytorch.variational import CholeskyVariationalDistribution
from gpytorch.kernels import ScaleKernel, RBFKernel
from gpytorch.distributions import MultivariateNormal
# Definining a Multivariate Normal distribution with time-data
mean_t_module = gpytorch.means.ConstantMean()
covar_t_module = ScaleKernel(RBFKernel())
mean_t = mean_t_module(time)
covar_t = covar_t_module(time)
Distribution = MultivariateNormal(mean_t, covar_t)
# create a class for the Bayesian GPDS
class bGPDS(BayesianGPLVM):
def __init__(self, n, data_dim, latent_dim, n_inducing, X_prior):
self.n = n
self.batch_shape = torch.Size([data_dim])
self.inducing_inputs = torch.randn(data_dim, n_inducing, latent_dim)
# Define Variational Distribution for f and u
q_u = CholeskyVariationalDistribution(n_inducing, batch_shape=self.batch_shape)
q_f = VariationalStrategy(self, self.inducing_inputs, q_u, learn_inducing_locations=True)
# Initialise X with PCA or randn
X_init = torch.nn.Parameter(torch.randn(n, latent_dim))
X = VariationalLatentVariable(n, data_dim, latent_dim, X_init, X_prior)
super().__init__(X, q_f)
# Kernel (acting on latent dimensions)
self.mean_module = ZeroMean(ard_num_dims=latent_dim)
self.covar_module = ScaleKernel(RBFKernel(ard_num_dims=latent_dim))
def forward(self, X):
mean_x = self.mean_module(X)
covar_x = self.covar_module(X)
dist = MultivariateNormal(mean_x, covar_x)
return dist
def _get_batch_idx(self, batch_size):
valid_indices = np.arange(self.n)
batch_indices = np.random.choice(valid_indices, size=batch_size, replace=False)
return np.sort(batch_indices)
N = len(trajectories)
data_dim = trajectories.shape[1]
latent_dim = data_dim - 1
n_inducing = 25
# Model
Model = bGPDS(N, data_dim, latent_dim, n_inducing, Distribution)
# Likelihood
Likelihood = GaussianLikelihood(batch_shape=Model.batch_shape)
# Declaring the objective to be optimised
mll = VariationalELBO(Likelihood, Model, num_data=len(trajectories))
# optimiser
optimizer = torch.optim.Adam([
{'params': Model.parameters()},
{'params': Likelihood.parameters()}
], lr=0.01)
loss_list = []
training_iter = 100
batch_size = 60
for i in range(training_iter):
batch_index = Model._get_batch_idx(batch_size)
optimizer.zero_grad()
sample = Model.sample_latent_variable() # a full sample returns latent x across all N
sample_batch = sample[batch_index]
output = Model(sample)
output_batch = Model(sample_batch)
trajectories_batch = trajectories[batch_index].T
loss = -mll(output, trajectories.T).sum()
loss_list.append(loss.item())
loss.backward()
optimizer.step() ```
So, following up from our conversation over gist.
It looks like the issue is that the VariationalLatentVariable
is producing a variational distribution that is independent across the latents hence why it's generating a q_x distribution that is a Normal rather than a MultivariateNormal
. Hence, a KL not implemented error.
I don't know enough about the model types to know if it's reasonable to just tell you to sub-class the VariationalLatentVariable
and have its forward method generate a MultivariateNormal
instead and want to tag people who might know better for your specific problem.
Flagging @vr308 and @gpleiss who wrote the code for VariationalLatentVariable
since I don't have that much experience here. Is it reasonable to just put up a sub-class of VariationalLatentVariable
that returns a full MultivariateNormal here instead of the Normal
that is constructed.
Hi,
Yes, the right thing would be to write a new Latent Variable class with the necessary correlation structure. I am including an example below for a latent variable correlated across dimensions (d) so each q(x) is a MultivariateNormal.
class VariationalDenseLatentVariable(LatentVariable):
def __init__(self, X_init, prior_x, data_dim):
n, latent_dim = X_init.shape
super().__init__(n, latent_dim)
self.data_dim = data_dim
self.prior_x = prior_x
# Local variational params per latent point with dimensionality latent_dim
self.q_mu = torch.nn.Parameter(X_init) # (.cuda()))
self.q_log_sigma = torch.nn.Parameter(torch.randn(n, latent_dim**2)) # .cuda()
jitter = torch.eye(latent_dim).unsqueeze(0)*1e-5
if torch.cuda.is_available():
self.q_mu = self.q_mu.cuda()
self.q_log_sigma = self.q_log_sigma.cuda()
self.jitter = torch.cat([jitter for i in range(n)], axis=0).cuda()
# This will add the KL divergence KL(q(X) || p(X)) to the loss
self.register_added_loss_term("x_kl")
#jitter = torch.eye(latent_dim).unsqueeze(0)*1e-5
#self.jitter = torch.cat([jitter for i in range(n)], axis=0)
def sigma(self):
sg = self.q_log_sigma
sg = sg.reshape(len(sg), self.latent_dim, self.latent_dim)
sg = torch.einsum('aij,akj->aik', sg, sg)
sg += self.jitter
return sg
def forward(self, batch_idx=None):
self.q_sigma = self.sigma()
if batch_idx is None:
batch_idx = np.arange(self.n)
q_mu_batch = self.q_mu[batch_idx, ...]
q_sigma_batch = self.q_sigma[batch_idx, ...]
q_x = torch.distributions.MultivariateNormal(q_mu_batch, q_sigma_batch)
self.prior_x.loc = self.prior_x.loc[:len(batch_idx), ...]
self.prior_x.scale = self.prior_x.covariance_matrix[:len(batch_idx), ...]
x_kl = kl_gaussian_loss_term(q_x, self.prior_x, len(batch_idx), self.data_dim)
self.update_added_loss_term('x_kl', x_kl)
return q_x.rsample()
@wjmaddox many thanks for helping out raising this issue of mine
@vr308 many thanks for providing an example!! Will definitely give it a go and let you know if it works.
@vr308
Hi,
I have been trying to implement the example that you provided.
I get an error because the size of q_x (variational distribution over the latent variables) does not match that of p_x (prior over X) when the KL divergence is to be minimised. The error is due to the different sizes of the Covariance Matrix. Specifically:
The exact error is the following:
"RuntimeError: The expanded size of the tensor (30) must match the existing size (98) at non-singleton dimension 2. Target sizes: [98, 30, 30]. Tensor sizes: [98, 98]"
May I ask you why you constructed q_sigma (covariance matrix of q_x) such that the resulting Tensor had the size that I previously mentioned?
Many thanks in advance for your time.
Hi all,
I’ve been trying to implement a variational GP dynamical system (GPDS) using GPyTorch.
At this very first stage of my project, I haven't changed much from the GPLVM example provided in the GPyTorch documentation. The only thing I'm changing is the prior over X. In the GPLVM example, a Normal Prior is used, whereas I'm trying to define a Multivariate Normal prior with mean = 0 and covariance matrix that is constructed upon the time-data, using, at the moment, an RBF Kernel.
The model is defined without problems. My issue comes when I compute the loss as the negation of the Variational ELBO between the output of the latent function and the targets. Specifically, the error that I get is NotImplementedError in the kl_divergence which should be raised "if the distribution types have not been registered via register_kl". The output of the latent function is a Multivariate Normal distribution and the target is a torch Tensor, as required by the function.
I've tried different things to solve this problem, but I don't want to list them all and make the post excessively long. One thing I believe is worth saying is that at the moment the Multivariate Normal Prior is defined outside my GPDS class and is used within the GPDS class as an input. I've tried to define it within the class but I got the same error described above.
Any suggestion of what I might be doing wrong will be extremely appreciated. Willing to share my code as well if the error doesn't ring a bell to anyone.
Cheers,
Andrea Braschi