
[Bug] Extreme oscillation in loss #2483


sanaamouzahir commented 4 months ago

šŸ› Bug

[Figure: training loss curve, showing extreme oscillations in the ELBO]

Hi, I've been using the multitask sparse variational Gaussian process framework in GPyTorch to model velocity on a 2D grid of 150Ɨ50 points. The training data is a time series of this velocity field (4800 snapshots). After training, I've noticed extreme oscillations in the training loss (the variational ELBO), and I have not been able to figure out where they come from. For preprocessing, I reshaped the time series into a (4800, 150Ɨ50) = (4800, 7500) matrix and standardized it to match the zero-mean prior assumption. The number of tasks corresponds to the second dimension, i.e. 7500.
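
For reference, the preprocessing step looks roughly like this (a sketch with a placeholder tensor, not my exact script):

```python
import torch

# Placeholder standing in for the real velocity time series:
# 4800 snapshots on a 150 x 50 grid.
velocity = torch.randn(4800, 150, 50)

# Flatten each snapshot so that every grid point becomes one task.
Y = velocity.reshape(4800, 150 * 50)  # shape (4800, 7500)

# Standardize each task to zero mean and unit variance,
# matching the zero-mean prior assumption.
Y = (Y - Y.mean(dim=0)) / Y.std(dim=0)
```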

## To reproduce

**Code snippet to reproduce**


```python
import numpy as np
import torch
import gpytorch
import tqdm.notebook
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class MultitaskGPModel(gpytorch.models.ApproximateGP):
    def __init__(self, num_latents, num_tasks, n_features, inducing_points_centers):
        # Tile the inducing point centers so that each latent GP gets its own copy.
        inducing_points = np.repeat(inducing_points_centers[np.newaxis, :, :], num_latents, axis=0)
        inducing_points = torch.tensor(inducing_points, dtype=torch.float)
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=torch.Size([num_latents])
        )

        # We have to wrap the VariationalStrategy in a LMCVariationalStrategy
        # so that the output will be a MultitaskMultivariateNormal rather than a batch output
        variational_strategy = gpytorch.variational.LMCVariationalStrategy(
            gpytorch.variational.VariationalStrategy(
                self, inducing_points, variational_distribution, learn_inducing_locations=True
            ),
            num_tasks=num_tasks,
            num_latents=num_latents,
            latent_dim=-1
        )

        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_latents]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_latents])),
            batch_shape=torch.Size([num_latents])
        )        
    def forward(self, x):

        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

num_tasks = Y_closure_u_test_reshaped.shape[1]  # 7500 tasks, one per grid point
n_features = X_reshaped.size(-1)
num_latents = 20  # 20 worked best in my experiments (also tried 10)
num_epochs = 10000

model = MultitaskGPModel(num_latents,num_tasks,n_features,inducing_points_centers).to(device)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks).to(device)
model.train()
likelihood.train()
optimizer = torch.optim.Adam([
    {'params': model.parameters()},
    {'params': likelihood.parameters()},
], lr=0.01)

mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=Y_closure_u_reshaped.size(0))
losses = []
for epoch in tqdm.notebook.tqdm(range(num_epochs), desc="Epoch (LR=0.01)"):
    optimizer.zero_grad()
    output = model(X_reshaped)
    loss = -mll(output, Y_closure_u_reshaped)
    if loss.item() <= -11000:  # early-stopping heuristic
        break
    losses.append(loss.item())
    loss.backward()
    #torch.cuda.empty_cache()
    optimizer.step()

# Plot the training losses recorded during the run.
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.plot(losses)
```

## System information

- GPyTorch version: 1.11
- PyTorch version: 1.13.1
- OS: Windows

## Additional context
I've attached a plot of the loss above. I don't understand why it oscillates so violently. Does anyone know what I could do to regularize the training process and damp these oscillations?
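
To make the question concrete: one change I've been considering, based on the GPyTorch SVGP examples, is mini-batching the ELBO instead of taking full-batch gradient steps. Here is a sketch reusing the tensors and objects from the snippet above (the batch size is an arbitrary guess, not something I've tuned):

```python
from torch.utils.data import DataLoader, TensorDataset

# Mini-batch training: num_data in the VariationalELBO above is already
# set to the full dataset size, so each batch ELBO is scaled correctly.
train_loader = DataLoader(
    TensorDataset(X_reshaped, Y_closure_u_reshaped),
    batch_size=256, shuffle=True,
)

for epoch in range(num_epochs):
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        output = model(x_batch)
        loss = -mll(output, y_batch)
        loss.backward()
        optimizer.step()
```

That said, smaller batches change the noise profile of the gradients, so I'm not sure whether this would actually reduce the oscillations or just mask them.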