cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

Nesting GPs; using the sufficient statistics from one GP as sufficient stats in another GP - variance goes to zero #2508

Closed · neildhir closed this issue 2 months ago

neildhir commented 2 months ago

🐛 Bug

The variance goes to zero when trying to use the sufficient statistics from one GP in another. This may also be an implementation issue on my part.

To reproduce

import torch

from botorch.models.transforms import Normalize, Standardize
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.fit import fit_gpytorch_mll
from gpytorch.kernels import MaternKernel, ScaleKernel, Kernel

import math
import matplotlib.pyplot as plt

f = lambda x: torch.sin(math.pi * x) + 0.1
# Problem set up
d = 1
bounds = torch.tensor([[-4.0], [4.0]])
train_X1 = torch.linspace(bounds[0].item(), bounds[1].item(), 20).unsqueeze(-1).to(torch.float64)
train_Y1 = f(train_X1)
train_Y1 += 0.1 * torch.randn_like(train_Y1)
# Build first GP model
gp1 = SingleTaskGP(train_X1, train_Y1, input_transform=Normalize(d,bounds=bounds), outcome_transform=Standardize(m=1))
mll = ExactMarginalLogLikelihood(gp1.likelihood, gp1)
fit_gpytorch_mll(mll)

# Build second GP model using the mean module and covariance module from the first GP.
class MyKernel(Kernel):
    has_lengthscale = True
    is_stationary = True
    def __init__(self, gp1, **kwargs):
        super(MyKernel, self).__init__(**kwargs)
        self.gp1 = gp1
        self.scaled_matern_kernel = ScaleKernel(MaternKernel())

    def forward(self, x1, x2, **params):
        A = self.scaled_matern_kernel(x1, x2)
        # Compute the posterior standard deviation; gradients may need to be turned off for this calculation
        sigma = self.gp1.posterior(x1).variance.sqrt().detach()
        sigma_ = self.gp1.posterior(x2).variance.sqrt().detach()
        B = sigma @ sigma_.T
        assert A.shape == B.shape, f"Shapes do not match: {A.shape} and {B.shape}"
        return A + B

train_X2 = torch.tensor([-2.,0.,1.]).unsqueeze(-1).to(torch.float64)
train_Y2 = f(train_X2)
train_Y2 += 0.1 * torch.randn_like(train_Y2)
gp2 = SingleTaskGP(train_X2, train_Y2,
                   outcome_transform=Standardize(m=1),
                   input_transform=Normalize(d=train_X2.shape[1],bounds=bounds),
                   mean_module=gp1.mean_module,
                   covar_module=MyKernel(gp1))
mll = ExactMarginalLogLikelihood(gp2.likelihood, gp2)
fit_gpytorch_mll(mll)

# Visualisation shows the unexpected behaviour
def plot_gp(gp, x_plot, color, label):
    p = gp.posterior(x_plot)
    m = p.mean.squeeze().detach()
    s = p.variance.sqrt().squeeze().detach()

    plt.plot(x_plot, m, color, label=label)
    plt.fill_between(x_plot.squeeze(), m + s, m - s, color=color, alpha=0.3)

eval_X = torch.linspace(bounds[0].item(), bounds[1].item(), 200).unsqueeze(-1).to(torch.float64)  # match the training dtype
plt.figure(figsize=(8, 4))
plt.scatter(train_X1.squeeze().tolist(), train_Y1.squeeze().tolist(), color='k', marker='x', label = 'Train data for GP1')
plt.scatter(train_X2.squeeze().tolist(), train_Y2.squeeze().tolist(), color='r', marker='o', label = 'Train data for GP2')
plot_gp(gp1, eval_X, "C0", label='GP1')
plot_gp(gp2, eval_X, "g", label='GP2')
plt.legend()

The variance of GP2 has gone to zero:

(Screenshot, 2024-04-12: plot of the two posteriors; GP2's predictive variance has collapsed to zero.)

There is no error message, but I do get the following warning:

NumericalWarning: Negative variance values detected. This is likely due to numerical instabilities. Rounding negative variances up to 1e-10.
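To pin down where the negative variances first appear, one thing I could try (a debugging sketch only, not run here) is to escalate that warning to an exception so the traceback points at the offending computation:

import warnings
from gpytorch.utils.warnings import NumericalWarning

# Turn the clamping warning into an exception for debugging purposes.
warnings.simplefilter("error", NumericalWarning)
gp2.posterior(eval_X)  # should now raise where the negative variances are produced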

Expected Behavior

The variance should be reduced in areas where there is support from the first GP, combining it with the covariance calculation from the second GP and thus leveraging both models' covariance calculations.
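As a sanity check on that intended behaviour, one could look at the prior variance implied by the combined kernel directly (a sketch I have not run, reusing the MyKernel class from the repro above):

with torch.no_grad():
    combined = MyKernel(gp1).to(torch.float64)
    grid = torch.linspace(bounds[0].item(), bounds[1].item(), 50, dtype=torch.float64).unsqueeze(-1)
    K = combined(grid, grid).to_dense()  # dense (50, 50) covariance matrix
    # The diagonal is the implied prior variance: it should be larger where
    # GP1 is uncertain, and it should never collapse to zero.
    print(K.diagonal())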

I wonder if I have done something wrong with respect to turning off the gradient calculations for the second GP with respect to the first in the forward method: sigma has already been calculated there, and perhaps the kernel is still trying to use gradients through it.
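For instance, instead of the .detach() calls, I could block gradient flow through GP1 entirely; a sketch of a variant of the forward method above (not confirmed to change the outcome):

    def forward(self, x1, x2, **params):
        A = self.scaled_matern_kernel(x1, x2)
        # Evaluate GP1's posterior standard deviation without building a graph,
        # so no gradients flow back into gp1 while gp2 is being fit.
        with torch.no_grad():
            sigma = self.gp1.posterior(x1).variance.sqrt()
            sigma_ = self.gp1.posterior(x2).variance.sqrt()
        return A + sigma @ sigma_.transpose(-1, -2)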

System information

Please complete the following information: