cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

Nesting GPs; using the sufficient statistics from one GP as sufficient stats in another GP - variance goes to zero #2508

Closed · neildhir closed this issue 2 months ago

neildhir commented 2 months ago

🐛 Bug

The variance goes to zero when trying to use the sufficient statistics from one GP in another. This may also be an implementation issue on my part.

To reproduce

import torch

from botorch.models.transforms import Normalize, Standardize
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.fit import fit_gpytorch_mll
from gpytorch.kernels import MaternKernel, ScaleKernel, Kernel

import math
import matplotlib.pyplot as plt

f = lambda x: torch.sin(math.pi * x) + 0.1
# Problem set up
d = 1
bounds = torch.tensor([[-4.0], [4.0]])
train_X1 = torch.linspace(bounds[0].item(), bounds[1].item(), 20).unsqueeze(-1).to(torch.float64)
train_Y1 = f(train_X1)
train_Y1 += 0.1 * torch.randn_like(train_Y1)
# Build first GP model
gp1 = SingleTaskGP(train_X1, train_Y1, input_transform=Normalize(d,bounds=bounds), outcome_transform=Standardize(m=1))
mll = ExactMarginalLogLikelihood(gp1.likelihood, gp1)
fit_gpytorch_mll(mll)

# Build second GP model using the mean module and covariance module from the first GP.
class MyKernel(Kernel):
    has_lengthscale = True
    is_stationary = True
    def __init__(self, gp1, **kwargs):
        super(MyKernel, self).__init__(**kwargs)
        self.gp1 = gp1
        self.scaled_matern_kernel = ScaleKernel(MaternKernel())

    def forward(self, x1, x2, **params):
        A = self.scaled_matern_kernel(x1, x2)
        # Compute the posterior standard deviation; gradients may need to be turned off for this calculation
        sigma = self.gp1.posterior(x1).variance.sqrt().detach()
        sigma_ = self.gp1.posterior(x2).variance.sqrt().detach()
        B = sigma @ sigma_.T
        assert A.shape == B.shape, f"Shapes do not match: {A.shape} and {B.shape}"
        return A + B

train_X2 = torch.tensor([-2.,0.,1.]).unsqueeze(-1).to(torch.float64)
train_Y2 = f(train_X2)
train_Y2 += 0.1 * torch.randn_like(train_Y2)
gp2 = SingleTaskGP(train_X2, train_Y2,
                   outcome_transform=Standardize(m=1),
                   input_transform=Normalize(d=train_X2.shape[1],bounds=bounds),
                   mean_module=gp1.mean_module,
                   covar_module=MyKernel(gp1))
mll = ExactMarginalLogLikelihood(gp2.likelihood, gp2)
fit_gpytorch_mll(mll)

# Visualisation shows the unexpected behaviour
def plot_gp(gp, x_plot, color, label):
    p = gp.posterior(x_plot)
    m = p.mean.squeeze().detach()
    s = p.variance.sqrt().squeeze().detach()

    plt.plot(x_plot, m, color, label=label)
    plt.fill_between(x_plot.squeeze(), m + s, m - s, color=color, alpha=0.3)

eval_X = torch.linspace(bounds[0].item(), bounds[1].item(), 200).unsqueeze(-1).to(torch.float64)  # match the training dtype
plt.figure(figsize=(8, 4))
plt.scatter(train_X1.squeeze().tolist(), train_Y1.squeeze().tolist(), color='k', marker='x', label = 'Train data for GP1')
plt.scatter(train_X2.squeeze().tolist(), train_Y2.squeeze().tolist(), color='r', marker='o', label = 'Train data for GP2')
plot_gp(gp1, eval_X, "C0", label='GP1')
plot_gp(gp2, eval_X, "g", label='GP2')
plt.legend()

The variance of GP2 has gone to zero:

(Screenshot, 2024-04-12: plot of the two posteriors; GP2's predictive variance has collapsed to zero.)

There is no error message, but I do get the following warning:

NumericalWarning: Negative variance values detected. This is likely due to numerical instabilities. Rounding negative variances up to 1e-10.
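To pin down where the negative variances first appear, one thing I could try (a debugging sketch only, not run here) is to escalate that warning to an exception so the traceback points at the offending computation:

import warnings
from gpytorch.utils.warnings import NumericalWarning

# Turn the clamping warning into an exception for debugging purposes.
warnings.simplefilter("error", NumericalWarning)
gp2.posterior(eval_X)  # should now raise where the negative variances are produced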

Expected Behavior

The variance should be reduced in areas where there is support from the first GP, combining it with the covariance calculation from the second GP and thus leveraging both models' covariance calculations.
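As a sanity check on that intended behaviour, one could look at the prior variance implied by the combined kernel directly (a sketch I have not run, reusing the MyKernel class from the repro above):

with torch.no_grad():
    combined = MyKernel(gp1).to(torch.float64)
    grid = torch.linspace(bounds[0].item(), bounds[1].item(), 50, dtype=torch.float64).unsqueeze(-1)
    K = combined(grid, grid).to_dense()  # dense (50, 50) covariance matrix
    # The diagonal is the implied prior variance: it should be larger where
    # GP1 is uncertain, and it should never collapse to zero.
    print(K.diagonal())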

I wonder if I have done something wrong with respect to turning off the gradient calculations for the second GP with respect to the first in the forward method: sigma has already been calculated there, and perhaps the kernel is still trying to use gradients through it.
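For instance, instead of the .detach() calls, I could block gradient flow through GP1 entirely; a sketch of a variant of the forward method above (not confirmed to change the outcome):

    def forward(self, x1, x2, **params):
        A = self.scaled_matern_kernel(x1, x2)
        # Evaluate GP1's posterior standard deviation without building a graph,
        # so no gradients flow back into gp1 while gp2 is being fit.
        with torch.no_grad():
            sigma = self.gp1.posterior(x1).variance.sqrt()
            sigma_ = self.gp1.posterior(x2).variance.sqrt()
        return A + sigma @ sigma_.transpose(-1, -2)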

System information

Please complete the following information: