cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

[Docs][Bug?] How to combine multitask GP regression with uncertain inputs? #2397

Closed. famura closed this issue 1 year ago

famura commented 1 year ago

📚 Documentation/Examples AND/OR 🐛 Bug

Is there documentation missing? I could not find out how to make multitask GPs with uncertain inputs work. I started by combining the examples on multitask GP regression and on GP regression with uncertain inputs; see the code below. I am not exactly sure what the root cause of my error is. The crash occurs due to a shape mismatch at line 216 of variational_strategy.py.

Is documentation wrong? No

Is there a feature that needs some example code? No

Think you know how to fix the docs? (If so, we'd love a pull request from you!) No, but I think if we fixed the code I put below, we could enrich the doc with another example.


"""
The goal of this example is to combine multitask GP regression (see here https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html)
with uncertain inputs to the GP (see here https://docs.gpytorch.ai/en/latest/examples/04_Variational_and_Approximate_GPs/GP_Regression_with_Uncertain_Inputs.html)

I chose the data dim to be 2.
"""

import gpytorch
import torch
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution, VariationalStrategy
from matplotlib import pyplot as plt

# Create some inputs and targets
n_train = 100
train_x_mean = torch.stack([torch.linspace(0, 1.5, n_train), torch.linspace(0.5, 2, n_train)], 0)
train_x_cov = torch.diag(torch.linspace(0.03, 0.01, n_train))
train_y = torch.stack(
    [
        torch.sin(torch.linspace(0, 2, n_train) * (2 * torch.pi)) + torch.randn(n_train) * 0.2,
        torch.cos(torch.linspace(0, 2, n_train) * (2 * torch.pi)) + torch.randn(n_train) * 0.2,
    ],
    -1,
)

class ApproximateMultitaskGPModel(ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))
        variational_strategy = VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super().__init__(variational_strategy)

        self.mean_module = gpytorch.means.MultitaskMean(gpytorch.means.ConstantMean(), num_tasks=2)
        self.covar_module = gpytorch.kernels.MultitaskKernel(gpytorch.kernels.RBFKernel(), num_tasks=2, rank=1)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)  # also tried MultitaskMultivariateNormal

likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=2)
model = ApproximateMultitaskGPModel(inducing_points=torch.randn(10, 2) + 1)

# Find optimal model hyperparameters
training_iterations = 50
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam(
    [
        {"params": model.parameters()},
        {"params": likelihood.parameters()},
    ],
    lr=0.01,
)

# Our loss object. We're using the VariationalELBO
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))

for i in range(training_iterations):
    # First thing: draw a sample set of features from our distribution
    train_x_sample = torch.distributions.MultivariateNormal(train_x_mean, train_x_cov).rsample()  # of shape (2,100)
    train_x_sample = train_x_sample.T  # of shape (100,2)

    # Now do the rest of the training loop
    optimizer.zero_grad()
    output = model(train_x_sample)  # BREAKS HERE (also when not transposing before)
    loss = -mll(output, train_y)
    loss.backward()
    print("Iter %d/%d - Loss: %.3f" % (i + 1, training_iterations, loss.item()))
    optimizer.step()

# Set into eval mode
model.eval()
likelihood.eval()

# Initialize plots
f, (y1_ax, y2_ax) = plt.subplots(1, 2, figsize=(8, 3))

# Make predictions
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.stack([torch.linspace(0, 2, 51), torch.linspace(0, 2, 51)], -1)  # shape (51, 2) to match the 2-D inputs

    predictions = likelihood(model(test_x))
    mean = predictions.mean
    lower, upper = predictions.confidence_region()
famura commented 1 year ago

After a bit more digging, I am starting to believe that this cannot work with the existing code base. Consider this code snippet from variational_strategy.py:

# Covariance terms
num_induc = inducing_points.size(-2)
test_mean = full_output.mean[..., num_induc:]
induc_induc_covar = full_covar[..., :num_induc, :num_induc].add_jitter(self.jitter_val)
induc_data_covar = full_covar[..., :num_induc, num_induc:].to_dense()
data_data_covar = full_covar[..., num_induc:, num_induc:]

It works for the example in the docs since there is only one task, i.e., the shape of the y tensor is (N,), with N being the number of training points. Thus, the covariance matrix has shape (N+M, N+M), with M being the number of inducing points, which makes the slicing above meaningful.

However, if the number of tasks T is > 1, the shape of the covariance matrix is (N*T, N*T) in my other experiments and ((N+M)*T, (N+M)*T) here. Please correct me if I am wrong about this. Consequently, one would need to slice full_covar differently, i.e., separate out the num_induc inducing entries for every task.
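To make the shape issue concrete, here is a minimal check of the kernel output alone (same N, M, T as above; no model involved):

import torch
import gpytorch

N, M, T = 100, 10, 2  # data points, inducing points, tasks
kernel = gpytorch.kernels.MultitaskKernel(gpytorch.kernels.RBFKernel(), num_tasks=T, rank=1)

x_full = torch.randn(M + N, 2)  # inducing points concatenated with (sampled) training inputs
full_covar = kernel(x_full)
print(full_covar.shape)  # torch.Size([220, 220]), i.e. ((M + N) * T, (M + N) * T)

# VariationalStrategy slices full_covar[..., :M, :M] as the inducing/inducing block,
# but with the task dimension folded in, the first M rows/columns no longer correspond
# to the M inducing points across all tasks, hence the shape mismatch downstream.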

I am not sure how to proceed now. Unsqueeze to 3 dimensions and "misuse" the batch dim as task dim? But that would remove the correlations between the tasks, right?

Help is very much appreciated. @gpleiss @Balandat

famura commented 1 year ago

Update: I have no idea how I overlooked the IndependentMultitaskVariationalStrategy class. I updated the associated part of the code snippet above to

from gpytorch.variational import IndependentMultitaskVariationalStrategy

class ApproximateMultitaskGPModel(ApproximateGP):
    def __init__(self, inducing_points):
        assert inducing_points.ndim == 2
        variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))
        base_variational_strategy = VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        variational_strategy = IndependentMultitaskVariationalStrategy(
            base_variational_strategy, num_tasks=2
        )
        super().__init__(variational_strategy)

        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=2
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=2, rank=1
        )

Unfortunately, I still get an error at line 216 of variational_strategy.py. However, now we first run through this line of the IndependentMultitaskVariationalStrategy class. So to me it looks like the multitask wrapping here has essentially no effect.

Furthermore, I am not sure about inducing_values, which is a tensor of shape (M,) at the time of the crash. Shouldn't it have two dimensions, e.g., (M, T), with T being the number of tasks?
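My current guess (untested) is that the variational distribution needs a task batch dimension, so that the variational parameters carry a leading task dim. The relevant lines of the __init__ above would then change roughly as follows, with the Multitask mean/kernel swapped for batched single-task modules:

# Untested sketch: give everything a task batch dimension (T = 2 tasks)
num_tasks = 2
variational_distribution = CholeskyVariationalDistribution(
    inducing_points.size(-2), batch_shape=torch.Size([num_tasks])
)
variational_strategy = IndependentMultitaskVariationalStrategy(
    VariationalStrategy(self, inducing_points, variational_distribution, learn_inducing_locations=True),
    num_tasks=num_tasks,
)
super().__init__(variational_strategy)

# Batched single-task modules instead of MultitaskMean/MultitaskKernel;
# forward() would then return a batched MultivariateNormal (one batch entry per task)
self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_tasks]))
self.covar_module = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_tasks])),
    batch_shape=torch.Size([num_tasks]),
)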

Any ideas?

famura commented 1 year ago

I am sorry for the confusion that my issue might have caused. I don't know why it took me so long to find the right tutorial in the docs.

It is precisely described here
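For completeness, the model from that tutorial boils down to roughly the following (sketched from memory, not copied verbatim; the value of num_latents is a free choice):

import torch
import gpytorch
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution, LMCVariationalStrategy, VariationalStrategy


class MultitaskSVGPModel(ApproximateGP):
    def __init__(self, num_latents=3, num_tasks=2, num_inducing=16, input_dim=2):
        # One set of inducing points per latent GP
        inducing_points = torch.rand(num_latents, num_inducing, input_dim)
        variational_distribution = CholeskyVariationalDistribution(
            num_inducing, batch_shape=torch.Size([num_latents])
        )
        # The LMC strategy mixes the independent latent GPs into correlated task outputs
        variational_strategy = LMCVariationalStrategy(
            VariationalStrategy(self, inducing_points, variational_distribution, learn_inducing_locations=True),
            num_tasks=num_tasks,
            num_latents=num_latents,
            latent_dim=-1,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_latents]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_latents])),
            batch_shape=torch.Size([num_latents]),
        )

    def forward(self, x):
        # Batched MVN over the latent GPs; LMCVariationalStrategy turns this
        # into a MultitaskMultivariateNormal over the tasks
        return gpytorch.distributions.MultivariateNormal(self.mean_module(x), self.covar_module(x))

With this, model(train_x_sample) returns a MultitaskMultivariateNormal, which is what MultitaskGaussianLikelihood and the VariationalELBO expect, so the uncertain-input sampling loop above carries over unchanged.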