cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Bug] unexpected input size in forward pass of Approximate GP model #2450

heikestein closed this issue 10 months ago

heikestein commented 10 months ago

🐛 Bug

I implemented an ApproximateGPModel as described in https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/SVGP_Regression_CUDA.html.

The inputs to my model (and its inducing points) are 2-D, in contrast to the 1-D example above. When the forward pass is called, the number of samples has changed relative to the true number of inducing points: instead of an [nsamples x 2] tensor, the input is a [2*nsamples x 2] tensor. Defining and training an equivalent ExactGPModel does not produce this behavior (the size remains [nsamples x 2]).

Note that this does not raise an error, but I am unsure where the size mismatch originates.
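
Digging around a little, I suspect the extra rows come from the variational strategy itself: as far as I can tell, GPyTorch's VariationalStrategy evaluates the joint prior over the inducing points and the inputs together, so the user-defined forward() receives their concatenation. A rough sketch of what I believe happens internally (paraphrased from my reading of the source, not the actual library code):

# sketch (assumption): the variational strategy concatenates the M inducing
# points with the N inputs along the data dimension before calling forward()
full_inputs = torch.cat([inducing_points, x], dim=-2)  # [M + N, D]
full_output = model.forward(full_inputs)               # here: [100 + 100, 2] = [200, 2]

Since I initialize the model with all 100 training points as inducing points, this would explain the observed [200, 2] shape.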

To reproduce

Code snippet to reproduce

import numpy as np
import torch
import gpytorch as gpy

################ data

n_dim = 2

# two 1-D grids over [-pi, pi]; the second dimension is rolled and
# subsampled so that the two input dimensions are not identical
x = np.tile(np.linspace(-np.pi, np.pi, 100), n_dim).reshape(n_dim, -1)
x[1] = np.concatenate([np.roll(x[1], 40)[::2], np.roll(x[1], 40)[::2]])
train_x = torch.tensor(x.T)  # shape [100, 2]

# additive sinusoidal target plus observation noise
train_y = torch.sin(train_x[:, 0]) + torch.sin(2 + train_x[:, 1]) + torch.randn(100) / 10

################ define approximate model

class ApproximateGPModel(gpy.models.ApproximateGP):
    def __init__(self, inducing_points):

        # one variational mean/covariance entry per inducing point
        variational_distribution = gpy.variational.CholeskyVariationalDistribution(inducing_points.size(0))
        variational_strategy = gpy.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True)

        super().__init__(variational_strategy)

        self.mean_module = gpy.means.ConstantMean()

        # sum of one periodic kernel per input dimension
        self.covar_module = gpy.kernels.ScaleKernel(
            gpy.kernels.PeriodicKernel(eps=1, active_dims=torch.tensor([0])))
        for d in range(1, inducing_points.size(-1)):
            self.covar_module += gpy.kernels.ScaleKernel(
                gpy.kernels.PeriodicKernel(eps=1, active_dims=torch.tensor([d])))

    def forward(self, x):
        print('calling forward pass, x size: ', x.size())
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)

        return gpy.distributions.MultivariateNormal(mean_x, covar_x)

def train_approximate_gp(train_x, train_y):

    model = ApproximateGPModel(train_x)
    likelihood = gpy.likelihoods.GaussianLikelihood()

    mll = gpy.mlls.VariationalELBO(likelihood, model, num_data=train_y.numel())

    # fix the period length of each dimension's kernel to 2*pi;
    # note: raw_period_length is passed through a positivity constraint,
    # so set the constrained value and then freeze the raw parameter
    for k in model.covar_module.kernels:
        k.base_kernel.period_length = torch.tensor([[2 * np.pi]])
        k.base_kernel.raw_period_length.requires_grad_(False)

    optimizer = torch.optim.Adam(list(model.parameters()) + list(likelihood.parameters()), lr=0.1)

    model.train()
    likelihood.train()

    for _ in range(100):
        optimizer.zero_grad()
        output = model(train_x)  # invokes the variational strategy, then model.forward
        loss = -mll(output, train_y)
        loss.backward()
        optimizer.step()

    model.eval()
    likelihood.eval()

    return model, likelihood

model_approx, likelihood_approx = train_approximate_gp(train_x, train_y)

Stack trace/error message

calling forward pass, x size:  torch.Size([200, 2])

Expected Behavior

calling forward pass, x size:  torch.Size([100, 2])
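
For comparison, the equivalent exact model that keeps the size at [100, 2] looks roughly like this (a minimal sketch mirroring the mean and kernel choices above, not my exact code):

class ExactGPModel(gpy.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpy.means.ConstantMean()
        # same additive periodic kernel structure as the approximate model
        self.covar_module = gpy.kernels.ScaleKernel(
            gpy.kernels.PeriodicKernel(eps=1, active_dims=torch.tensor([0])))
        for d in range(1, train_x.size(-1)):
            self.covar_module += gpy.kernels.ScaleKernel(
                gpy.kernels.PeriodicKernel(eps=1, active_dims=torch.tensor([d])))

    def forward(self, x):
        print('calling forward pass, x size: ', x.size())  # stays [100, 2] here
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpy.distributions.MultivariateNormal(mean_x, covar_x)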

System information

GPyTorch version: 1.9.1
PyTorch version: 1.13.1
macOS 13.5.2 (22G91)

Additional context

I am also unsure which dimension should be passed when setting up the variational distribution: different documentation pages use either dim 0 or dim -1:

https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/SVGP_Regression_CUDA.html uses variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))

but

https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/Approximate_GP_Objective_Functions.html uses variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(inducing_points.size(-1))

and I don't think inducing_points is transposed anywhere in the two snippets. Documentation for high-dimensional inputs would be very useful here.
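
My current reading (an assumption based on the shape conventions, not verified against the source) is that CholeskyVariationalDistribution expects the number of inducing points M, and that the two tutorials only agree because their inducing points are 1-D: for a tensor of shape [M], size(0) and size(-1) are the same number. With multi-dimensional inputs of shape [M, D] they diverge:

# hypothetical shapes, to illustrate the convention
inducing_1d = torch.randn(50)     # [M]    -> size(0) == size(-1) == 50
inducing_2d = torch.randn(50, 2)  # [M, D] -> M is size(0) (or size(-2)), not size(-1)

# CholeskyVariationalDistribution wants M, the number of inducing points
variational_distribution = gpy.variational.CholeskyVariationalDistribution(inducing_2d.size(-2))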

heikestein commented 10 months ago

It seems my confusion stems from a misunderstanding of the concept of inducing points, so I'm closing this issue until I've done the reading.