cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Question] ExactGP Regression returns the same value for high dimensional input #1176

Closed · Godric877 closed this issue 4 years ago

Godric877 commented 4 years ago

🐛 Question

I modified the example at https://gpytorch.readthedocs.io/en/latest/examples/01_Exact_GPs/Simple_GP_Regression.html#Introduction to train and test on higher-dimensional data. When generating predictions, the model returns the same output for every test point.

To reproduce

To reproduce, here is the modified example:

Code snippet to reproduce

import math
import torch
import gpytorch
from matplotlib import pyplot as plt

# # Training data is 100 points in [0,1] inclusive regularly spaced
# train_x = torch.linspace(0, 1, 100)
# # True function is sin(2*pi*x) with Gaussian noise
# train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * math.sqrt(0.04)
# Random high-dimensional training data: 50 points in 35 dimensions
train_x = torch.randn(50, 35)
train_y = torch.randn(50)

# We will use the simplest form of GP model, exact inference
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

# this is for running the notebook in our testing framework
import os
smoke_test = ('CI' in os.environ)
training_iter = 2 if smoke_test else 50

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()

# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Test points are 10 random 35-dimensional inputs
# Make predictions by feeding model through likelihood
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.randn(10,35)
    observed_pred = likelihood(model(test_x))

print(observed_pred.mean)

Stack trace/error message

Iter 1/50 - Loss: 1.610   lengthscale: 0.693   noise: 0.693
Iter 2/50 - Loss: 1.600   lengthscale: 0.693   noise: 0.744
Iter 3/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.754
Iter 4/50 - Loss: 1.599   lengthscale: 0.693   noise: 0.739
Iter 5/50 - Loss: 1.601   lengthscale: 0.693   noise: 0.717
Iter 6/50 - Loss: 1.602   lengthscale: 0.693   noise: 0.701
Iter 7/50 - Loss: 1.600   lengthscale: 0.693   noise: 0.696
Iter 8/50 - Loss: 1.599   lengthscale: 0.693   noise: 0.700
Iter 9/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.709
Iter 10/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.719
Iter 11/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.727
Iter 12/50 - Loss: 1.599   lengthscale: 0.693   noise: 0.730
Iter 13/50 - Loss: 1.599   lengthscale: 0.693   noise: 0.728
Iter 14/50 - Loss: 1.599   lengthscale: 0.693   noise: 0.723
Iter 15/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.716
Iter 16/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.710
Iter 17/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.706
Iter 18/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.705
Iter 19/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.708
Iter 20/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.712
Iter 21/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.717
Iter 22/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.722
Iter 23/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.723
Iter 24/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.722
Iter 25/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.718
Iter 26/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.713
Iter 27/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.709
Iter 28/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.708
Iter 29/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.709
Iter 30/50 - Loss: 1.598   lengthscale: 0.693   noise: 0.712
Iter 31/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.716
Iter 32/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.719
Iter 33/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.720
Iter 34/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.719
Iter 35/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.716
Iter 36/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.713
Iter 37/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.711
Iter 38/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.710
Iter 39/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.712
Iter 40/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.714
Iter 41/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.716
Iter 42/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.718
Iter 43/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.717
Iter 44/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.716
Iter 45/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.714
Iter 46/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.713
Iter 47/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.712
Iter 48/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.712
Iter 49/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.714
Iter 50/50 - Loss: 1.597   lengthscale: 0.693   noise: 0.715
tensor([-0.1962, -0.1962, -0.1962, -0.1962, -0.1962, -0.1962, -0.1962, -0.1962,
        -0.1962, -0.1962])

Expected Behavior

I expected the model to at least produce different outputs for different inputs.

Additional context

Any advice about training on multidimensional inputs would be highly valuable to me. Thank you so much for your help.

KeAWang commented 4 years ago

This is not a bug. You're training a GP on 35-dimensional data with random inputs and random labels. Given that you're using a ConstantMean prior, predicting a constant function is entirely reasonable: there is no structure in the data for the GP to explain, so the posterior mean falls back to the constant prior mean.
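
One way to see why (a quick sketch): in 35 dimensions, pairwise Euclidean distances between standard-normal inputs concentrate around sqrt(2 * 35) ≈ 8.4, so an RBF kernel with lengthscale ≈ 0.693 is numerically zero between every pair of distinct points, and the posterior mean collapses to the constant:

import torch

torch.manual_seed(0)
x = torch.randn(50, 35)

# Pairwise Euclidean distances between the random inputs
d = torch.cdist(x, x)
off_diag = d[~torch.eye(50, dtype=torch.bool)]

# Distances concentrate around sqrt(2 * 35) ~ 8.4 with a small spread
print(off_diag.mean().item(), off_diag.std().item())

# RBF kernel value at the typical distance with lengthscale ~ 0.693:
# exp(-8.4**2 / (2 * 0.693**2)) ~ e**-73, i.e. numerically zero
lengthscale = 0.693
print(torch.exp(-off_diag.mean() ** 2 / (2 * lengthscale ** 2)).item())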

Try a dataset with non-random labels, like

train_x = torch.randn(50, 35)
train_y = train_x.sum(-1) + train_x.pow(2).sum(-1)

gives

Iter 1/50 - Loss: 472.871   lengthscale: 0.693   noise: 0.693
Iter 2/50 - Loss: 438.105   lengthscale: 0.697   noise: 0.744
Iter 3/50 - Loss: 406.669   lengthscale: 0.700   noise: 0.798
Iter 4/50 - Loss: 378.275   lengthscale: 0.705   noise: 0.854
Iter 5/50 - Loss: 352.646   lengthscale: 0.709   noise: 0.911
Iter 6/50 - Loss: 329.521   lengthscale: 0.715   noise: 0.970
Iter 7/50 - Loss: 308.654   lengthscale: 0.721   noise: 1.031
Iter 8/50 - Loss: 289.821   lengthscale: 0.728   noise: 1.093
Iter 9/50 - Loss: 272.812   lengthscale: 0.737   noise: 1.155
Iter 10/50 - Loss: 257.438   lengthscale: 0.749   noise: 1.219
Iter 11/50 - Loss: 243.527   lengthscale: 0.766   noise: 1.282
Iter 12/50 - Loss: 230.923   lengthscale: 0.788   noise: 1.346
Iter 13/50 - Loss: 219.486   lengthscale: 0.817   noise: 1.410
Iter 14/50 - Loss: 209.093   lengthscale: 0.850   noise: 1.473
Iter 15/50 - Loss: 199.631   lengthscale: 0.885   noise: 1.537
Iter 16/50 - Loss: 191.002   lengthscale: 0.922   noise: 1.599
Iter 17/50 - Loss: 183.118   lengthscale: 0.961   noise: 1.661
Iter 18/50 - Loss: 175.900   lengthscale: 1.002   noise: 1.722
Iter 19/50 - Loss: 169.280   lengthscale: 1.045   noise: 1.782
Iter 20/50 - Loss: 163.195   lengthscale: 1.091   noise: 1.841
Iter 21/50 - Loss: 157.592   lengthscale: 1.139   noise: 1.899
Iter 22/50 - Loss: 152.421   lengthscale: 1.189   noise: 1.956
Iter 23/50 - Loss: 147.641   lengthscale: 1.242   noise: 2.011
Iter 24/50 - Loss: 143.211   lengthscale: 1.298   noise: 2.066
Iter 25/50 - Loss: 139.099   lengthscale: 1.357   noise: 2.119
Iter 26/50 - Loss: 135.271   lengthscale: 1.418   noise: 2.171
Iter 27/50 - Loss: 131.699   lengthscale: 1.483   noise: 2.222
Iter 28/50 - Loss: 128.355   lengthscale: 1.550   noise: 2.271
Iter 29/50 - Loss: 125.208   lengthscale: 1.620   noise: 2.319
Iter 30/50 - Loss: 122.225   lengthscale: 1.693   noise: 2.366
Iter 31/50 - Loss: 119.367   lengthscale: 1.770   noise: 2.412
Iter 32/50 - Loss: 116.581   lengthscale: 1.850   noise: 2.457
Iter 33/50 - Loss: 113.802   lengthscale: 1.934   noise: 2.501
Iter 34/50 - Loss: 110.944   lengthscale: 2.021   noise: 2.543
Iter 35/50 - Loss: 107.903   lengthscale: 2.113   noise: 2.584
Iter 36/50 - Loss: 104.561   lengthscale: 2.209   noise: 2.625
Iter 37/50 - Loss: 100.800   lengthscale: 2.309   noise: 2.664
Iter 38/50 - Loss: 96.526   lengthscale: 2.415   noise: 2.702
Iter 39/50 - Loss: 91.698   lengthscale: 2.525   noise: 2.738
Iter 40/50 - Loss: 86.352   lengthscale: 2.641   noise: 2.774
Iter 41/50 - Loss: 80.609   lengthscale: 2.762   noise: 2.808
Iter 42/50 - Loss: 74.664   lengthscale: 2.888   noise: 2.840
Iter 43/50 - Loss: 68.748   lengthscale: 3.020   noise: 2.872
Iter 44/50 - Loss: 63.080   lengthscale: 3.155   noise: 2.901
Iter 45/50 - Loss: 57.831   lengthscale: 3.293   noise: 2.930
Iter 46/50 - Loss: 53.105   lengthscale: 3.434   noise: 2.956
Iter 47/50 - Loss: 48.944   lengthscale: 3.575   noise: 2.981
Iter 48/50 - Loss: 45.337   lengthscale: 3.716   noise: 3.005
Iter 49/50 - Loss: 42.242   lengthscale: 3.855   noise: 3.028
Iter 50/50 - Loss: 39.602   lengthscale: 3.991   noise: 3.049
tensor([25.8628, 28.7511, 35.9942, 31.5958, 25.1634, 23.9059, 19.7293, 20.1411,
        27.3224, 37.1836])

For high-dimensional data, you should also try using an independent lengthscale for each dimension (known as automatic relevance determination, or ARD):

self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=35))
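
Note that with ard_num_dims=35 the lengthscale becomes a (1, 35) tensor, so the .item() call in the training loop's print statement will raise an error. One way to adapt the logging (a sketch, reporting the mean lengthscale across dimensions):

print('Iter %d/%d - Loss: %.3f   mean lengthscale: %.3f   noise: %.3f' % (
    i + 1, training_iter, loss.item(),
    model.covar_module.base_kernel.lengthscale.mean().item(),
    model.likelihood.noise.item()
))
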
KeAWang commented 4 years ago

Essentially, you want to find the kernel that is most appropriate for your data.

You could also try some dimensionality reduction techniques.
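
For instance, a minimal sketch that projects the inputs onto their top principal components with torch.pca_lowrank before fitting the GP (the choice of 10 components here is arbitrary):

import torch

train_x = torch.randn(50, 35)
test_x = torch.randn(10, 35)

# Project onto the top-k principal components of the training inputs
# (k = 10 is an arbitrary choice for illustration)
k = 10
mean = train_x.mean(dim=0)
U, S, V = torch.pca_lowrank(train_x, q=k)
train_x_reduced = (train_x - mean) @ V[:, :k]  # shape: (50, 10)

# Test inputs must be centered and projected with the same components
test_x_reduced = (test_x - mean) @ V[:, :k]    # shape: (10, 10)

The reduced inputs can then be passed to the same ExactGPModel as before (with ard_num_dims=k if using ARD).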

Godric877 commented 4 years ago

Thank you so much for your help!

KeAWang commented 4 years ago

You're welcome!

faridf commented 1 year ago

> Essentially, you want to find the kernel that is most appropriate for your data.
>
> You could also try some dimensionality reduction techniques.

You mentioned dimensionality reduction techniques. Do you know of such methods that you could share?