cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Question] Does adding priors do anything if not using Pyro? #2309

Closed: yclicc closed this issue 1 year ago

yclicc commented 1 year ago

Following the basic exact inference tutorials, I tried adding a prior to a parameter (e.g. as suggested at https://github.com/cornellius-gp/gpytorch/issues/1297#issuecomment-699977103), but this seemed to make no difference: the final loss and the final value of the parameter were the same whether I set a prior or not, and accessing the prior afterwards showed it still set to its default values. Is this a bug, or am I somehow fundamentally misunderstanding what "exact" Gaussian processes do?

Balandat commented 1 year ago

Yes, using a prior should make a difference - do you have a full repro so we can take a look?

yclicc commented 1 year ago

I've just tried to put together a basic repro, without success... Is there a way to access the posterior value of each prior?

Here's the code that doesn't show the problem:

import math

import torch
import gpytorch
from matplotlib import pyplot as plt

%matplotlib inline
%load_ext autoreload
%autoreload 2

# Training data is 10,000 regularly spaced points in [0, 1] (inclusive)
train_x = torch.linspace(0, 1, 10000)
# True function is sin(2*pi*x) with Gaussian noise
train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * math.sqrt(0.04)

# We will use the simplest form of GP model, exact inference
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, lengthscale_prior=None):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(lengthscale_prior=lengthscale_prior))

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

# this is for running the notebook in our testing framework
import os
smoke_test = ('CI' in os.environ)
training_iter = 2 if smoke_test else 50

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

if torch.cuda.is_available():
    train_x = train_x.cuda()
    train_y = train_y.cuda()
    model = model.cuda()
    likelihood = likelihood.cuda()

for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()

train_x = train_x.cpu()
train_y = train_y.cpu()
model = model.cpu()
likelihood = likelihood.cpu()

# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51)
    observed_pred = likelihood(model(test_x))

with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))

    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Plot predictive means as blue line
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])

likelihood2 = gpytorch.likelihoods.GaussianLikelihood()
# Set a prior on absurdly large lengthscales just to make sure it has a noticeable effect
model2 = ExactGPModel(train_x, train_y, likelihood2, lengthscale_prior=gpytorch.priors.SmoothedBoxPrior(1000, 2000, sigma=0.001))

# Find optimal model hyperparameters
model2.train()
likelihood2.train()

# Use the adam optimizer
optimizer = torch.optim.Adam(model2.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters

# "Loss" for GPs - the marginal log likelihood
mll2 = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood2, model2)

if torch.cuda.is_available():
    train_x = train_x.cuda()
    train_y = train_y.cuda()
    model2 = model2.cuda()
    likelihood2 = likelihood2.cuda()

for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model2(train_x)
    # Calc loss and backprop gradients
    loss = -mll2(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model2.covar_module.base_kernel.lengthscale.item(),
        model2.likelihood.noise.item()
    ))
    optimizer.step()

train_x = train_x.cpu()
train_y = train_y.cpu()
model2 = model2.cpu()
likelihood2 = likelihood2.cpu()

# Get into evaluation (predictive posterior) mode
model2.eval()
likelihood2.eval()

# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51)
    observed_pred = likelihood2(model2(test_x))

with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))

    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Plot predictive means as blue line
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])

This indeed returns a different result with the prior set:

[plot: the GP fit with the lengthscale prior applied, visibly different from the fit without it]

But then running model2.covar_module.base_kernel.lengthscale_prior.state_dict() returns

OrderedDict([('a', tensor([1000.])),
             ('b', tensor([2000.])),
             ('sigma', tensor([0.0010])),
             ('tails.loc', tensor([0.])),
             ('tails.scale', tensor([0.0010]))])

showing that the prior hasn't changed. Is there some place the prior's (i.e. the posterior's!) updated value ought to be available?
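
For reference (using the model2 from the repro above), the fitted point estimate of the lengthscale is available on the kernel itself, just like the value printed in the training loop:

# The prior's buffers (a, b, sigma) are fixed by construction; what training
# updates is the kernel hyperparameter the prior is attached to.
print(model2.covar_module.base_kernel.lengthscale)      # constrained value
print(model2.covar_module.base_kernel.raw_lengthscale)  # raw, unconstrained parameter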

Balandat commented 1 year ago

"showing that the prior hasn't changed."

So this is expected: the prior here is a static prior on the model hyperparameters (this is different from the GP model itself, which, before seeing data, is a prior over functions). We don't compute any posterior for those hyperparameters. What we do is compute the model posterior using a point estimate of the hyperparameters, chosen to maximize the sum of the log marginal likelihood of the data and the log prior(s), i.e. a MAP estimate rather than a pure MLE. This happens here, where the additional loss terms (from the prior) are added to the MLL.
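
A minimal sketch of that mechanism (not gpytorch's exact internals: it assumes named_priors() yields (name, module, prior, closure, setting_closure) tuples, as in recent gpytorch versions, and ignores the normalization gpytorch applies to the loss):

# Sketch: how registered parameter priors enter the exact-GP training objective.
# `output` is the model's MultivariateNormal over the training targets, `mll`
# any gpytorch Module that has the priors registered somewhere beneath it.
def loss_with_priors(mll, output, target):
    res = output.log_prob(target)  # log marginal likelihood of the data
    # Each registered prior contributes log p(theta), evaluated at the
    # *current* parameter value; closure(module) returns that value.
    for _, module, prior, closure, _ in mll.named_priors():
        res = res + prior.log_prob(closure(module)).sum()
    return -res  # negative (log marginal likelihood + log prior): a MAP objective

The prior object itself is never updated; it only shifts where the optimum of the hyperparameter point estimate ends up.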

gpleiss commented 1 year ago

@yclicc since this is a usage question (rather than an issue), please open a discussion topic if you have further questions.