cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

Binary Classification with ExactGP and DirichletClassificationLikelihood #1727

Closed: ZohrehAdabi closed this issue 3 years ago

ZohrehAdabi commented 3 years ago

Hi,

I want to do binary classification with ExactGP and DirichletClassificationLikelihood. I have two problems:

1. mll = ExactMarginalLogLikelihood does not return a scalar loss; its shape is [2].
2. I tried using mll(output, train_y).sum(), and training then worked, but at test time I get an error when using the model to predict on test_x: shape '[100]' is invalid for input of size 200.

Here is the code to reproduce the errors.


import math
import torch
import gpytorch
from matplotlib import pyplot as plt

train_x = torch.linspace(0, 1, 100)
train_y = torch.sign(torch.cos(train_x * (4 * math.pi))).add(1).div(2)

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, kernel='rbf', inducing_points=None):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module  = gpytorch.means.ConstantMean()

        ## RBF kernel
        if(kernel=='rbf' or kernel=='RBF'):
            # self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
            self.base_covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
            self.covar_module = gpytorch.kernels.InducingPointKernel(self.base_covar_module, inducing_points=inducing_points , likelihood=likelihood)

    def forward(self, x):
        mean_x  = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# Initialize model and likelihood
inducing_point = train_x[:10]
train_y = torch.round(train_y).long()
likelihood = gpytorch.likelihoods.DirichletClassificationLikelihood(targets=train_y, learn_additional_noise=False)
# likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood, 'rbf', inducing_point)
training_iterations = 200
# Find optimal model hyperparameters
model.train()
likelihood.train()
# Use the adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iterations):
    # Zero backpropped gradients from previous iteration
    optimizer.zero_grad()
    # Get predictive output
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y).sum()
    loss.backward()
    # print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    if (i+1)%50==0:
        print(f'Iter {i + 1:02}/{training_iterations} - Loss: {loss.item():.4f}')
    optimizer.step()

# Go into eval mode
model.eval()
likelihood.eval()
with torch.no_grad():
    # Test points are 100 regularly spaced points in [0, 1]
    test_x = torch.linspace(0, 1, 100)
    # test_labels = torch.round(test_x).long()
    # Get classification predictions
    output = model(test_x)
    observed_pred = likelihood(output) 

    # Initialize fig and axes for plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Get the predicted labels (probabilities of belonging to the positive class)
    # Transform these probabilities to be 0/1 labels
    print(observed_pred)
    pred_labels = observed_pred.mean.ge(0.5).float()
    ax.plot(test_x.numpy(), pred_labels.numpy(), 'b')
    ax.set_ylim([-1, 2])
    ax.legend(['Observed Data', 'Mean'])
    plt.show()

What is wrong with my usage of DirichletClassificationLikelihood? Thanks in advance for any help.

wjmaddox commented 3 years ago

Hi, so the MLL shouldn't be returning a scalar -- the trick with the Dirichlet likelihood is to model the outputs as num_classes separate outputs, so the MLL returns a vector of size num_classes.

What version of pytorch and gpytorch are you using?

I wasn't able to reproduce your shape error once I passed in the inputs correctly, although I did find a small bug (#1728) while trying to reproduce it.
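
To see this concretely, here is a quick shape check (a sketch assuming a two-class setup; note it uses the likelihood's transformed targets, which I come back to below):

output = model(train_x)
# The Dirichlet trick models a binary problem as num_classes = 2 latent outputs,
# so the (negative) MLL evaluates to one term per class.
loss_vec = -mll(output, likelihood.transformed_targets)
print(loss_vec.shape)  # torch.Size([2])
loss = loss_vec.sum()  # reduce to a scalar before calling loss.backward()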

ZohrehAdabi commented 3 years ago

> Hi, so the MLL shouldn't be returning a scalar -- the trick with the Dirichlet likelihood is to model the outputs as num_classes separate outputs, so the MLL returns a vector of size num_classes.
>
> What version of pytorch and gpytorch are you using?
>
> I wasn't able to reproduce your shape error once I passed in the inputs correctly, although I did find a small bug (#1728) while trying to reproduce it.

Thank you for your reply. I am using gpytorch 1.5.0 and pytorch 1.8.1. It is right to sum the loss, isn't it? Here is the error, which occurs only at test time:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\ADABI\anaconda\anaconda3\lib\site-packages\gpytorch\models\exact_gp.py", line 322, in __call__
    predictive_mean = predictive_mean.view(*batch_shape, *test_shape).contiguous()
RuntimeError: shape '[100]' is invalid for input of size 200

wjmaddox commented 3 years ago

Yes, you should sum the loss as shown in the tutorial.
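
As for the test-time error: the 200 is presumably num_classes x num_test = 2 x 100, which suggests the model was constructed with the raw train_y rather than the likelihood's per-class transformed targets (see below). The training step itself just reduces the per-class vector to a scalar:

loss = -mll(output, likelihood.transformed_targets).sum()  # sum over the per-class MLL terms
loss.backward()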

ZohrehAdabi commented 3 years ago

> I wasn't able to reproduce your shape error once I passed in the inputs correctly, although I did find a small bug (#1728) while trying to reproduce it.

What do you mean by "passed in the inputs correctly"? I tried reshaping the input but am still getting the shape error.

wjmaddox commented 3 years ago

Yes, so the transformation in the DirichletClassificationLikelihood produces pseudo-targets, so to speak (in the tutorial these are likelihood.transformed_targets); you will want to pass them to the model as its training targets and train against them as well.

Additionally, you need to pass the number of classes into at least the mean module (as a batch shape). Code like this runs fine for me:

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, kernel='rbf', inducing_points=None):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module  = gpytorch.means.ConstantMean(batch_shape=torch.Size((2,)))

        ## RBF kernel
        if(kernel=='rbf' or kernel=='RBF'):
            # self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
            self.base_covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
            self.covar_module = gpytorch.kernels.InducingPointKernel(
                self.base_covar_module, inducing_points=inducing_points, likelihood=likelihood
            )

    def forward(self, x):
        mean_x  = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# Initialize model and likelihood
inducing_point = train_x[:10]
train_y = torch.round(train_y).long()
likelihood = gpytorch.likelihoods.DirichletClassificationLikelihood(targets=train_y, learn_additional_noise=False)
# NOTE THE TRANSFORM HERE
model = ExactGPModel(train_x, likelihood.transformed_targets, likelihood, 'rbf', inducing_point)
training_iterations = 200

# Find optimal model hyperparameters
model.train()
likelihood.train()
# Use the adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iterations):
    # Zero backpropped gradients from previous iteration
    optimizer.zero_grad()
    # Get predictive output
    output = model(train_x)
    # Calc loss and backprop gradients
    # Note: the targets we train against are the likelihood's transformed_targets.
    loss = -mll(output, likelihood.transformed_targets).sum()
    loss.backward()
    # print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    if (i+1)%50==0:
        print(f'Iter {i + 1:02}/{training_iterations} - Loss: {loss.item():.4f}')
    optimizer.step()
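
And for completeness, prediction at test time follows the same pattern; here is a sketch of the eval step (following the Dirichlet classification tutorial; the 256 samples are an arbitrary choice):

# Go into eval mode
model.eval()
likelihood.eval()

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 100)
    test_dist = model(test_x)  # batched MultivariateNormal; mean has shape [2, 100]

    # Hard labels: argmax over the two latent class outputs.
    pred_labels = test_dist.mean.argmax(dim=0)

    # Monte Carlo class probabilities: sample the latent functions,
    # exponentiate, and normalize across the class dimension.
    samples = test_dist.sample(torch.Size((256,))).exp()
    probabilities = (samples / samples.sum(-2, keepdim=True)).mean(0)  # shape [2, 100]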

ZohrehAdabi commented 3 years ago


Thank you very much for your clear explanation and code correction; all the errors have been resolved.