cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

Difference between "Batch GP Regression" and "Batch Independent Multioutput GP" #2402

Closed · Rashfu closed this issue 9 months ago

Rashfu commented 9 months ago

What is the difference between Batch GP Regression and Batch Independent Multioutput GP?

Assuming the input x has shape [400, 2] and the output y has shape [400, 3], I can either use Batch Independent Multioutput GP directly, or duplicate the input x to shape [3, 400, 2] and reshape the output y to [3, 400], then use Batch GP Regression. The data for both approaches can be exactly the same, with only minor differences in the code.
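For concreteness, the reshaping I describe could be done like this (a sketch with placeholder tensors, not code from my actual script):

import torch

x = torch.randn(400, 2)  # placeholder inputs of shape [400, 2]
y = torch.randn(400, 3)  # placeholder outputs of shape [400, 3]

batch_x = x.unsqueeze(0).expand(3, -1, -1)  # [3, 400, 2]: the same inputs for each output
batch_y = y.T                               # [3, 400]: one row per output function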

The Batch Independent Multioutput GP tutorial says that _unlike in the Multitask GP Example, this does not model correlations between outcomes, but treats outcomes independently_. But the code still uses MultitaskMultivariateNormal and MultitaskGaussianLikelihood. Meanwhile, the Batch GP Regression tutorial says that we do NOT account for any correlations between the different functions being modeled. I am confused about the correlations mentioned in the two tutorials: are they the same thing? I'm not clear on the difference between multioutput and multitask.

Thanks in advance!

Balandat commented 9 months ago

These two are essentially the same model. In the Batch GP Regression case, the model is structurally a single-output model with an additional batch dimension that corresponds to the different functions you are modeling. In the batch independent multioutput GP, the model has explicit (trailing) output dimensions; these are realized by the MultitaskMultivariateNormal used in the model (hence you need the MultitaskGaussianLikelihood). The main thing to note is that in both cases the resulting covariance across the outputs is block-diagonal, i.e. the correlations between outputs are not being modeled. For that you need an actual Multitask GP.
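For comparison, here is a minimal sketch of what the batch formulation of your data would look like (following the Batch GP Regression tutorial; the shapes from your question are assumed):

import torch
import gpytorch

# Sketch of the Batch GP Regression formulation: three independent
# single-output GPs stacked along a batch dimension.
class BatchGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        # train_x: [3, 400, 2] (inputs duplicated per output), train_y: [3, 400]
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([3]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([3])),
            batch_shape=torch.Size([3]),
        )

    def forward(self, x):
        # Returns a batched MultivariateNormal; without the
        # MultitaskMultivariateNormal wrapper, each batch element is just an
        # independent single-output GP.
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# A batched GaussianLikelihood takes the place of the MultitaskGaussianLikelihood.
likelihood = gpytorch.likelihoods.GaussianLikelihood(batch_shape=torch.Size([3]))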

Rashfu commented 9 months ago

I understand that the resulting covariance across the outputs should be block-diagonal. But when I visualize the covariance of the output, it does not appear to be a block-diagonal matrix; there are still non-zero values in other positions.

The code is below, with train_x of shape [300, 2] and train_y of shape [300, 3]:

import torch
import gpytorch

class BatchIndependentMultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        # One constant mean and one scaled RBF kernel per output, via a batch dimension of size 3.
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([3]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([3])),
            batch_shape=torch.Size([3])
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        # Reinterpret the batched MVN as a multitask MVN with explicit output dimensions.
        batch_mvn = gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
        multitask_mvn = gpytorch.distributions.MultitaskMultivariateNormal.from_batch_mvn(batch_mvn)
        return multitask_mvn

likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=3)
model = BatchIndependentMultitaskGPModel(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

model.train()
likelihood.train()
training_iterations = 100

for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()  # apply the gradient update
The covariance of output = model(train_x) looks like this: [screenshot of the covariance matrix heatmap, 2023-09-12, showing non-zero entries outside three large diagonal blocks]

Balandat commented 9 months ago

But when I visualize the covariance of the output, it does not appear to be a block-diagonal matrix; there are still non-zero values in other positions.

The MultitaskMultivariateNormal has an interleaved setting: https://github.com/cornellius-gp/gpytorch/blob/981edd83a671e8dca9d67c91b188702354884a34/gpytorch/distributions/multitask_multivariate_normal.py#L27-L29

So you'll have to reshape the covariance matrix to see the block-diagonal structure w.r.t. the outputs. See here: https://github.com/cornellius-gp/gpytorch/blob/981edd83a671e8dca9d67c91b188702354884a34/gpytorch/distributions/multitask_multivariate_normal.py#L62-L68
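With the interleaved ordering, the entries for the 3 tasks at each data point sit next to each other, so the raw matrix looks striped rather than showing three large diagonal blocks. A minimal sketch of de-interleaving it (assuming the default interleaved ordering, where the task index varies fastest):

# Sketch: permute an interleaved [n * 3, n * 3] covariance into task-major
# order so that the three [n, n] per-task blocks land on the diagonal.
model.eval()
likelihood.eval()
with torch.no_grad():
    output = model(train_x)
C = output.covariance_matrix                          # [n * 3, n * 3], interleaved
n = train_x.shape[0]
perm = torch.arange(n * 3).reshape(n, 3).T.flatten()  # task-major index order
C_task_major = C[perm][:, perm]                       # 3 diagonal blocks of size [n, n]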

Rashfu commented 9 months ago

🌹 Thank you for the clarification; it resolved all my confusion. I will close this issue now!