cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

Variational Multitask GP with correlated outputs #1035

Closed: powertj closed this issue 1 year ago

powertj commented 4 years ago

Hi,

I noticed in the tutorials that you can create an exact multi-task GP with correlated outputs:

https://gpytorch.readthedocs.io/en/latest/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html

Is it possible to do this with a variational GP? I have tried the following, which doesn't work:

import torch
import gpytorch
from gpytorch.variational import (
    CholeskyVariationalDistribution,
    MultitaskVariationalStrategy,
    VariationalStrategy,
)


class MultitaskSVGP(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points, input_dim, output_dim):
        batch_shape = torch.Size([output_dim])

        variational_distribution = CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=batch_shape
        )
        variational_strategy = MultitaskVariationalStrategy(
            VariationalStrategy(
                self, inducing_points, variational_distribution,
                learn_inducing_locations=True,
            ),
            num_tasks=output_dim,
        )
        super().__init__(variational_strategy)

        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=output_dim
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=input_dim), num_tasks=output_dim, rank=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)
gpleiss commented 4 years ago

I think the code example you posted should in theory work. Mind posting the stack trace?

powertj commented 4 years ago

Sure. I ran the following (with the class above):

input_dim = 2
output_dim = 2
num_inducing = 100

inducing_points = torch.randn(output_dim, num_inducing, input_dim)
model = MultitaskSVGP(inducing_points, input_dim, output_dim)

x = torch.zeros(1, input_dim)
model(x)

and I get:


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-aedfeb755132> in <module>
      1 x = torch.zeros(1, input_dim)
----> 2 model(x)

~/miniconda3/envs/xdtl/lib/python3.6/site-packages/gpytorch/models/approximate_gp.py in __call__(self, inputs, prior, **kwargs)
     79         if inputs.dim() == 1:
     80             inputs = inputs.unsqueeze(-1)
---> 81         return self.variational_strategy(inputs, prior=prior)

~/miniconda3/envs/xdtl/lib/python3.6/site-packages/gpytorch/variational/multitask_variational_strategy.py in __call__(self, x, prior)
     41 
     42     def __call__(self, x, prior=False):
---> 43         function_dist = self.base_variational_strategy(x, prior=prior)
     44         if (
     45             self.task_dim > 0

~/miniconda3/envs/xdtl/lib/python3.6/site-packages/gpytorch/variational/variational_strategy.py in __call__(self, x, prior)
    163                 self.updated_strategy.fill_(True)
    164 
--> 165         return super().__call__(x, prior=prior)

~/miniconda3/envs/xdtl/lib/python3.6/site-packages/gpytorch/variational/_variational_strategy.py in __call__(self, x, prior)
    125                 inducing_points,
    126                 inducing_values=variational_dist_u.mean,
--> 127                 variational_inducing_covar=variational_dist_u.lazy_covariance_matrix,
    128             )
    129         elif isinstance(variational_dist_u, Delta):

~/miniconda3/envs/xdtl/lib/python3.6/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
     22 
     23     def __call__(self, *inputs, **kwargs):
---> 24         outputs = self.forward(*inputs, **kwargs)
     25         if isinstance(outputs, list):
     26             return [_validate_module_outputs(output) for output in outputs]

~/miniconda3/envs/xdtl/lib/python3.6/site-packages/gpytorch/variational/variational_strategy.py in forward(self, x, inducing_points, inducing_values, variational_inducing_covar)
    106                 interp_term.transpose(-1, -2), (inducing_values - self.prior_distribution.mean).unsqueeze(-1)
    107             ).squeeze(-1)
--> 108             + test_mean
    109         )
    110 

RuntimeError: The size of tensor a (102) must match the size of tensor b (0) at non-singleton dimension 2
gpleiss commented 4 years ago

Thanks! I'll take a look.

gpleiss commented 4 years ago

Right - so we're working on a feature that will add ICM- and LMC-type variational multitask models. At the moment this kind of example won't work, because MultitaskVariationalStrategy assumes it's getting a batch of independent prior GPs rather than a single multi-output GP.

For now, if you follow the SVGP multitask example, you can learn a multi-output SVGP model. It doesn't model any correlations between tasks, though.
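
For reference, a minimal sketch of that independent multi-output setup (using the MultitaskVariationalStrategy name from the stack trace above; newer releases call this IndependentMultitaskVariationalStrategy). The key difference from the class posted in this issue is that the mean and kernel are batched per task rather than wrapped in MultitaskMean/MultitaskKernel, and forward returns a batch MultivariateNormal:

import torch
import gpytorch
from gpytorch.variational import (
    CholeskyVariationalDistribution,
    MultitaskVariationalStrategy,
    VariationalStrategy,
)


class IndependentMultitaskSVGP(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points, input_dim, output_dim):
        # One independent GP per task, expressed as a batch dimension.
        batch_shape = torch.Size([output_dim])
        variational_distribution = CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=batch_shape
        )
        variational_strategy = MultitaskVariationalStrategy(
            VariationalStrategy(
                self, inducing_points, variational_distribution,
                learn_inducing_locations=True,
            ),
            num_tasks=output_dim,
        )
        super().__init__(variational_strategy)
        # Batched (per-task) mean and kernel instead of MultitaskMean/MultitaskKernel.
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=batch_shape)
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=input_dim, batch_shape=batch_shape),
            batch_shape=batch_shape,
        )

    def forward(self, x):
        # A batch of independent MVNs; the multitask strategy stacks them
        # into a MultitaskMultivariateNormal (no intertask correlations).
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)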

holmrenser commented 4 years ago

I just realized that I'm getting this exact same error in #1041 with the variational stuff.

Potentially offending line in the source code.

Not 100% sure of the issue, but it seems like the slice is not correctly removing inducing points? It seems to trim columns (dimensions) instead of rows (observations).

gpleiss commented 4 years ago

"It seems to trim columns (dimensions) instead of rows (observations)."

Yeah - to make this multitask compatible, num_induc would have to be replaced by num_induc * num_tasks. We store the means of multitask MVNs as flattened vectors (e.g. a vector of size nt, where n is the number of data points and t is the number of tasks), and the covariances as nt x nt matrices. Long story short, this code will work if num_induc takes the number of tasks into account.

The reason we aren't doing this in the code right now is that it is computationally inefficient. The variational strategies aren't (at the moment) designed to exploit matrix structure the way our exact GP code does. For multitask MVNs, the covariance has Kronecker structure, and if you don't exploit it, inference is painfully slow.
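
To make the layout concrete, here is a toy sketch (plain PyTorch, not gpytorch internals; torch.kron just materializes the structure being described):

import torch

n, t = 100, 4                        # inducing points, tasks
K_data = torch.eye(n)                # stand-in for the n x n data covariance
K_task = torch.eye(t)                # stand-in for the t x t intertask covariance

# Multitask MVN layout: the mean is flattened to length n*t, and the
# covariance is the (n*t) x (n*t) Kronecker product (up to the
# interleaving convention for the task dimension).
K_full = torch.kron(K_data, K_task)
print(K_full.shape)                  # torch.Size([400, 400])

# A dense solve against K_full costs O((n*t)^3); exploiting the Kronecker
# structure instead works with the n x n and t x t factors separately.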

Again, we're hoping to revamp the variational multitask interface in the near future, which should make everything a lot more flexible (and hopefully fast/efficient too).

(Also, sorry for the slow reply - had back-to-back-to-back submissions.)

fabiankueppers commented 3 years ago

Hi guys, are there any updates on this issue? This is also relevant for my work, since I got stuck using the MultitaskKernel in conjunction with VariationalStrategy. Thank you!

wjmaddox commented 3 years ago

At this point you can use LMCVariationalStrategy to model correlated outputs, which I don't believe existed back when this issue was opened. There's a tutorial on it in the docs: https://docs.gpytorch.ai/en/latest/examples/04_Variational_and_Approximate_GPs/SVGP_Multitask_GP_Regression.html
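
For reference, the model in that tutorial looks roughly like this (a condensed sketch of the linked example; num_latents, num_tasks, and the inducing-point count are illustrative values):

import torch
import gpytorch

num_latents, num_tasks, input_dim = 3, 4, 1


class LMCMultitaskSVGP(gpytorch.models.ApproximateGP):
    def __init__(self):
        # A batch of num_latents latent GPs, each with its own inducing points.
        inducing_points = torch.rand(num_latents, 16, input_dim)
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(-2), batch_shape=torch.Size([num_latents])
        )
        # LMCVariationalStrategy mixes the latent GPs into num_tasks
        # correlated outputs via learned mixing weights.
        variational_strategy = gpytorch.variational.LMCVariationalStrategy(
            gpytorch.variational.VariationalStrategy(
                self, inducing_points, variational_distribution,
                learn_inducing_locations=True,
            ),
            num_tasks=num_tasks,
            num_latents=num_latents,
            latent_dim=-1,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=torch.Size([num_latents]))
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_latents])),
            batch_shape=torch.Size([num_latents]),
        )

    def forward(self, x):
        # A batch of independent latent MVNs; the LMC strategy mixes them.
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

Training pairs this with gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks) and a VariationalELBO, as in the tutorial.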

fabiankueppers commented 3 years ago

Thanks for your quick response! Just to be 100% sure - is this equivalent to modelling the dependencies by the Kronecker product used within the MultitaskKernel?

wjmaddox commented 3 years ago

Not quite. This corresponds to a linear model of co-regionalization (LMC) rather than an intrinsic model of co-regionalization (ICM), which is what the MultitaskKernel implements. The LMC implementation reduces to an ICM model (the multitask-kernel equivalent) when num_latents = 1, in which case the intertask covariance matrix is rank-1.
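
To spell out the distinction (standard formulations, not gpytorch's exact parameterization), with Q latent kernels k_q and mixing weights a_{iq} for task i:

% Linear model of co-regionalization (LMC): Q latent GPs, mixed per task
k_{\mathrm{LMC}}\bigl((x, i), (x', j)\bigr) = \sum_{q=1}^{Q} a_{iq}\, a_{jq}\, k_q(x, x')

% Intrinsic co-regionalization model (ICM): one shared latent kernel k,
% intertask covariance B (the Kronecker construction in MultitaskKernel)
k_{\mathrm{ICM}}\bigl((x, i), (x', j)\bigr) = B_{ij}\, k(x, x')

With Q = 1 the LMC kernel collapses to B = a a^\top, i.e. the rank-1 ICM case described above.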

fabiankueppers commented 3 years ago

Thank you very much!

dinithins commented 2 years ago

Hi, any update on revamping the variational multitask interface to efficiently handle the Kronecker product kernel used in MultitaskKernel? Thanks.

gpleiss commented 2 years ago

@dinithins There isn't a straightforward way to exploit Kronecker structure for variational models. This is because - even though the prior has Kronecker structure - there isn't any immediately obvious structure that can be assigned to the variational distribution.

The variational multitask GP implementation - which implements the linear model of coregionalization - is actually quite efficient though!

WHU-EE commented 1 year ago

I learned a lot from this issue. Could you provide any relevant literature, especially about LMC? Thanks.

gpleiss commented 1 year ago

@WHU-EE Check out this Alvarez et al. review, "Kernels for Vector-Valued Functions: A Review": https://arxiv.org/abs/1106.6251