cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Question] Implementing Multi-fidelity models in GPyTorch #594

Closed kunalghosh closed 4 years ago

kunalghosh commented 5 years ago

Hi,

I am trying to implement a multi-fidelity GP model using GPyTorch and would need to implement something like eq. 3.1 in https://arxiv.org/pdf/1604.07484.pdf. Should I implement my own kernel class, or is there an easier way by reusing existing kernel classes in GPyTorch?
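(For context, paraphrasing rather than quoting the paper: the structure in question is the standard AR(1) co-kriging form,

f_high(x) = rho * f_low(x) + delta(x)

where f_low and the discrepancy delta are independent GPs.)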

jacobrgardner commented 5 years ago

It looks to me like their model is essentially a multitask GP between the fidelity levels with a deep base kernel. GPyTorch does indeed support, and has tutorials for, both of these! Basically, if you pass x through any torch Module (think like an nn.Sequential) before passing it through the kernel, the GP and the nn.Sequential will get learned end to end.
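Roughly, a minimal sketch of that pattern looks like this (the feature extractor below is just an illustration, not a specific GPyTorch class):

import torch
import gpytorch

class DeepKernelGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        # any torch Module works here; it is trained end to end with the GP
        self.feature_extractor = torch.nn.Sequential(
            torch.nn.Linear(train_x.size(-1), 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 2),
        )

    def forward(self, x):
        z = self.feature_extractor(x)  # map inputs through the network first
        mean_x = self.mean_module(z)
        covar_x = self.covar_module(z)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)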

Their specific structure for handling multitask is slightly different from our existing multitask machinery in the case where the tasks have different numbers of observations, but you could also implement that without too much trouble by just registering their \rho parameter on your model directly.
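Concretely, registering \rho is one line in your model's __init__ (a sketch; gpytorch's Module accepts the name/parameter keywords):

# learned jointly with all other hyperparameters
self.register_parameter(name="rho", parameter=torch.nn.Parameter(torch.randn(1)))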

Let me know if you need this to be more concrete / you don't find the existing tutorials sufficient and I can try to throw a toy example implementation together later.

kunalghosh commented 5 years ago

Hi @jacobrgardner,

Using a neural network in conjunction with a GP is quite clear to me (thanks to the deep kernel learning GPyTorch doc), but I am still unclear about how I might implement the multi-fidelity kernel.

I tried to modify the IndexKernel and MultitaskKernel to come up with an implementation of a two-fidelity kernel. It would be great if you could take a look and give some feedback. Also, I am not sure how to handle different numbers of entries for different tasks; any suggestions?


import torch
from gpytorch.kernels import Kernel
from gpytorch.lazy import InterpolatedLazyTensor, RootLazyTensor

class TwoFidelityIndexKernel(Kernel):
    def __init__(self,
                num_tasks,
                rank=1,
                prior=None,
                **kwargs
                ):
        if rank > num_tasks:
            raise RuntimeError("Cannot create a task covariance matrix larger than the number of tasks")
        super().__init__(**kwargs)

        self.register_parameter(
            name="rho", parameter=torch.nn.Parameter(torch.randn(1))
        )
        # if num_tasks = 2, first create a 2x2 matrix with rho in every entry
        self.covar_factor = torch.ones(num_tasks, num_tasks) * self.rho
        # then construct a diagonal matrix where entry [0,0] is 0 and entry [1,1] is rho
        temp = torch.diag(torch.arange(num_tasks, dtype=torch.float32) * self.rho)
        # and elementwise multiply the two matrices together
        self.covar_factor = self.covar_factor * temp

    def _eval_covar_matrix(self):
        return self.covar_factor

    @property
    def covar_matrix(self):
        res = RootLazyTensor(self.covar_factor)
        return res

    def forward(self, i1, i2, **params):
        covar_matrix = self._eval_covar_matrix()
        res = InterpolatedLazyTensor(base_lazy_tensor=covar_matrix, left_interp_indices=i1, right_interp_indices=i2)
        return res

Then I use the TwoFidelityIndexKernel in the following multi-task kernel.

import torch
import gpytorch
from gpytorch.kernels import Kernel
from gpytorch.lazy import lazify, KroneckerProductLazyTensor

class TwoFidelityKernel(Kernel):
    """
    Kernel supporting Kronecker style multitask Gaussian processes (where every data point is evaluated at every
    task) using :class:`gpytorch.kernels.IndexKernel` as a basic multitask kernel.

    Given a base covariance module to be used for the data, :math:`K_{XX}`, this kernel computes a task kernel of
    specified size :math:`K_{TT}` and returns :math:`K = K_{TT} \otimes K_{XX}` as an
    :obj:`gpytorch.lazy.KroneckerProductLazyTensor`.

    Args:
        data_covar_module (:obj:`gpytorch.kernels.Kernel`):
            Kernel to use as the data kernel.
        num_tasks (int):
            Number of tasks
        batch_size (int, optional):
            Set if the MultitaskKernel is operating on batches of data (and you want different
            parameters for each batch)
        rank (int):
            Rank of index kernel to use for task covariance matrix.
        task_covar_prior (:obj:`gpytorch.priors.Prior`):
            Prior to use for task kernel. See :class:`gpytorch.kernels.IndexKernel` for details.
    """

    def __init__(self, data_covar_module1, data_covar_module2, num_tasks, rank=1, task_covar_prior=None, **kwargs):
        super(TwoFidelityKernel, self).__init__(**kwargs)
        self.task_covar_module = TwoFidelityIndexKernel(
            num_tasks=num_tasks, batch_shape=self.batch_shape, rank=rank, prior=task_covar_prior
        )
        self.data_covar_module1 = data_covar_module1  # k1
        self.data_covar_module2 = data_covar_module2  # k2
        self.num_tasks = num_tasks

    def forward(self, x1, x2, diag=False, batch_dims=None, **params):
        if batch_dims == (0, 2):
            raise RuntimeError("TwoFidelityKernel does not accept the batch_dims argument.")
        covar_i = self.task_covar_module.covar_matrix
        covar_i = covar_i.repeat(x1.size(0), 1, 1)
        # diagonal matrix with 1 at [1,1]: selects the high-fidelity block for k2
        const_mat = torch.diag(torch.arange(self.num_tasks, dtype=torch.float32))
        const_mat = const_mat.repeat(x1.size(0), 1, 1)
        covar_x1 = lazify(self.data_covar_module1.forward(x1, x2, **params))
        covar_x2 = lazify(self.data_covar_module2.forward(x1, x2, **params))
        res = (KroneckerProductLazyTensor(covar_x1, covar_i)
               + KroneckerProductLazyTensor(covar_x2, lazify(const_mat)))
        return res.diag() if diag else res

    def size(self, x1, x2):
        """
        Given `n` data points `x1` and `m` data points `x2`, this multitask
        kernel returns an `(n*num_tasks) x (m*num_tasks)` covariance matrix.
        """
        non_batch_size = (self.num_tasks * x1.size(-2), self.num_tasks * x2.size(-2))
        if x1.ndimension() == 3:
            return torch.Size((x1.size(0),) + non_batch_size)
        else:
            return torch.Size(non_batch_size)

I realize that when using the multitask kernel in GPyTorch, the target Y is organized so that Y[:, 0] corresponds to task 1 and Y[:, 1] corresponds to task 2. But in the paper I referenced it is organized as Y[:n1] = Y_low and Y[n1:n2] = Y_high (assuming indices 0:n1 are low fidelity and n1:n2 are high fidelity), and X[n1:n2] has x values repeated.

jacobrgardner commented 5 years ago

@kunalghosh Sorry for the delay, we've been crazy busy trying to get GPyTorch 0.3.0 out the door.

For multitask with different numbers of points per task, you won't be able to use the Kronecker-based multitask from the example you looked at. https://github.com/cornellius-gp/gpytorch/blob/master/examples/03_Multitask_GP_Regression/Hadamard_Multitask_GP_Regression.ipynb is a good starting point: it lets you assign a different task to each data point, thereby allowing different numbers of data points per task. In that example we happened to use torch.linspace(0, 1, 100) as the training locations for each task (so 200 training examples total), but where we create full_train_x, those could have been any set of 200 points, with the first 100 belonging to task 1 and the second 100 to task 2.
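For concreteness, a sketch of that layout with unequal task sizes (the tensor names here are illustrative):

import torch

# 100 low-fidelity points (task 0) and 30 high-fidelity points (task 1)
train_x_low, train_x_high = torch.rand(100, 1), torch.rand(30, 1)

full_train_x = torch.cat([train_x_low, train_x_high])
full_train_i = torch.cat([
    torch.zeros(train_x_low.size(0), 1, dtype=torch.long),  # task index 0 = low fidelity
    torch.ones(train_x_high.size(0), 1, dtype=torch.long),  # task index 1 = high fidelity
])
# targets are concatenated the same way:
# full_train_y = torch.cat([train_y_low, train_y_high])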

In that notebook, you can see how we specifically build the full kernel matrix in the multitask setting in the forward method of the model itself.

If that example isn't sufficiently useful, let me know and I can try to take a pass at just building an example once some of the work on 0.3.0 is a little further along (probably early next week).

kunalghosh commented 5 years ago

Hi @jacobrgardner, thanks a lot for your help, and apologies for the delayed response. I finally managed to get the two-fidelity model to work :) I have attached the code below in case someone comes across this later.

I have one final question: in the IndexKernel, if we do not set the batch_size (as in the Hadamard multitask example), then where is self.batch_size defined (https://github.com/cornellius-gp/gpytorch/blob/3bb10d37d51d79d23f535b9e6b1ddbabbf5c85d8/gpytorch/kernels/index_kernel.py#L58)?

Also, if num_tasks is 2 and rank = 1, is the covar_matrix in an IndexKernel a 2x2 matrix?

import torch
from gpytorch.kernels import Kernel
from gpytorch.lazy import InterpolatedLazyTensor, RootLazyTensor
from gpytorch.priors import NormalPrior

class TwoFidelityIndexKernel(Kernel):
    """
    Separate kernel for each task based on the Hadamard Product between the task
    kernel and the data kernel. based on :
    https://github.com/cornellius-gp/gpytorch/blob/master/examples/03_Multitask_GP_Regression/Hadamard_Multitask_GP_Regression.ipynb

    The index identifier must start from 0, i.e. all task zero have index identifier 0 and so on.

    If noParams is set to `True` then the covar_factor doesn't include any parameters.
    This is needed to construct the 2nd matrix in the sum, as in (https://arxiv.org/pdf/1604.07484.pdf eq. 3.2) 
    where the kernel is treated as a sum of two kernels.

    k = [      k1, rho   * k1   + [0, 0
         rho * k1, rho^2 * k1]     0, k2]
    """
    def __init__(self,
                num_tasks,
                rank=1, # for two fidelities, rank is always assumed to be 1
                prior=None,
                includeParams=True,
                **kwargs
                ):
        if rank > num_tasks:
            raise RuntimeError("Cannot create a task covariance matrix larger than the number of tasks")
        super().__init__(**kwargs)
        try:
            self.batch_shape
        except AttributeError:
            self.batch_shape = 1  # note: newer GPyTorch versions expect a torch.Size here (see follow-up below)

        # we raise rho to the power of the task index (task 0 -> rho^0, task 1 -> rho^1, etc.)
        self.covar_factor = torch.arange(num_tasks).to(torch.float32)

        if includeParams:
            self.register_parameter(name="rho", parameter=torch.nn.Parameter(torch.randn(1)))
            print(f"Initial value : rho  {self.rho.item()}")
            self.covar_factor = torch.pow(self.rho.repeat(num_tasks), self.covar_factor)

        self.covar_factor = self.covar_factor.unsqueeze(0).unsqueeze(-1)
        self.covar_factor = self.covar_factor.repeat(self.batch_shape, 1, 1)

        if prior is not None and includeParams:
            self.register_prior("rho_prior", prior, self._rho)

    def _rho(self):
        return self.rho

    def _eval_covar_matrix(self):
        # B B^T (no extra diagonal term here, unlike IndexKernel)
        return self.covar_factor.matmul(self.covar_factor.transpose(-1, -2))

    @property
    def covar_matrix(self):
        res = RootLazyTensor(self.covar_factor)
        return res

    def forward(self, i1, i2, **params):
        covar_matrix = self._eval_covar_matrix()
        res = InterpolatedLazyTensor(base_lazy_tensor=covar_matrix, left_interp_indices=i1, right_interp_indices=i2)
        return res

And here's the code for the multitask model:

import torch
import gpytorch

class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module1 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.covar_module2 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        # We learn a TwoFidelityIndexKernel for 2 tasks
        # (a 2x2 task covariance matrix with cross-fidelity correlations)
        self.task_covar_module1 = TwoFidelityIndexKernel(num_tasks=2, rank=1)
        self.task_covar_module2 = TwoFidelityIndexKernel(num_tasks=2, rank=1, includeParams=False)
        print(f"Initial value : Covar 1, lengthscale {self.covar_module1.base_kernel.lengthscale.item()}, prefactor {self.covar_module1.outputscale.item()}")
        print(f"Initial value : Covar 2, lengthscale {self.covar_module2.base_kernel.lengthscale.item()}, prefactor {self.covar_module2.outputscale.item()}")

    def forward(self, x, i):
        mean_x = self.mean_module(x)

        # Get input-input covariance
        covar1_x = self.covar_module1(x)
        # Get task-task covariance
        covar1_i = self.task_covar_module1(i)

        # Get input-input covariance
        covar2_x = self.covar_module2(x)
        # Get task-task covariance
        covar2_i = self.task_covar_module2(i)

        # Multiply the two together to get the covariance we want
        covar1 = covar1_x.mul(covar1_i)
        covar2 = covar2_x.mul(covar2_i)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar1+covar2)
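For completeness, a minimal training loop for this model (a sketch, assuming full_train_x, full_train_i, and full_train_y are built Hadamard-style as described above):

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = MultitaskGPModel((full_train_x, full_train_i), full_train_y, likelihood)

model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for _ in range(50):
    optimizer.zero_grad()
    output = model(full_train_x, full_train_i)
    loss = -mll(output, full_train_y)
    loss.backward()
    optimizer.step()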

kunalghosh commented 5 years ago

Also, my code seems to run fine with gpytorch=0.2.1, but with the latest release it stopped working.

RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor

The following line in TwoFidelityIndexKernel is causing the error (self.batch_shape is 1):

self.covar_factor = self.covar_factor.repeat(self.batch_shape, 1, 1)

jacobrgardner commented 5 years ago

Great, glad to hear you got it working.

batch_shape is an arg to the base Kernel class itself.
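And yes, with num_tasks=2 the IndexKernel task covariance is 2x2: a rank-1 factor BB^T plus a learned diagonal. A quick check (against the 0.3.x-era API):

from gpytorch.kernels import IndexKernel

kernel = IndexKernel(num_tasks=2, rank=1)
print(kernel.covar_matrix.evaluate().shape)  # torch.Size([2, 2])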

jacobrgardner commented 5 years ago

Oh I see, you get an error on the latest release, haha. I think you want to do self.covar_factor.unsqueeze so it has the right number of dims, and then self.covar_factor.repeat(*self.batch_shape, 1, 1)
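In code, the fix would be something like this sketch:

# covar_factor is a (num_tasks,) vector of rho powers; make it a column first
self.covar_factor = self.covar_factor.unsqueeze(-1)  # -> (num_tasks, 1)
# batch_shape on the base Kernel is a torch.Size, so unpack it into repeat()
self.covar_factor = self.covar_factor.repeat(*self.batch_shape, 1, 1)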

gpleiss commented 4 years ago

closing for now - reopen if there are still questions :)

ndalchau commented 4 years ago

@kunalghosh, I've been looking into multi-fidelity models and wondered whether you have updated your example above to the current version of GPyTorch. It might even be worth including this as a built-in example so that others can see how to do multi-fidelity modeling in GPyTorch.

I've tried naively copying the modules into a script and running them with some assumed input data, but there appear to be a few issues that need resolving. If you've already solved those, that would be helpful.

Thanks in advance.