It looks to me like their model is essentially a multitask GP between the fidelity levels with a deep base kernel. GPyTorch does indeed support, and has tutorials for, both of these! Basically, if you pass x through any torch Module (think something like an nn.Sequential) before passing it through the kernel, the GP and the nn.Sequential get learned end to end.
Their specific structure for handling multitask is slightly different from our existing multitask stuff in the case where the tasks have different numbers of observations, but you could also implement that without too much trouble by just registering their rho parameter on your model directly.
Let me know if you need this to be more concrete / you don't find the existing tutorials sufficient, and I can try to throw a toy example implementation together later.
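For reference, a minimal sketch of the "pass x through a torch Module before the kernel" idea described above, along the lines of the deep kernel learning tutorial. The feature extractor architecture and sizes here are made up for illustration:

import torch
import gpytorch

# A toy feature extractor; layer sizes are placeholders.
class FeatureExtractor(torch.nn.Sequential):
    def __init__(self, data_dim):
        super().__init__(
            torch.nn.Linear(data_dim, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 2),
        )

class DKLModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.feature_extractor = FeatureExtractor(train_x.size(-1))

    def forward(self, x):
        # Project x through the network first; the kernel acts on the features,
        # and the network and GP hyperparameters are trained end to end by the MLL.
        z = self.feature_extractor(x)
        mean = self.mean_module(z)
        covar = self.covar_module(z)
        return gpytorch.distributions.MultivariateNormal(mean, covar)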
Hi @jacobrgardner,
Using a neural network in conjunction with a GP is quite clear to me (thanks to the deep kernel learning GPyTorch doc), but I am still unclear about how I might implement the multi-fidelity kernel.
I tried to modify the IndexKernel and MultitaskKernel to come up with an implementation of a two-fidelity kernel. It would be great if you could take a look and give some feedback. Also, I am not sure how to handle different numbers of entries for different tasks; any suggestions?
import torch
from gpytorch.kernels import Kernel
from gpytorch.lazy import DiagLazyTensor, InterpolatedLazyTensor, PsdSumLazyTensor, RootLazyTensor


class TwoFidelityIndexKernel(Kernel):
    def __init__(self,
                 num_tasks,
                 rank=1,
                 prior=None,
                 **kwargs
                 ):
        if rank > num_tasks:
            raise RuntimeError("Cannot create a task covariance matrix larger than the number of tasks")
        super().__init__(**kwargs)
        self.register_parameter(
            name="rho", parameter=torch.nn.Parameter(torch.randn(1))
        )
        # if num_tasks = 2, we first create a 2x2 matrix with rho in all entries.
        self.covar_factor = torch.ones(num_tasks, num_tasks) * self.rho
        # then we construct a 2x2 diagonal matrix where [0, 0] is 0 and [1, 1] is rho
        temp = torch.diag(torch.arange(num_tasks, dtype=torch.float32) * self.rho)
        # and elementwise multiply the two matrices together.
        self.covar_factor = self.covar_factor * temp

    def _eval_covar_matrix(self):
        return self.covar_factor

    @property
    def covar_matrix(self):
        res = RootLazyTensor(self.covar_factor)
        return res

    def forward(self, i1, i2, **params):
        covar_matrix = self._eval_covar_matrix()
        res = InterpolatedLazyTensor(base_lazy_tensor=covar_matrix, left_interp_indices=i1, right_interp_indices=i2)
        return res
Then I use the TwoFidelityIndexKernel in the following multi-task kernel.
import torch
import gpytorch
from gpytorch.kernels import Kernel
from gpytorch.lazy import lazify, KroneckerProductLazyTensor


class TwoFidelityKernel(Kernel):
    """
    Kernel supporting Kronecker style multitask Gaussian processes (where every data point is
    evaluated at every task) using :class:`gpytorch.kernels.IndexKernel` as a basic multitask kernel.

    Given a base covariance module to be used for the data, :math:`K_{XX}`, this kernel computes a
    task kernel of specified size :math:`K_{TT}` and returns :math:`K = K_{TT} \otimes K_{XX}` as a
    :obj:`gpytorch.lazy.KroneckerProductLazyTensor`.

    Args:
        data_covar_module (:obj:`gpytorch.kernels.Kernel`):
            Kernel to use as the data kernel.
        num_tasks (int):
            Number of tasks
        batch_size (int, optional):
            Set if the MultitaskKernel is operating on batches of data (and you want different
            parameters for each batch)
        rank (int):
            Rank of index kernel to use for task covariance matrix.
        task_covar_prior (:obj:`gpytorch.priors.Prior`):
            Prior to use for task kernel. See :class:`gpytorch.kernels.IndexKernel` for details.
    """

    def __init__(self, data_covar_module1, data_covar_module2, num_tasks, rank=1, task_covar_prior=None, **kwargs):
        super(TwoFidelityKernel, self).__init__(**kwargs)
        self.task_covar_module = TwoFidelityIndexKernel(
            num_tasks=num_tasks, batch_shape=self.batch_shape, rank=rank, prior=task_covar_prior
        )
        self.data_covar_module1 = data_covar_module1  # k1
        self.data_covar_module2 = data_covar_module2  # k2
        self.num_tasks = num_tasks

    def forward(self, x1, x2, diag=False, batch_dims=None, **params):
        if batch_dims == (0, 2):
            raise RuntimeError("AdditiveGridInterpolationKernel does not accept the batch_dims argument.")
        covar_i = self.task_covar_module.covar_matrix
        covar_i = covar_i.repeat(x1.size(0), 1, 1)
        # construct a 2x2 diagonal matrix whose [1, 1] entry is 1 (and [0, 0] entry is 0)
        const_mat = torch.diag(torch.arange(self.num_tasks, dtype=torch.float32))
        const_mat = const_mat.repeat(x1.size(0), 1, 1)
        covar_x1 = lazify(self.data_covar_module1.forward(x1, x2, **params))
        covar_x2 = lazify(self.data_covar_module2.forward(x1, x2, **params))
        res = (KroneckerProductLazyTensor(covar_x1, covar_i)
               + KroneckerProductLazyTensor(covar_x2, const_mat))
        return res.diag() if diag else res

    def size(self, x1, x2):
        """
        Given `n` data points `x1` and `m` data points `x2`, this multitask
        kernel returns an `(n*num_tasks) x (m*num_tasks)` covariance matrix.
        """
        non_batch_size = (self.num_tasks * x1.size(-2), self.num_tasks * x2.size(-2))
        if x1.ndimension() == 3:
            return torch.Size((x1.size(0),) + non_batch_size)
        else:
            return torch.Size(non_batch_size)
I realize that when using the multi-task kernel in GPyTorch, the target Y is organized such that Y[:, 0] corresponds to task 1 and Y[:, 1] corresponds to task 2. But in the paper I referenced it is organized as (assuming 0:n1 is low fidelity and n1:n2 is high fidelity): Y[:n1] = Ylow, Y[n1:n2] = Yhigh, and X[n1:n2] has x values repeated.
@kunalghosh Sorry for the delay, we've been crazy busy trying to get GPyTorch 0.3.0 out the door.
For multitask with different numbers of points per task, you won't be able to use the Kronecker based multitask in the example you looked at. https://github.com/cornellius-gp/gpytorch/blob/master/examples/03_Multitask_GP_Regression/Hadamard_Multitask_GP_Regression.ipynb is a good starting point: it lets you assign a different task to each data point, thereby allowing different numbers of data points per task. In the example we happened to use torch.linspace(0, 1, 100) as the training locations for each task (so 200 training examples total), but where we create full_train_x, that could have been any set of 200 points, with the first 100 being task 1 and the second 100 being task 2.
In that notebook, you can see how we specifically build the full kernel matrix in the multitask setting in the forward method of the model itself.
If that example isn't sufficiently useful, let me know and I can try to take a pass at just building an example once some of the work on 0.3.0 is a little further along (probably early next week).
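As an illustration of the Hadamard-style setup described above, here is a minimal sketch of how unequal numbers of points per task can be stacked together with a task index tensor. The sizes, inputs, and toy targets are made up for illustration:

import torch

# Low- and high-fidelity data of different sizes; task indices are stored per row.
n_low, n_high = 100, 30
train_x_low = torch.linspace(0, 1, n_low).unsqueeze(-1)
train_x_high = torch.linspace(0, 1, n_high).unsqueeze(-1)

full_train_x = torch.cat([train_x_low, train_x_high])                      # shape (130, 1)
full_train_i = torch.cat([torch.zeros(n_low, 1, dtype=torch.long),         # task 0 = low fidelity
                          torch.ones(n_high, 1, dtype=torch.long)])        # task 1 = high fidelity

full_train_y = torch.cat([torch.sin(6 * train_x_low).squeeze(-1),          # toy low-fidelity targets
                          torch.sin(6 * train_x_high).squeeze(-1) + 0.2])  # toy high-fidelity targets
# A Hadamard-style model is then called as model(full_train_x, full_train_i).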
Hi @jacobrgardner Thanks a lot for your help, and apologies for the delayed response. I finally managed to get the two-fidelity model to work :) I have attached the code below if someone happens to come across this later.
I have one final question: in the IndexKernel, if we do not set the batch_size (like in the Hadamard multitask example), then where is self.batch_size defined (https://github.com/cornellius-gp/gpytorch/blob/3bb10d37d51d79d23f535b9e6b1ddbabbf5c85d8/gpytorch/kernels/index_kernel.py#L58)?
Also, if num_tasks is 2 and rank = 1, is the covar_matrix in an IndexKernel a 2x2 matrix?
import torch
from gpytorch.kernels import Kernel
from gpytorch.lazy import DiagLazyTensor, InterpolatedLazyTensor, PsdSumLazyTensor, RootLazyTensor
from gpytorch.lazy import lazify
from gpytorch.priors import NormalPrior


class TwoFidelityIndexKernel(Kernel):
    """
    Separate kernel for each task, based on the Hadamard product between the task
    kernel and the data kernel, following:
    https://github.com/cornellius-gp/gpytorch/blob/master/examples/03_Multitask_GP_Regression/Hadamard_Multitask_GP_Regression.ipynb

    The task identifiers must start from 0, i.e. all task-zero points have index identifier 0, and so on.

    If includeParams is set to `False` then the covar_factor doesn't include any parameters.
    This is needed to construct the 2nd matrix in the sum (https://arxiv.org/pdf/1604.07484.pdf eq. 3.2),
    where the kernel is treated as a sum of two kernels:

        K = [[ k1,        rho * k1   ],     [[ 0,  0  ],
             [ rho * k1,  rho^2 * k1 ]]  +   [ 0,  k2 ]]
    """
    def __init__(self,
                 num_tasks,
                 rank=1,  # for two fidelities this is always assumed to be 1
                 prior=None,
                 includeParams=True,
                 **kwargs
                 ):
        if rank > num_tasks:
            raise RuntimeError("Cannot create a task covariance matrix larger than the number of tasks")
        super().__init__(**kwargs)
        try:
            self.batch_shape
        except AttributeError:
            self.batch_shape = 1  # e.g. torch.Size([200])
        # we take a power of rho with the task index list
        # (assuming task 0 is represented as 0, task 1 as 1, etc.)
        self.covar_factor = torch.arange(num_tasks).to(torch.float32)
        if includeParams:
            self.register_parameter(name="rho", parameter=torch.nn.Parameter(torch.randn(1)))
            print(f"Initial value : rho {self.rho.item()}")
            self.covar_factor = torch.pow(self.rho.repeat(num_tasks), self.covar_factor)
        self.covar_factor = self.covar_factor.unsqueeze(0).unsqueeze(-1)
        self.covar_factor = self.covar_factor.repeat(self.batch_shape, 1, 1)
        if prior is not None and includeParams is True:
            self.register_prior("rho_prior", prior, self._rho)

    def _rho(self):
        return self.rho

    def _eval_covar_matrix(self):
        return self.covar_factor.matmul(self.covar_factor.transpose(-1, -2))  # + D

    @property
    def covar_matrix(self):
        res = RootLazyTensor(self.covar_factor)
        return res

    def forward(self, i1, i2, **params):
        covar_matrix = self._eval_covar_matrix()
        res = InterpolatedLazyTensor(base_lazy_tensor=covar_matrix, left_interp_indices=i1, right_interp_indices=i2)
        return res
and here's the code for the MultiTask Model
import torch
import gpytorch
from pprint import pprint
from gpytorch.priors import NormalPrior


class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module1 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        self.covar_module2 = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
        # We learn an IndexKernel for 2 tasks
        # (so we'll actually learn a 2x2 task covariance matrix with correlations)
        self.task_covar_module1 = TwoFidelityIndexKernel(num_tasks=2, rank=1)
        self.task_covar_module2 = TwoFidelityIndexKernel(num_tasks=2, rank=1, includeParams=False)
        print(f"Initial value : Covar 1, lengthscale {self.covar_module1.base_kernel.lengthscale.item()}, prefactor {self.covar_module1.outputscale.item()}")
        print(f"Initial value : Covar 2, lengthscale {self.covar_module2.base_kernel.lengthscale.item()}, prefactor {self.covar_module2.outputscale.item()}")

    def forward(self, x, i):
        mean_x = self.mean_module(x)
        # Get input-input covariance
        covar1_x = self.covar_module1(x)
        # Get task-task covariance
        covar1_i = self.task_covar_module1(i)
        # Get input-input covariance
        covar2_x = self.covar_module2(x)
        # Get task-task covariance
        covar2_i = self.task_covar_module2(i)
        # Multiply (Hadamard product) each pair together to get the covariance we want
        covar1 = covar1_x.mul(covar1_i)
        covar2 = covar2_x.mul(covar2_i)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar1 + covar2)
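A rough usage sketch for this model, reusing the full_train_x / full_train_i / full_train_y tensors from the data-layout sketch earlier in the thread; the optimizer settings and iteration count are arbitrary placeholders:

import torch
import gpytorch

likelihood = gpytorch.likelihoods.GaussianLikelihood()
# ExactGP accepts a tuple of training inputs, here (inputs, task indices).
model = MultitaskGPModel((full_train_x, full_train_i), full_train_y, likelihood)

model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for _ in range(50):
    optimizer.zero_grad()
    output = model(full_train_x, full_train_i)
    loss = -mll(output, full_train_y)
    loss.backward()
    optimizer.step()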
Also, my code seems to run fine with gpytorch=0.2.1 but with the latest release it stopped working.
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor
The following line in TwoFidelityIndexKernel is causing the error (self.batch_shape is 1):
self.covar_factor = self.covar_factor.repeat(self.batch_shape, 1, 1)
Great, glad to hear you got it working.
batch_shape is an arg to the base Kernel class itself.
Oh I see, you get an error on the latest release, haha. I think you want to do self.covar_factor.unsqueeze so it has the right number of dims, and then self.covar_factor.repeat(*self.batch_shape, 1, 1).
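A small standalone sketch of what that suggestion could look like, assuming batch_shape is stored as a torch.Size (as in the newer Kernel base class); this is an illustration, not tied to a specific release:

import torch

num_tasks = 2
batch_shape = torch.Size([])                            # non-batch case; torch.Size([b]) for batch mode
covar_factor = torch.arange(num_tasks, dtype=torch.float32)

# Unsqueeze so the tensor has the right number of dims, then unpack batch_shape into repeat()
# so the number of repeat dims is never smaller than the tensor's number of dims.
covar_factor = covar_factor.unsqueeze(-1)               # shape: (num_tasks, 1)
covar_factor = covar_factor.repeat(*batch_shape, 1, 1)  # shape: (*batch_shape, num_tasks, 1)
print(covar_factor.shape)                               # torch.Size([2, 1]) in the non-batch case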
closing for now - reopen if there's still questions :)
@kunalghosh, I've been looking into multi-fidelity models, and wondered whether you have updated your example above to the current version of GPyTorch. It might even be worth trying to include this as a built-in example so that others can see how to do multi-fidelity in GPyTorch.
I've tried to naively copy the modules into a script and run them with some assumed input data, but there appear to be a few issues that need resolving. If you've already solved those, that would be helpful.
Thanks in advance.
Hi,
I am trying to implement a multi-fidelity GP model using GPyTorch and would need to implement something like eq. 3.1 in https://arxiv.org/pdf/1604.07484.pdf. Should I implement my own kernel class, or is there an easier way by reusing existing kernel classes in GPyTorch?