Closed g-benton closed 5 years ago
@g-benton I'm not sure what you'd like here beyond what already exists in the package. A kronecker matrix plus a scaled identity matrix can be represented by a AddedDiagLazyTensor(KroneckerProductLazyTensor, DiagLazyTensor))
structure, which will use CG for fast inference.
The Kronecker specific methods are slightly faster in #725 on the CPU for relatively small matrices, but it's not clear that it's a lot faster on the GPU. What is currently implemented should be pretty fast on the GPU, and is much faster than the naive O(m^2n^2)
way of doing things.
If you want to specifically write a new LazyTensor
still, the necessary steps are pretty well documented in the LazyTensor
docs. You don't specifically need to overwrite inv_quad_logdet
, although you can if you'd like. Either way, once you've written a new LazyTensor
, all you need to do to use it in GPyTorch is write a kernel that returns one.
In the pure Kronecker case, without missing data, the fastest way to do inference and learning requires only eigendecompositions of the constituent matrices and a few Kronecker MVMs, rather than CG. This would be useful functionality for spatiotemporal regression problems, which are often exactly on a Cartesian product grid.
Here's a small chunk of code that hopefully explains further why we would want to use a custom inv_quad_logdet
function (which I believe requires getting a custom lazy tensor returned by likelihood(model(train_x))
).
Using a grid_size
of 100 everything is fine, but when you increase grid_size
to 1000 the custom method is dramatically faster.
I'm splitting up the Kronecker and noise components to make it work here, but in practice we would want to use a custom lazy tensor that held both the Kronecker and noise components.
import torch
import math
import gpytorch
grid_bounds = [(0, 1), (0, 1)]
grid_size = 1000
grid = torch.zeros(grid_size, len(grid_bounds))
for i in range(len(grid_bounds)):
grid_diff = float(grid_bounds[i][1] - grid_bounds[i][0]) / (grid_size - 2)
grid[:, i] = torch.linspace(grid_bounds[i][0] - grid_diff, grid_bounds[i][1] + grid_diff, grid_size)
train_x = gpytorch.utils.grid.create_data_from_grid(grid)
train_y = torch.sin((train_x[:, 0] + train_x[:, 1]) * (2 * math.pi)) + torch.randn_like(train_x[:, 0]).mul(0.01)
from gpytorch.means import ConstantMean
from gpytorch.kernels import ScaleKernel, RBFKernel, ProductStructureKernel, GridInterpolationKernel
from gpytorch.kernels import GridKernel
from gpytorch.distributions import MultivariateNormal
class GPRegressionModel(gpytorch.models.ExactGP):
def __init__(self,grid, train_x, train_y, likelihood):
super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)
self.mean_module = ConstantMean()
self.base_covar_module = RBFKernel()
self.covar_module = GridKernel(self.base_covar_module,
grid = grid)
def forward(self, x):
mean_x = self.mean_module(x)
covar_x = self.covar_module(x)
return MultivariateNormal(mean_x, covar_x)
lh = gpytorch.likelihoods.GaussianLikelihood()
model = GPRegressionModel(grid, train_x, train_y, lh)
# The standard method works for smaller sizes of data
cov = lh(model(train_x)).lazy_covariance_matrix
cov.inv_quad_logdet(logdet=True)
# Using a custom method we can run this on up to 1,000,000 data points
def logdet(kron, noise):
res = 0
sub_eigs = []
for lt in kron.lazy_tensors:
sub_eigs.append(lt.evaluate().eig()[0][:, 0].unsqueeze(-1))
eigs = sub_eigs[0].matmul(sub_eigs[1].t())
return torch.log(eigs + noise).sum()
noise = lh.noise.data
kron = model.covar_module.forward(train_x, train_x)
print(logdet(kron, noise))
Yeah, so this issue just boils down to the fact that it is faster to run CG on individual submatrices rather than use the matmul algorithm for the full Kronecker product to run on the full matrix.
In general you can do much better than decomposing the sub matrices because you can run CG on individual sub matrices. @mshvartsman's snippet from #725 computes inv_quad_logdet
for KroneckerProductLazyTensor
using the inv_matmul
and logdet
calls of the individual sub matrices:
def inv_quad_logdet(self, inv_quad_rhs=None, logdet=False, reduce_inv_quad=True):
nrows, ncols = inv_quad_rhs.shape
nfactors = len(self.lazy_tensors)
logdets = torch.zeros(nfactors)
tensor_shapes = [q.shape[0] for q in self.lazy_tensors]
multipliers = torch.Tensor([torch.prod(torch.Tensor([ts for i, ts in enumerate(tensor_shapes) if i!=j]))
for j in range(len(tensor_shapes))])
y = inv_quad_rhs.clone()
for i, q in enumerate(self.lazy_tensors):
n = q.shape[0]
# ideally we could just get inv_matmul and logdet from the one call instead of
# two here, reusing solves the same way we do in usual InvQuadLogDet
y = q.inv_matmul(y.contiguous().view(n, -1))
logdets[i] = q.logdet()
y = y.view(n, nrows//n, -1).permute(1, 0, 2)
logdet_term = torch.sum(logdets * multipliers)
inv_quad_term = torch.trace(inv_quad_rhs.t() @ y.contiguous().view(nrows, ncols))
return inv_quad_term, logdet_term
Which can replace the existing KroneckerProductLazyTensor.inv_quad_logdet
once it's just modified to support batch mode operations (and to not perform the logdet operation if it's not asked for).
So I think where this differs is that we want to compute the log det of K + \sigma^2 I
(not just of K
) where K
is a Kronecker. So even if KroneckerProductLazyTensor.inv_quad_logdet
is updated re: issue #725 that won't be exploited when calls are made like likelihood(model(train_x))
correct? Because then the covariance is then stored as an AddedDiagLazyTensor
?
@g-benton Right, okay I agree that in the exact GP setting you might get more efficiency out of something like a KroneckerPlusDiagLazyTensor
.
To return to the original question for this issue, what specific problems are you running in to writing a new LazyTensor? If I understand you correctly, you have an extended LazyTensor and kernel, and the only issue you are having is that likelihood(model(x))
is overwriting your lazy tensor with an AddedDiagLazyTensor(YourLazyTensor, DiagLazyTensor(noise))
.
Here's what a potentially decent solution would be:
KroneckerPlusDiagLazyTensor
(or whatever) with diag = None
initially.LazyTensor.__add__
:
https://github.com/cornellius-gp/gpytorch/blob/8cc223b6c6a108de21b1c8aa9336cd1c7c2d4779/gpytorch/lazy/lazy_tensor.py#L1581
so that if we add a DiagLazyTensor
to your KroneckerPlusDiagLazyTensor
, you get a new KroneckerPlusDiagLazyTensor
back out instead of an AddedDiagLazyTensor
.The only other consideration is ScaleKernel
. In the very short term, you may also have to handle what ConstantMulLazyTensor
currently does if you use a scale kernel, e.g. instead of using a scale kernel handle the constant in your own kernel and LT. However (cc @gpleiss), we should probably try to get away with having ConstantMulLazyTensor
use the base LT's inv_quad_logdet
. If we did this, then you'd be able to wrap your custom kernel with a ScaleKernel
.
@jacobrgardner This looks exactly like what I was looking for - I don't think scaling will be an issue for our use case so hopefully that's not a problem. Thank you!
🚀 Feature Request
I don't know if this is done already but I haven't been able to get this to work as of yet. I would like to be able to write a custom lazy tensor class with its own
inv_quad_logdet
and have that be used as the covariance matrix forlikelihood(model(train_x))
calls.Motivation
Given the structure of my data I know that my covariance matrix produced by
likelihood(model(train_x))
will decompose into a Kronecker matrix plus a scaled identity matrix (noise term) which allows for fast computations of inverse quadratic forms and log determinants (see Appendix Section 3 of http://www.cs.cmu.edu/~andrewgw/manet.pdf). The code to write a kernel wrapper class and the lazy tensor that computes fast log dets is fairly straightforward, but I'm unsure of how to connect to the GPyTorch internals so that the custom method forinv_quad_logdet
is used.This is pretty closely related to https://github.com/cornellius-gp/gpytorch/issues/725, so maybe a similar solution could be used.