gpleiss closed this 1 year ago
@jacobrgardner @Balandat can one of you take a look at this?
This implementation works perfectly when using one KeOps kernel, but there are a few problems when combining kernels. This is not specifically a problem with `KernelLinearOperator`, but with the way things are currently implemented in gpytorch/linear_operator. Hopefully some of these comments are useful for structuring the code.
I tried to multiply two KeOps kernels (which makes a `ProductKernel`). These are the problems I found (a minimal sketch that triggers them follows this list):

1. `ProductKernel` calls `to_dense` if x1 and x2 are not identical here, and returns a `DenseLinearOperator`, since `MulLinearOperator` is currently only used for square symmetric matrices. Since KeOps can be used for large matrices where x1 is not equal to x2, calling `to_dense` could give memory errors. Instead, this should give back some object that represents the multiplication of the two symbolic matrices without actually computing them.
2. If x1 and x2 are identical, `ProductKernel` builds a `MulLinearOperator`, which calls `root_decomposition` on both the `KernelLinearOperator`s. Then there are two issues:
    2.1. You might run out of memory if the matrix is very large when doing the root decomposition (because of KeOps).
    2.2. If the matrix fits into memory, it will be approximated, as Lanczos is the default algorithm. This is fixed by doing `gpytorch.settings.fast_computations(covar_root_decomposition=False)`, but it took me a while to figure out why I was getting different results when using KeOps kernels. If using a very large KeOps symbolic matrix, I don't think root decomposition should be called; instead the symbolic matrix should be used until some reduction operation is performed. (So should `MulLinearOperator` be used with `KernelLinearOperator` instances?)
3. I was thinking: could `KernelLinearOperator` subclass `DenseLinearOperator`, since the `KernelLinearOperator` class is used to represent a full covariance matrix (just computed on the fly when a reduction operation is applied)? That way all the checks for `isinstance(self, DenseLinearOperator)` (for example this one) would also work for `KernelLinearOperator`. This gets around the problem with `MulLinearOperator`, but might cause other problems, so I am not sure if it is reasonable.
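Roughly the setup that runs into problems 1 and 2.2, using gpytorch's KeOps-backed kernels (`gpytorch.kernels.keops`) inside a `ProductKernel`; the sizes here are only illustrative:

```python
import torch
import gpytorch
from gpytorch.kernels import ProductKernel
from gpytorch.kernels.keops import RBFKernel, MaternKernel  # KeOps-backed kernels

x1 = torch.randn(20_000, 3)
x2 = torch.randn(15_000, 3)
kernel = ProductKernel(RBFKernel(), MaternKernel(nu=2.5))

# Problem 1: with x1 != x2, each sub-kernel is densified via to_dense before
# the element-wise product, so the symbolic KeOps representation (and its
# memory savings) is lost; at realistic sizes this can already run out of memory.
cross = kernel(x1, x2) @ torch.randn(x2.size(0), 1)

# Problem 2.2: with x1 == x2, a MulLinearOperator is built from (Lanczos) root
# decompositions; disabling the approximation recovers exact results as long as
# the matrix still fits in memory.
with gpytorch.settings.fast_computations(covar_root_decomposition=False):
    train = kernel(x1, x1) @ torch.randn(x1.size(0), 1)
```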
@m-julian there is unfortunately no way to do a symbolic element-wise multiplication of two kernels. KeOps (and LinearOperator) can keep things symbolic by using matrix-multiplication-based algorithms (CG/Lanczos) for solves and log determinants. Unfortunately, matrix multiplication does not distribute over the element-wise product. The current Lanczos solution comes out of the Product Kernel Interpolation paper (it was the best solution we could come up with).
Therefore, I don't know if there's a better way to handle the product of two matrices than what we currently do in code.
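For intuition on why `MulLinearOperator` reaches for root decompositions: once one factor is written through a root A ≈ LLᵀ, MVMs with the element-wise product A ∘ B reduce to MVMs with B plus cheap element-wise scalings. A small self-contained check in plain PyTorch (illustration only, not the library code):

```python
import torch

torch.manual_seed(0)
n, k = 8, 8
L = torch.randn(n, k, dtype=torch.float64)
A = L @ L.T                               # A given through its root L
B = torch.randn(n, n, dtype=torch.float64)
B = B @ B.T
v = torch.randn(n, dtype=torch.float64)

# Direct MVM with the Hadamard (element-wise) product
direct = (A * B) @ v

# Using A = sum_k l_k l_k^T:  (A ∘ B) v = sum_k l_k ∘ (B (l_k ∘ v)),
# i.e. only MVMs with B and element-wise scalings are needed.
via_root = sum(l * (B @ (l * v)) for l in L.T)

print(torch.allclose(direct, via_root))  # True
```

With a rank-k root this costs k MVMs with B, which is presumably why a low-rank Lanczos root is the default here.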
> I was thinking: could `KernelLinearOperator` subclass `DenseLinearOperator`, since the `KernelLinearOperator` class is used to represent a full covariance matrix (just computed on the fly when a reduction operation is applied)?

This would probably be a can of worms. `DenseLinearOperator` assumes that the matrix is represented by a tensor. And most of our `LinearOperator` classes represent full covariance matrices that are computed on the fly when reduction operations are called.
Regarding the product discussion: for a general implementation that would be part of GPyTorch, I also do not know of a better approach than what @gpleiss pointed out. However, from a user's perspective, you, @m-julian, could of course write a custom KeOps kernel. Most likely, this would even be faster than two separate kernels, as you only need one trip from global memory to processor registers on the GPU.
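Sketched with raw PyKeOps (not the GPyTorch kernel API; lengthscales and output scales omitted for brevity), the product of e.g. an RBF and an exponential kernel can be written as a single symbolic formula, so only one reduction over the data is needed:

```python
import torch
from pykeops.torch import LazyTensor

x = torch.randn(200_000, 3)
v = torch.randn(200_000, 1)

x_i = LazyTensor(x[:, None, :])        # (N, 1, D) symbolic "rows"
x_j = LazyTensor(x[None, :, :])        # (1, N, D) symbolic "columns"
d2 = ((x_i - x_j) ** 2).sum(-1)        # squared distances, still symbolic

# Product of an RBF and an exponential (Matern-1/2) kernel written as one
# symbolic formula: the element-wise product is never materialized.
k = (-0.5 * d2).exp() * (-d2.sqrt()).exp()

out = k @ v                            # single KeOps reduction over the data
```

Wrapping such a formula in a custom GPyTorch KeOps kernel would then give the single-trip behavior mentioned above.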
`KeOpsLinearOperator` does not correctly backpropagate gradients if the `covar_func` closes over parameters.
`KernelLinearOperator` corrects for this, and is set up to replace `LazyEvaluatedKernelTensor` in GPyTorch down the line.
[Addresses issues in #2296]
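A rough usage sketch of the intended pattern: hyperparameters are passed to `KernelLinearOperator` and forwarded to `covar_func` rather than closed over. The shapes and keyword conventions below are illustrative assumptions; see the docstring added in this PR for the actual interface.

```python
import torch
from linear_operator.operators import KernelLinearOperator

def rbf_covar_func(x1, x2, lengthscale):
    # Plain-PyTorch RBF kernel. `lengthscale` is an explicit argument
    # instead of a closed-over module attribute.
    x1_, x2_ = x1 / lengthscale, x2 / lengthscale
    sq_dist = (x1_.unsqueeze(-2) - x2_.unsqueeze(-3)).pow(2).sum(-1)
    return sq_dist.mul(-0.5).exp()

x1, x2 = torch.randn(100, 3), torch.randn(80, 3)
lengthscale = torch.nn.Parameter(torch.ones(1, 3))

# Hypothetical call sketch: the hyperparameter is handed to the operator,
# which forwards it to covar_func, so gradients w.r.t. lengthscale are tracked.
K = KernelLinearOperator(x1, x2, lengthscale=lengthscale, covar_func=rbf_covar_func)
(K @ torch.randn(80, 1)).sum().backward()
print(lengthscale.grad)
```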