cornellius-gp / linear_operator

A LinearOperator implementation to wrap the numerical nuts and bolts of GPyTorch

Add KernelLinearOperator, deprecate KeOpsLinearOperator #62

Closed · gpleiss closed this 1 year ago

gpleiss commented 1 year ago

KeOpsLinearOperator does not correctly backpropagate gradients if the covar_func closes over parameters.

KernelLinearOperator corrects for this, and is set up to replace LazyEvaluatedKernelTensor in GPyTorch down the line.
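To illustrate the intended usage pattern, here is a minimal sketch (assuming, as in this PR, that tensor hyperparameters are passed to KernelLinearOperator as keyword arguments and forwarded to covar_func; the RBF covar_func below is just an example written for this sketch, not library code):

```python
import torch
from linear_operator.operators import KernelLinearOperator

# Example RBF covariance function. Note that it does NOT close over
# `lengthscale`; the hyperparameter is an explicit argument.
def rbf_covar_func(x1, x2, lengthscale):
    x1_ = x1.div(lengthscale)  # ... x N x D
    x2_ = x2.div(lengthscale)  # ... x M x D
    sq_dist = x1_.unsqueeze(-2).sub(x2_.unsqueeze(-3)).pow(2).sum(dim=-1)
    return sq_dist.div(-2.0).exp()

x1 = torch.randn(100, 3)
x2 = torch.randn(80, 3)
lengthscale = torch.nn.Parameter(torch.ones(1, 3))

# Hyperparameters are passed as kwargs (not captured in a closure), so the
# operator tracks them and gradients reach `lengthscale` after a reduction.
covar = KernelLinearOperator(x1, x2, lengthscale=lengthscale, covar_func=rbf_covar_func)
(covar @ torch.randn(80, 1)).sum().backward()
```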

[Addresses issues in #2296]

gpleiss commented 1 year ago

@jacobrgardner @Balandat can one of you take a look at this?

m-julian commented 1 year ago

This implementation works perfectly when using one KeOps kernel, but there are a few problems when combining kernels. This is not specifically a problem with KernelLinearOperator, but rather with the way things are currently implemented in gpytorch/linear_operator. Hopefully some of these comments are useful for structuring the code.

I tried to multiply two KeOps kernels (which makes a ProductKernel). These are the problems I found:

  1. ProductKernel calls to_dense if x1 and x2 are not identical here and returns a DenseLinearOperator, since MulLinearOperator is currently only used for square symmetric matrices. Since KeOps can be used for large matrices where x1 is not equal to x2, calling to_dense could give memory errors. Instead, this should return some object that represents the multiplication of the two symbolic matrices without actually computing them.
  2. If x1 is equal to x2, then you end up with a MulLinearOperator, which calls root_decomposition on both KernelLinearOperators. Then there are two issues:
     2.1. You might run out of memory when doing the root decomposition if the matrix is very large (because of KeOps).
     2.2. If the matrix fits into memory, it will be approximated, since Lanczos is the default algorithm. This is fixed by setting gpytorch.settings.fast_computations(covar_root_decomposition=False) (see the sketch after this list), but it took me a while to figure out why I was getting different results when using KeOps kernels.
     If using a very large KeOps symbolic matrix, I don't think root_decomposition should be called; instead the symbolic matrix should be used until some reduction operation is performed. (So should MulLinearOperator be used with KernelLinearOperator instances at all?)
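As a concrete illustration of the workaround in 2.2, here is a sketch of how I disabled the approximation (model, mll, train_x and train_y are placeholders for an ordinary GPyTorch training setup):

```python
import gpytorch

# Disable the Lanczos-based fast root decomposition so that exact (Cholesky)
# computations are used instead; this removed the discrepancy I was seeing
# when multiplying two KeOps kernels.
with gpytorch.settings.fast_computations(covar_root_decomposition=False):
    output = model(train_x)        # `model`, `mll`, `train_x`, `train_y` are placeholders
    loss = -mll(output, train_y)
    loss.backward()
```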

I was wondering: could KernelLinearOperator subclass DenseLinearOperator, since the KernelLinearOperator class is used to represent a full covariance matrix (just computed on the fly when a reduction operation is applied)? That way, all the checks for isinstance(self, DenseLinearOperator) (for example, this one) would also work for KernelLinearOperator. This would work around the problem with MulLinearOperator, but it might cause other problems, so I am not sure whether it is reasonable.

gpleiss commented 1 year ago

@m-julian there is unfortunately no way to do a symbolic element-wise multiplication of two kernels. KeOps (and LinearOperator) can keep things symbolic by using matrix-multiplication-based algorithms (CG/Lanczos) for solves and log determinants. Unfortunately, matrix multiplication does not distribute over the element-wise product. The current Lanczos solution comes out of the Product Kernel Interpolation paper (it was the best solution we could come up with).
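For intuition, a small numerical sketch of the underlying identity (pure PyTorch, not the actual MulLinearOperator code): if A = La Laᵀ and B = Lb Lbᵀ, then the row-wise Khatri-Rao product of La and Lb is a root of the element-wise product A ∘ B, so matvecs with A ∘ B only ever need the (low-rank) Lanczos roots.

```python
import torch

n, r = 5, 2
La, Lb = torch.randn(n, r), torch.randn(n, r)
A, B = La @ La.T, Lb @ Lb.T

# Row-wise Khatri-Rao (face-splitting) product: row i is the Kronecker product
# of row i of La with row i of Lb, giving an n x (r*r) root of A * B.
root = (La.unsqueeze(-1) * Lb.unsqueeze(-2)).reshape(n, r * r)

# A * B is the element-wise (Hadamard) product; its matvecs can therefore be
# done with two skinny matmuls instead of ever forming A or B densely.
assert torch.allclose(A * B, root @ root.T, atol=1e-5)
```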

Therefore, I don't know if there's a better way to handle the product of two matrices than what we currently do in code.

> I was wondering: could KernelLinearOperator subclass DenseLinearOperator, since the KernelLinearOperator class is used to represent a full covariance matrix (just computed on the fly when a reduction operation is applied)?

This would probably be a can of worms. DenseLinearOperator assumes that the matrix is represented by a tensor, and most of our LinearOperator classes represent full covariance matrices that are computed on the fly when reduction operations are called.

Turakar commented 1 year ago

Regarding the product discussion: for a general implementation that would be part of GPyTorch, I also do not know of a better approach than what @gpleiss pointed out. However, from a user's perspective, you, @m-julian, could of course write a custom KeOps kernel that fuses the two factors. Most likely, this would even be faster than two separate kernels, as you only need one trip from global memory to the processor registers on the GPU.
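For example, a fused covariance function along these lines (a hedged sketch: rbf_times_linear_covar_func is a hypothetical name, and it assumes the covariance function may return a pykeops LazyTensor and receives its hyperparameters explicitly, as in KernelLinearOperator):

```python
from pykeops.torch import LazyTensor

# Hypothetical fused kernel: the product of an RBF and a linear kernel is built
# inside a single KeOps formula, so only one symbolic reduction is compiled and
# the data makes a single trip from global memory to registers.
def rbf_times_linear_covar_func(x1, x2, lengthscale):
    x1_rbf = LazyTensor(x1.div(lengthscale)[..., :, None, :])  # ... x N x 1 x D
    x2_rbf = LazyTensor(x2.div(lengthscale)[..., None, :, :])  # ... x 1 x M x D
    rbf = (-((x1_rbf - x2_rbf) ** 2).sum(-1) / 2).exp()        # RBF factor

    x1_ = LazyTensor(x1[..., :, None, :])
    x2_ = LazyTensor(x2[..., None, :, :])
    linear = (x1_ * x2_).sum(-1)                               # dot-product (linear) factor

    return rbf * linear                                        # single symbolic product
```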