
[Feature Request] Add Laplacian Kernel #1801

Open MartinBubel opened 2 years ago

MartinBubel commented 2 years ago

🚀 Feature Request

I would like to request adding the Laplacian kernel (and maybe even more kernel/covariance functions) to gpytorch.
$$ k_{\text{Laplacian}}(\mathbf{x}_1, \mathbf{x}_2) = \exp \left( - \left(\mathbf{x}_1 - \mathbf{x}_2\right)^\top \Theta^{-1} \left(\mathbf{x}_1 - \mathbf{x}_2\right) \right) $$

Motivation

Different kernel/covariance functions are often not equally well suited to a given dataset. For example, the Laplacian kernel, as defined by the equation above, has been claimed to be well suited for "curve-shaped" datasets; see, e.g., Bauckhage et al. (the paper is unfortunately not open-access).
I have run some minimal-example comparisons, e.g. comparing the Radial Basis Function (RBF) kernel and the Laplacian kernel as the covariance function of an approximate Gaussian process on a classification task.

Pitch

I forked gpytorch and added the Laplacian kernel based on the existing implementation of gpytorch.kernels.RBFKernel (the two kernel functions are quite similar); a sketch of the approach follows below. If adding the Laplacian kernel to gpytorch is considered helpful, I would be happy to contribute a pull request.
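Roughly, the forked implementation mirrors RBFKernel: a gpytorch.kernels.Kernel subclass that reuses the base class's lengthscale machinery and only changes how the lengthscale enters the distance. The snippet below is a minimal sketch of that idea, not the exact fork code:

```python
import torch
from gpytorch.kernels import Kernel


class LaplacianKernel(Kernel):
    """k(x1, x2) = exp(-(x1 - x2)^T Theta^{-1} (x1 - x2)).

    Mirrors RBFKernel, except that the lengthscale Theta enters to the
    first power instead of being squared.
    """

    has_lengthscale = True  # reuse the base class lengthscale/ARD machinery

    def forward(self, x1, x2, diag=False, **params):
        # self.lengthscale is Theta (scalar, or per-dimension with ard_num_dims).
        # Dividing the inputs by sqrt(Theta) turns the squared Euclidean
        # distance into the quadratic form (x1 - x2)^T Theta^{-1} (x1 - x2).
        x1_ = x1.div(self.lengthscale.sqrt())
        x2_ = x2.div(self.lengthscale.sqrt())
        sq_dist = self.covar_dist(x1_, x2_, square_dist=True, diag=diag, **params)
        return torch.exp(-sq_dist)
```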

gpleiss commented 2 years ago

This seems very similar to the RBF kernel. Is the only difference that the inputs are scaled by the matrix \Theta? I'm a little hesitant about adding something so similar to the RBF kernel. (I have also not seen this name for this kernel in the GP literature - is there anything you can point me to?)

MartinBubel commented 2 years ago

Yes, it is very similar. The main difference is that the lengthscale is not applied quadratically. I wrote down the formulas more nicely in the notebook linked below.
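For reference, the two forms side by side (the RBF expression follows the generic formula in the gpytorch docs; the Laplacian form is as in the feature request above):

$$ k_{\text{RBF}}(\mathbf{x}_1, \mathbf{x}_2) = \exp \left( - \tfrac{1}{2} \left(\mathbf{x}_1 - \mathbf{x}_2\right)^\top \Theta^{-2} \left(\mathbf{x}_1 - \mathbf{x}_2\right) \right) $$

$$ k_{\text{Laplacian}}(\mathbf{x}_1, \mathbf{x}_2) = \exp \left( - \left(\mathbf{x}_1 - \mathbf{x}_2\right)^\top \Theta^{-1} \left(\mathbf{x}_1 - \mathbf{x}_2\right) \right) $$

so $\Theta$ enters to the first power rather than squared (and without the $\tfrac{1}{2}$ factor).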
To my knowledge, there is no GP literature that covers the Laplacian kernel. However, the paper referenced in the feature request above, which covers kernelized minimum enclosing balls for data clustering, claims (without a literature reference) that the Laplacian kernel is well suited for curve-shaped data (e.g. half-moon-shaped clusters as in sklearn.datasets.make_moons). Motivated by that paper, I compared the RBF covariance function and the Laplacian covariance function on a classification task using approximate Gaussian processes (AGPs). I found that the characteristic function obtained from the AGP with the Laplacian kernel is a lot more intuitive. I visualized it in a Jupyter notebook in a forked version of gpytorch, see here.
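The comparison was roughly along the following lines (a sketch using gpytorch's standard variational classification setup; LaplacianKernel is the class from the fork sketched above, not an upstream gpytorch kernel):

```python
import torch
import gpytorch
from sklearn.datasets import make_moons

# Half-moon data, as in the comparison described above.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
train_x = torch.as_tensor(X, dtype=torch.float32)
train_y = torch.as_tensor(y, dtype=torch.float32)


class GPClassifier(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points, base_kernel):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(base_kernel)

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Swap in gpytorch.kernels.RBFKernel() here to reproduce the RBF baseline.
model = GPClassifier(train_x[:20], LaplacianKernel())  # LaplacianKernel: from the fork
likelihood = gpytorch.likelihoods.BernoulliLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.numel())

model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
for _ in range(200):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```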
I of course understand that this feature might not be of much interest for gpytorch, given that the Laplacian kernel/covariance function does not seem to have been used in the GP literature at all.

gpleiss commented 2 years ago

If we were going to add something to the library, I think it would be best to modify the core lengthscale functionality. Note that the docs for the base kernel class define the lengthscale to be a matrix (much in the way that you've defined it):

[Screenshot of the gpytorch base Kernel class docs, showing the lengthscale defined as a matrix $\Theta$.]

Of course, the only options that we have implemented right now are that \Theta is

  1. the identity,
  2. a constant times the identity, or
  3. a diagonal matrix.

We could add a 4th option that allows it to be any arbitrary positive definite matrix (presumably parameterized by a Cholesky factor or something); see the sketch at the end of this comment. Then the RBF kernel would do exactly what the Laplace kernel does.

(Also note that the docs for the RBF kernel are written in this generic way, where, if there were a full-covariance option for the lengthscale, it would be exactly equal to the Laplace kernel.)
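For illustration, the 4th option could parameterize the full lengthscale matrix by an unconstrained Cholesky factor, along these lines (a sketch only; the class and parameter names are made up and none of this is existing gpytorch API):

```python
import torch
from gpytorch.kernels import Kernel


class FullLengthscaleRBFKernel(Kernel):
    """RBF kernel with a full positive definite lengthscale matrix
    Theta = L @ L.T, parameterized by a lower-triangular factor L.
    Illustrative sketch; names are hypothetical.
    """

    def __init__(self, num_dims, **kwargs):
        super().__init__(**kwargs)
        # Unconstrained raw parameter; only its lower triangle is used.
        self.register_parameter("raw_chol", torch.nn.Parameter(torch.eye(num_dims)))

    @property
    def chol_factor(self):
        L = self.raw_chol.tril()
        # Keep the diagonal positive so that Theta = L @ L.T is positive definite.
        pos_diag = torch.nn.functional.softplus(L.diagonal())
        return L - L.diagonal().diag_embed() + pos_diag.diag_embed()

    def forward(self, x1, x2, diag=False, **params):
        L = self.chol_factor
        # Whitening the inputs with L^{-1} turns the squared Euclidean
        # distance into the quadratic form (x1 - x2)^T Theta^{-1} (x1 - x2).
        x1_ = torch.linalg.solve_triangular(L, x1.mT, upper=False).mT
        x2_ = torch.linalg.solve_triangular(L, x2.mT, upper=False).mT
        sq_dist = self.covar_dist(x1_, x2_, square_dist=True, diag=diag, **params)
        # With Theta playing the role of the squared lengthscale matrix in the
        # generic RBF formula, this is exp(-1/2 d^T Theta^{-1} d).
        return torch.exp(-0.5 * sq_dist)
```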