JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License

Documenting caveats of `SimpleKernel` #520

Open sethaxen opened 1 year ago

sethaxen commented 1 year ago

The docs recommend `SimpleKernel` for building one's own kernel using Distances.jl. They should probably also note there that not every `PreMetric` yields a positive-definite kernel.

In particular, as Theorem 1 of https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Feragen_Geodesic_Exponential_Kernels_2015_CVPR_paper.pdf notes, the geodesic distance on any "non-flat" manifold does not yield a positive-definite kernel when used in a squared-exponential kernel, which means that e.g. `Distances.SphericalAngle` will not yield a PD kernel.
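
As a rough numerical sketch (not from the original comment; it assumes the `metric` keyword of `SEKernel` and an arbitrary random point set), one can inspect the smallest eigenvalue of the resulting kernel matrix:

```julia
using KernelFunctions, Distances, LinearAlgebra

# 50 random points on the sphere, given as (longitude, latitude) in radians,
# which is the input convention of Distances.SphericalAngle.
X = [[2π * rand() - π, π * rand() - π / 2] for _ in 1:50]

k_euclid   = SEKernel()                                    # default Euclidean metric
k_geodesic = SEKernel(; metric=Distances.SphericalAngle()) # geodesic distance on S²

# The Euclidean version is positive semi-definite up to round-off;
# the geodesic version can have negative eigenvalues for some point sets.
minimum(eigvals(Symmetric(kernelmatrix(k_euclid, X))))
minimum(eigvals(Symmetric(kernelmatrix(k_geodesic, X))))
```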

willtebbutt commented 1 year ago

This is a really good point -- I think this was raised when we added this feature, but we should probably have made it clearer that you can't just use any metric on any space. It would be worth pointing out that the defaults should all be fine (I think they all assume Euclidean space).

sethaxen commented 1 year ago

> It would be worth pointing out that the defaults should all be fine (I think they all assume Euclidean space).

Which default? Whether or not the kernel is PD seems not to be related to whether the space is assumed to be Euclidean. It comes down to whether the distance is a geodesic distance on some manifold, represented in the specified coordinates. In the case of `Distances.SphericalAngle`, one could extend the domain of the inputs to cover $\mathbb{R}^2$, but the distance still works out to the geodesic distance $d_{\mathrm{S}^2}(f(x_1), f(x_2))$, where $f$ is the map from $\mathbb{R}^2$ to the unit vectors in $\mathbb{R}^3$, which means the kernel is not positive-definite.
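
For concreteness, a small sketch of the map $f$ described above (the helper `f` is hypothetical, not part of any package):

```julia
using Distances, LinearAlgebra

# f maps (longitude, latitude) in radians to the corresponding unit vector in R³.
f(x) = [cos(x[2]) * cos(x[1]), cos(x[2]) * sin(x[1]), sin(x[2])]

x1, x2 = [0.3, 0.8], [-1.2, 0.1]
d = Distances.evaluate(Distances.SphericalAngle(), x1, x2)
# SphericalAngle is the geodesic distance on S², i.e. the angle between f(x1) and f(x2):
d ≈ acos(clamp(dot(f(x1), f(x2)), -1, 1))  # true
```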

So the tricky thing is that any `PreMetric` in Distances.jl that corresponds to the geodesic distance on some non-Euclidean manifold (given some chart on that manifold and some choice of Riemannian metric) will not give a valid kernel when plugged into the squared-exponential kernel. And so it seems in general one should assume that no `PreMetric` yields a valid kernel unless one has a proof that it does.

willtebbutt commented 1 year ago

> Which default?

Sorry, I'm just saying that, for example

```julia
julia> SEKernel()
Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
```

defaults to the usual notion of distance in Euclidean space. I agree that if you construct it with most other metrics, you'll get something silly.

> And so it seems in general one should assume that no `PreMetric` yields a valid kernel unless one has a proof that it does.

I tend to agree -- I'm not convinced introducing this feature was a win. It created loads of ways to construct things that aren't PD, and didn't introduce much flexibility.

sethaxen commented 1 year ago

Okay, so the simplest change right now is to document, in `SEKernel` and in the section on `SimpleKernel`, a warning that most metrics will in general not produce a PD kernel. A more complicated change would be to remove the option for a user to provide a `PreMetric` to `SEKernel` entirely.

devmotion commented 1 year ago

IIRC it was discussed before in some other issue (I'll try to find it) that you cannot just use any metric. I think this is a well-known fact about kernels, but maybe it would indeed be good to emphasize it in the docs.

devmotion commented 1 year ago

https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/pull/159#issuecomment-696619214

willtebbutt commented 1 year ago

I think docs are probably the way forward here. Maybe the main docs need a note, and also each of the docstrings for the relevant kernels? (Presumably best achieved by having a `const` String somewhere that we just interpolate into the relevant docstrings?)
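
A rough sketch of that idea (the names `_METRIC_WARNING` and `MyKernel` are purely illustrative, not existing identifiers in the package):

```julia
# Shared warning text, interpolated into the docstrings of the relevant kernels.
const _METRIC_WARNING = """
!!! warning
    Not every `Distances.PreMetric` yields a positive-definite kernel. For example,
    geodesic distances on non-flat manifolds (such as `Distances.SphericalAngle`)
    do not give a positive-definite squared-exponential kernel. Only supply a
    metric if you can show that the resulting kernel is valid.
"""

"""
    MyKernel(; metric=Euclidean())

An example kernel docstring.

$(_METRIC_WARNING)
"""
struct MyKernel end
```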