JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License

Sum of independent kernels #506

Open martincornejo opened 1 year ago

martincornejo commented 1 year ago

Following up on this Discourse discussion: currently, there is no building block to sum independent kernels, analogous to KernelTensorProduct but with addition instead of multiplication:

For inputs $x = (x_1,\dots,x_n)$ and $x' = (x_1',\dots,x_n')$, the independent sum of kernels $k_1, \dots, k_n$:

$$ k(x, x'; k_1, \dots, k_n) = \sum_{i=1}^n k_i(x_i, x_i') $$
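For comparison, the existing KernelTensorProduct applies one kernel per input dimension and multiplies the results; the proposed composite would replace the product with a sum:

```julia
using KernelFunctions

# KernelTensorProduct evaluates one kernel per input dimension and
# multiplies the per-dimension results.
k = KernelTensorProduct(SqExponentialKernel(), ExponentialKernel())

x, y = [0.0, 0.0], [1.0, 2.0]
k(x, y)  # == SqExponentialKernel()(0.0, 1.0) * ExponentialKernel()(0.0, 2.0)
```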

theogf commented 1 year ago

It sounds like a reasonable composite to add, especially since the alternative using SelectTransform is pretty ugly. Is there a standardized name for this kind of kernel? KernelTensorSum? KernelDimensionwiseSum?
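A sketch of that SelectTransform workaround, for reference: each kernel is restricted to one input dimension, and the transformed kernels are added.

```julia
using KernelFunctions

# Workaround without a dedicated composite: restrict each kernel to a
# single input dimension via SelectTransform, then add the results.
k1 = SqExponentialKernel() ∘ SelectTransform([1])
k2 = ExponentialKernel() ∘ SelectTransform([2])
k = k1 + k2  # KernelSum of the two transformed kernels

x, y = [0.0, 0.0], [1.0, 2.0]
k(x, y)  # == SqExponentialKernel()(0.0, 1.0) + ExponentialKernel()(0.0, 2.0)
```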

martincornejo commented 1 year ago

I am willing to open a PR, but I'll need some guidance since this is my first contribution. My naive approach would be to create a new kernel (I like the idea of KernelTensorSum) similar to KernelTensorProduct.

What are the requirements for a fully functional kernel that can be used in AbstractGPs? From the documentation I identified the following:

theogf commented 1 year ago

Thanks for contributing! I think you've got it all covered.

I would not build a KernelTensor abstraction as I don't think we would get much out of it.

As for the name, we can still change it during PR review if other arguments come up.

devmotion commented 1 year ago

> What are the requirements for a fully functional kernel that can be used in AbstractGPs?

I guess you can mainly copy KernelTensorProduct and replace multiplication with addition.

> Would an abstract type KernelTensor for both KernelTensorProduct and KernelTensorSum make sense?

Not in an initial version IMO (and maybe not at all). I would add a separate type, similar to how we distinguish between KernelSum and KernelProduct.
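A minimal sketch of that approach, assuming the evaluation path of KernelTensorProduct with the product replaced by a sum (the field name and input validation here are illustrative, not the final implementation):

```julia
using KernelFunctions

# Sketch: apply one kernel per input dimension and sum the results,
# mirroring KernelTensorProduct with + instead of *.
struct KernelTensorSum{K} <: Kernel
    kernels::K
end
KernelTensorSum(k::Kernel, ks::Kernel...) = KernelTensorSum((k, ks...))

function (k::KernelTensorSum)(x, y)
    length(x) == length(y) == length(k.kernels) ||
        throw(DimensionMismatch("inputs and kernels must have equal length"))
    return sum(ki(xi, yi) for (ki, xi, yi) in zip(k.kernels, x, y))
end
```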

martincornejo commented 1 year ago

One technical question: is it always the case that the correlation of an input with itself should be 1?

julia> k = SqExponentialKernel();

julia> x = 0;

julia> k(x,x)
1.0

Adding the kernels independently results in the following behavior:

julia> k1 = SqExponentialKernel();

julia> k2 = ExponentialKernel();

julia> k = KernelTensorSum(k1, k2)
Tensor sum of 2 kernels:
        Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
        Exponential Kernel (metric = Distances.Euclidean(0.0))

julia> x = zeros(2);

julia> k(x,x)
2.0

So, should the kernel take the mean instead of the sum, so that the correlation is normalized?

For inputs $x = (x_1,\dots,x_n)$ and $x' = (x_1',\dots,x_n')$, the independent sum of kernels $k_1, \dots, k_n$:

$$ k(x, x'; k_1, \dots, k_n) = \frac{1}{n} \sum_{i=1}^n k_i(x_i, x_i') $$
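A hypothetical sketch of that normalized variant, reusing the KernelTensorSum evaluation from above with the sum replaced by a mean:

```julia
# Hypothetical normalized variant: averaging restores k(x, x) == 1
# whenever every component satisfies k_i(x_i, x_i) == 1.
function (k::KernelTensorSum)(x, y)
    return sum(ki(xi, yi) for (ki, xi, yi) in zip(k.kernels, x, y)) / length(k.kernels)
end
```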

theogf commented 1 year ago

No, it does not have to be! I would not "normalize", because that might be unexpected from the user's side. The scaling should be handled by each kernel individually.

martincornejo commented 1 year ago

Of course... simply scaling a kernel would also mean k(x, x) != 1.0.
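For instance, with a ScaledKernel:

```julia
julia> k = 2.0 * SqExponentialKernel();  # scalar * kernel builds a ScaledKernel

julia> k(0.0, 0.0)
2.0
```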