JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License

Implementing non-stationary Gibbs kernel #372

Open Cyberface opened 3 years ago

Cyberface commented 3 years ago

Hi,

I'm new to Julia and Gaussian processes, but I recently came across a couple of interesting papers about non-stationary spectral mixture kernels [1], [2]. I can see that you have already implemented a spectral mixture kernel here (as a side note, I couldn't figure out how to get that to work).

I'm interested in trying to implement this, but I'm starting off small, so I'd first like to implement just one part of it: the input-dependent lengthscale kernel called the Gibbs kernel [3], which is most easily seen in [1] just before equation 7. (I just found another reference for it here too: [4].)

I just took a screenshot from a talk that describes this so you can clearly see the difference between the typical RBF type kernel and this variable length-scale kernel.

[screenshot from the talk: comparison of the typical RBF kernel and the variable length-scale (Gibbs) kernel]

Basically, if you take the length-scale parameter of a squared-exponential kernel and turn it into a function of the input coordinates, you get a non-stationary kernel called the "Gibbs kernel".

The variable length-scale kernel has a learnable function \ell(x), which in [1] is parameterised by another GP.
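For reference, the one-dimensional form of the kernel from [3] is:

k(x, x') = \sqrt{\frac{2 \ell(x) \ell(x')}{\ell(x)^2 + \ell(x')^2}} \exp\left( -\frac{(x - x')^2}{\ell(x)^2 + \ell(x')^2} \right)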

Can anyone help me implement this? From reading about how KernelFunctions works, it looks like this could be done easily using some kind of Transform?

Thanks in advance for any help!

[1]: Non-Stationary Spectral Kernels
[2]: Neural Non-Stationary Spectral Kernel
[3]: Gibbs 1997
[4]: Nonstationary Covariance Functions for Gaussian Process Regression (NIPS 2004)

willtebbutt commented 3 years ago

Welcome :) I'm keen to see this functionality made available too, so thanks for opening the issue.

Yes, this should be straightforward to do. The Gibbs kernel is an interesting example because both the original data x and the transformed data l(x) appear in the kernel, and I don't believe there's a way to turn it into a kernel that depends only on the transformed data. @wesselb and I have discussed this occasionally, so maybe he can say if I'm missing something?

Consequently, my inclination would be to create a new kernel that is parametrised in terms of l. A good example to build off would be something like ScaledKernel, since it implements the set of functions that a new kernel would also need to implement.

Would you be interested in opening a PR to implement this?

Cyberface commented 3 years ago

Hi @willtebbutt thanks for replying!

Yeah, I'm definitely interested in trying to get this done. I mainly work in Python and have only recently started looking at Julia, so I'll need some hand-holding to get this done, if you're OK with that.

Using the ScaledKernel is a good idea, since the l(x) terms can certainly be written as a couple of multiplicative factors. One thing I'm confused about straight away is what datatype l(x) should be, because this is a function that needs to be fit/approximated, and it could be approximated using whatever method you like, e.g. polynomials, GPR, neural nets, etc.

Also, I'm fairly new to thinking about GPs, so apologies if I say something stupid :)

But like I said I'm very happy to give this a go.

willtebbutt commented 3 years ago

Using the ScaledKernel is a good idea, since the l(x) terms can certainly be written as a couple of multiplicative factors. One thing I'm confused about straight away is what datatype l(x) should be, because this is a function that needs to be fit/approximated, and it could be approximated using whatever method you like, e.g. polynomials, GPR, neural nets, etc.

Ahh, sorry, I should have been clearer. I wasn't thinking so much about the mathematical relationship with the ScaledKernel, just that we need to implement the same set of functions on a new GibbsKernel type.

One thing I'm confused about straight away is what datatype l(x) should be, because this is a function that needs to be fit/approximated.

The kind of type I was imagining was something like

struct GibbsKernel{Tl} <: Kernel
    l::Tl  # lengthscale function, mapping each input to a scalar
end

so the type of l can be anything the user likes. We would just stipulate in the docstring that it must be a function mapping each input to a (positive) scalar, and it's the user's responsibility to ensure that this holds in practice.
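For instance (hypothetical usage):

k = GibbsKernel(x -> 1.0 + x^2)  # lengthscale grows with distance from the origin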

I mainly work in Python and have only recently started looking at Julia, so I'll need some hand-holding to get this done, if you're OK with that.

Very happy to help. The most important thing is to keep the PRs (in particular the first one) as small as possible, so that we can get you acquainted with the contribution process efficiently and future PRs proceed smoothly -- that's hard to do if a PR is large. Things generally go much better with a number of small PRs than with a single big one.

For example, I would suggest that a first PR contain the above type, an implementation of its evaluation function -- something with the signature

function (k::GibbsKernel)(x, y)
    # code to implement the Gibbs kernel
end

-- and associated tests. Having said that ScaledKernel is a good thing to build off, now that I think about it I would actually avoid implementing kernelmatrix etc. for now: we get slow versions of them for free by defining the evaluation function, and they can be implemented as optimisations in future PRs.
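Concretely, a minimal sketch of that evaluation function (assuming the one-dimensional form of the kernel quoted earlier in the thread, and not necessarily what the PR should end up looking like) might be:

function (k::GibbsKernel)(x, y)
    lx = k.l(x)  # input-dependent lengthscale at x
    ly = k.l(y)  # input-dependent lengthscale at y
    s = lx^2 + ly^2
    return sqrt(2 * lx * ly / s) * exp(-(x - y)^2 / s)
end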

The ScaledKernel tests should give you an idea of the tests that need implementing. Lines 12-16 are standardised tests that should probably also be run on the GibbsKernel (maybe with different inputs, though?), whereas the preceding lines are ScaledKernel-specific tests, so you'd need to construct different tests there. It might be an idea, for example, to check that the GibbsKernel reduces to the squared-exponential kernel when l is a constant function, or something like that.

Cyberface commented 3 years ago

Hey @willtebbutt, thanks for the pointers!

I have added a PR with a very basic first go at this.

I don't use the proper kernel infrastructure yet, and I think I'll need to spend some time understanding it and thinking about the best way to implement the kernel properly.

I actually think that putting the Gibbs kernel into "src/basekernels" might be best, because it's not a simple transform of the squared exponential, right?

But I'm not so used to thinking about this so looking forward to your thoughts.

Best

willtebbutt commented 3 years ago

Many thanks for opening the PR -- it shouldn't need many tweaks before it can be merged.

I actually think that putting the Gibbs kernel into "src/basekernels" might be best, because it's not a simple transform of the squared exponential, right?

Good idea. We can move things around before merging your PR.

willtebbutt commented 3 years ago

Now that #374 is in, what would you like to work on next @Cyberface ?

Cyberface commented 3 years ago

That is really cool! Thanks for the super helpful discussions on the PR!

I guess the next thing to do is to get kernelmatrix working with it?

Ultimately I want to get to the point where the Gibbs kernel lengthscale can be modelled with a separate GP and fit/optimised.
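As a hypothetical sketch of where that would plug in (f here is just a stand-in for a learned latent function):

f = x -> sin(x)  # stand-in for a learned latent function, e.g. a GP posterior mean
k = GibbsKernel(x -> exp(f(x)))  # exp keeps the lengthscale positive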

Cyberface commented 3 years ago

So I guess I should copy and edit some of the code from ScaledKernel.jl, like you suggested before?

willtebbutt commented 3 years ago

I guess the next thing to do is to get kernelmatrix working with it?

Well, kernelmatrix will already work (there's a fallback definition that you get for free by implementing the kernel's evaluation function); it's just that it's probably possible to optimise by implementing kernelmatrix directly.
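For example (hypothetical usage, assuming the GibbsKernel sketch from earlier in the thread):

using KernelFunctions

k = GibbsKernel(x -> 1.0 + x^2)
xs = rand(10)
K = kernelmatrix(k, xs)  # 10x10 matrix, computed via the generic fallback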

Ultimately I want to get to the point where the Gibbs kernel lengthscale can be modelled with a separate GP and fit/optimised.

Nice. I look forward to seeing that. Always good to see a deep GP!

So I guess I should copy and edit some of the code from ScaledKernel.jl, like you suggested before?

Yeah -- that should provide a good template on which to build.

st-- commented 2 years ago

Would be great to extend the GibbsKernel to work with any (stationary) base kernel. Effectively it's just changing the metric! (Though it might be simpler to implement it as a wrapped kernel, i.e. add a kernel field to the GibbsKernel struct and use that instead of a hard-coded SqExponentialKernel().)
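A minimal sketch of that wrapped-kernel idea (hypothetical type name; assumes scalar inputs and a stationary base kernel, and reduces to the Gibbs kernel above when the base kernel is SqExponentialKernel()):

struct WrappedGibbsKernel{Tl,Tk<:Kernel} <: Kernel
    l::Tl       # lengthscale function
    kernel::Tk  # stationary base kernel, e.g. SqExponentialKernel()
end

function (k::WrappedGibbsKernel)(x, y)
    lx, ly = k.l(x), k.l(y)
    s = lx^2 + ly^2
    # rescale the separation by the averaged lengthscale; stationarity of the
    # base kernel means only x - y matters, so we can evaluate it at (d, 0)
    d = (x - y) / sqrt(s / 2)
    return sqrt(2 * lx * ly / s) * k.kernel(d, zero(d))
end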

Cyberface commented 2 years ago

I'm more than happy to give this a go! ... with some hand holding :)

I quickly put together something along the lines of what you suggested but I'm sure I've done it wrong haha

https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/pull/406

I'm keen to work on this but don't have as much time these days so you will have to excuse me if I'm slow in replying!