JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License

Kernels for derivatives of Gaussian processes #298

Open elevien opened 3 years ago

elevien commented 3 years ago

It is sometimes useful (at least in my own work) to make posterior predictions of the derivative of a process, or to make predictions of a process based on observations of its derivatives (see e.g. Rasmussen and Williams 9.4 or this paper). For these purposes, it would be nice to have some interface for associating partial derivatives with respect to kernel arguments (not parameters) with a kernel, and for applying transformations to a kernel and its derivatives consistently. I'm imagining some sort of DifferentiableKernel type, but I have no idea whether this is the best way to go about it. Any thoughts on how to approach this, and is it of wide enough interest to be implemented here?
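To make the idea concrete, here's a purely hypothetical sketch of the kind of interface I'm picturing (none of these names exist anywhere; `DifferentiableKernel` and its fields are made up):

```julia
using KernelFunctions

# Hypothetical wrapper pairing a kernel with hard-coded partial derivatives
# with respect to its arguments (not its parameters).
struct DifferentiableKernel{K<:Kernel,Dx,Dy} <: Kernel
    k::K      # the base kernel k(x, y)
    dkdx::Dx  # (x, y) -> ∂k/∂x
    dkdy::Dy  # (x, y) -> ∂k/∂y
end

(κ::DifferentiableKernel)(x, y) = κ.k(x, y)

# e.g. for the squared exponential kernel k(x, y) = exp(-(x - y)^2 / 2):
dk = DifferentiableKernel(
    SqExponentialKernel(),
    (x, y) -> -(x - y) * exp(-(x - y)^2 / 2),
    (x, y) -> (x - y) * exp(-(x - y)^2 / 2),
)
```

Transformations (e.g. input warpings) would then need to update `dkdx` and `dkdy` via the chain rule.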

Thanks in advance! I've enjoyed KernelFunctions so far, and look forward to watching it develop.

theogf commented 3 years ago

Hi @elevien, computing the partial derivatives (e.g. via AD) should be possible out of the box; this is something we already test in CI. Implementing an automatic system would definitely be interesting, and would (I think) involve transforming a normal kernel into a (potentially) multi-output one. The easiest thing to do would be to define a new MOKernel and just define k(x,y), but I am not a multi-output kernel expert. Maybe @willtebbutt has a different idea?
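For instance, something along these lines is what I mean by "out of the box" (just a sketch, differentiating a kernel on scalar inputs with ForwardDiff):

```julia
using KernelFunctions, ForwardDiff

k = SqExponentialKernel()  # k(x, y) = exp(-(x - y)^2 / 2)

# Partial derivative with respect to the first argument, at (x, y) = (1.3, 0.4).
dk_dx = ForwardDiff.derivative(x -> k(x, 0.4), 1.3)

# Matches the closed form -(x - y) * exp(-(x - y)^2 / 2).
dk_dx ≈ -(1.3 - 0.4) * exp(-(1.3 - 0.4)^2 / 2)  # true
```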

willtebbutt commented 3 years ago

This is one of those things that I suspect we might be better off implementing directly in Stheno, rather than as a kernel in KernelFunctions, because there are quite a few permutations of things that you might want to express. In particular, you might want to predict the function and/or its derivatives, and you might condition on the function and/or its derivatives -- if we were to do this in KernelFunctions, I suspect we would wind up needing quite a number of kernels to support the various permutations. Conversely, implementing this in Stheno would just involve a single additional linear transform, and we would get all of the permutations for free.

I made a very hacky start here (it was restricted to the Exponentiated Quadratic kernel -- we'd probably want something more general), and would be happy to discuss further. I started this before we really got our story straight for multi-output processes, so it was only able to express single-input problems, but we should be able to handle multi-input problems fairly straightforwardly now. We'd need to think a bit more about how it would play with AD / how much we want to hand-code, etc.

elevien commented 3 years ago

Hi @theogf, thank you for the quick reply!

Hi @elevien, computing the partial derivatives should be possible out of the box and this is something we are already testing in CI.

Could you elaborate on this? To be concrete, say I start with a squared exponential kernel k(x,y) = e^(-(x-y)^2). If I want to make a prediction of the derivative of a process under a GP prior, I'll need to evaluate k_x(x,y) = -2*(x-y)*e^(-(x-y)^2) (the partial derivative with respect to the first argument). Now, if I compose k(x,y) with some function g, I want k_x(x,y) to be transformed in a way that is consistent with the chain rule: for kg(x,y) = k(g(x),g(y)), the transformed derivative should be g'(x)*k_x(g(x),g(y)). I understand I can use AD directly to get the transformed k_x(x,y) after composition with g; however, as a matter of both principle and efficiency, I like having these simple known derivatives hard-coded. Maybe this is misguided? (Also, when using AD I had some issues evaluating at x=y, but that's off-topic.)
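Concretely, this is the kind of thing I'd like to express (just a sketch; `g` here is an arbitrary warping function I made up for illustration):

```julia
using ForwardDiff

# Base kernel and its hard-coded derivative w.r.t. the first argument,
# using the convention k(x, y) = exp(-(x - y)^2).
k(x, y)   = exp(-(x - y)^2)
k_x(x, y) = -2 * (x - y) * exp(-(x - y)^2)

# Input warping and its derivative.
g(x)  = log(1 + x^2)
g′(x) = 2x / (1 + x^2)

# The "transformed" derivative I want, via the chain rule.
kg_x(x, y) = g′(x) * k_x(g(x), g(y))

# Sanity check against AD on the composed kernel.
x, y = 1.3, 0.4
kg_x(x, y) ≈ ForwardDiff.derivative(t -> k(g(t), g(y)), x)  # true
```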

The easiest thing to do would be to define a new MOKernel and just define k(x,y) but I am not a multi-output kernel expert. Maybe @willtebbutt has a different idea?

Haven't checked out MOKernel at all, but I see how that could be useful. Is it an issue that the partial derivatives are not always kernels themselves?

EDIT: looking at @willtebbutt's code I'm starting to see how this can be done, although I'm not really familiar with Stheno. I'll digest this and let you know if I have any questions. Thanks!

theogf commented 3 years ago

Also, @niklasschmitz did some work on this; maybe he already has an implementation.

elevien commented 3 years ago

I wanted to follow up on this. I've been looking at Stheno.jl (really enjoying it so far!) and will post questions about the derivative implementations over there. But for the moment, if I'm simply looking to take the derivative of a 1D Gaussian process, can I just hard-code my kernels and their derivatives into a custom MOKernel, throw these into a gppp, and explicitly implement transformations to ensure the chain rule is satisfied?

willtebbutt commented 3 years ago

But for the moment, if I'm simply looking to take the derivative of a 1D Gaussian process, can I just hard-code my kernels and their derivatives into a custom MOKernel, throw these into a gppp, and explicitly implement transformations to ensure the chain rule is satisfied?

You could do this. If you're going to go with the kernel-based approach, I wouldn't bother with Stheno.jl, though. If you're just doing a 1D GP, the PR I linked above should provide a pretty reasonable guide to implementing this inside Stheno.jl, albeit with old types (you'd want to rename GP -> AbstractGP, EQ -> SEKernel, etc.).
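For reference, a rough sketch of what the kernel-based route could look like for the 1D SE case, assuming the KernelFunctions convention that multi-output inputs are (x, output_index) tuples (here output 1 = the function, output 2 = its derivative; `DerivativeSEKernel` is a made-up name, and the formulas use k(x, y) = exp(-(x - y)^2)):

```julia
using KernelFunctions

# Hypothetical two-output kernel for a 1D GP f and its derivative f′,
# with the cross-covariances of k(x, y) = exp(-(x - y)^2) hard-coded.
struct DerivativeSEKernel <: KernelFunctions.MOKernel end

function (κ::DerivativeSEKernel)((x, px)::Tuple{Real,Int}, (y, py)::Tuple{Real,Int})
    r = x - y
    e = exp(-r^2)
    (px, py) == (1, 1) && return e                # cov(f(x),  f(y))  = k
    (px, py) == (2, 1) && return -2r * e          # cov(f′(x), f(y))  = ∂k/∂x
    (px, py) == (1, 2) && return 2r * e           # cov(f(x),  f′(y)) = ∂k/∂y
    (px, py) == (2, 2) && return (2 - 4r^2) * e   # cov(f′(x), f′(y)) = ∂²k/∂x∂y
    throw(ArgumentError("output indices must be 1 or 2"))
end
```

Inputs would then be (x, 1) / (x, 2) tuples (e.g. built with MOInput), so observations of the derivative are just observations of output 2.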