JuliaGaussianProcesses / KernelFunctions.jl

Julia package for kernel functions for machine learning
https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/
MIT License

`PeriodicKernel` does not work with AD (see issue for work-around) #389

Closed: david-vicente closed this 9 months ago

david-vicente commented 3 years ago

Hi. I noticed that the PeriodicKernel is failing the AD tests, and I would like to implement the Mauna Loa GPR example. I looked for discussion of this in past issues but didn't find much. Is the problem an instance of "it is doable but no one has had the time yet", or is it blocked on something upstream in the AD packages? Thanks!

theogf commented 3 years ago

I just reran the AD tests, and using ForwardDiff or ReverseDiff works fine. However, there are still serious errors with Zygote. I think #386 should hopefully solve this problem.

Note that our PeriodicKernel is only equal to the ExpSineSquared kernel from scikit-learn in one dimension:

k(x, x′) = exp(-½ ∑ᵢ sin²(π(xᵢ - x′ᵢ)) / rᵢ²)    (our PeriodicKernel)

vs.

k(x, x′) = exp(-2 sin²(π ‖x - x′‖ / p) / ℓ²)    (scikit-learn's ExpSineSquared)

For multi-dimensional inputs, our kernel sums the sin² terms over dimensions, whereas scikit-learn applies the sine to the Euclidean distance.
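
For a concrete 1D check (a sketch; the parameter values and the helper name expsinesquared are just for illustration): setting r = ℓ/2 and rescaling inputs by the period p recovers the scikit-learn formula.

using KernelFunctions

# 1D check: with r = ℓ/2 and inputs rescaled by the period p,
# PeriodicKernel reproduces the ExpSineSquared formula.
ℓ, p = 0.7, 1.3
k = with_lengthscale(PeriodicKernel(; r=[ℓ / 2]), p)
expsinesquared(x, y) = exp(-2 * sin(π * abs(x - y) / p)^2 / ℓ^2)
k(0.3, 1.9) ≈ expsinesquared(0.3, 1.9)  # true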

david-vicente commented 3 years ago

Why isn't there a period length parameter as in gpytorch?

theogf commented 3 years ago

Because this would just be the lengthscale. You could achieve the same result with:

using KernelFunctions

p = [1.0, 2.0, 1.5]
k = with_lengthscale(PeriodicKernel(; r=rand(3)), p)

where p is the same as in the gpytorch definition, and the r here corresponds to the sqrt(lambda) in gpytorch.

st-- commented 3 years ago

Oh, I was independently trying to build the Mauna Loa example (I put the work in progress here, as a Pluto.jl notebook: https://github.com/JuliaGaussianProcesses/EcosystemIntegrationTesting/blob/main/api_testing/mauna_loa_example/script.jl) and ran into the same bug.

Here's code to reproduce the bug:

using AbstractGPs  # re-exports KernelFunctions
using ParameterHandling
using Zygote

# GP with a scaled periodic kernel: signal std s, lengthscale l, period p
function build_gp(th)
    return GP(th.s^2 * with_lengthscale(PeriodicKernel(; r=[th.l / 2]), th.p))
end

th = (; s=positive(exp(0.0)), l=positive(exp(1.0)), p=positive(exp(0.0)))
thflat, unflatten = ParameterHandling.value_flatten(th)

x = rand(5)
y = rand(5)

function loss(th)
    f = build_gp(th)
    return -logpdf(f(x, 0.01), y)
end

loss_packed = loss ∘ unflatten

Zygote.gradient(loss_packed, thflat)  # fails: Zygote errors when differentiating through PeriodicKernel

st-- commented 3 years ago

@david-vicente the following two lines give equivalent kernels (in 1D):

# for any choice of wiggle_scale and period:
k1 = with_lengthscale(PeriodicKernel(; r=[wiggle_scale / 2]), period)
k2 = with_lengthscale(SqExponentialKernel(), wiggle_scale) ∘ PeriodicTransform(1/period)

and we might want to provide with_period(kernel, period) = kernel ∘ PeriodicTransform(1/period). AutoDiff works fine for the PeriodicTransform. Using the second form, you can also swap out the base kernel, e.g. for a Matern32Kernel. :)

(Note that wiggle_scale is relative to a period, not relative to the actual input scale!)
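
For completeness, here is a runnable sketch of that with_period helper, together with a numerical check of the equivalence (the helper is the suggestion from this comment, not part of the package; the parameter values are arbitrary):

using KernelFunctions

# suggested helper (not in the package): fix a kernel's period
with_period(kernel, period) = kernel ∘ PeriodicTransform(1 / period)

wiggle_scale, period = 0.5, 2.0
k1 = with_lengthscale(PeriodicKernel(; r=[wiggle_scale / 2]), period)
k2 = with_period(with_lengthscale(SqExponentialKernel(), wiggle_scale), period)

k1(0.3, 1.9) ≈ k2(0.3, 1.9)  # true: both reduce to exp(-2 sin²(πΔ/p) / w²)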

david-vicente commented 3 years ago

Thank you @st--. I did look at PeriodicTransform at the time, but from the documentation it wasn't obvious how to use it to construct an equivalent kernel. Maybe we should add

k1 = with_lengthscale(PeriodicKernel(; r=[wiggle_scale / 2]), period)
k2 = with_lengthscale(SqExponentialKernel(), wiggle_scale) ∘ PeriodicTransform(1/period)

to its documentation?

st-- commented 3 years ago

@david-vicente I've just added the finished example to AbstractGPs: https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/pull/240

Yes it'd be great to improve the documentation. Would you be up for making the change and opening a PR? :)

st-- commented 3 years ago

You could also add the with_period helper I suggested, if you wanted. :)

david-vicente commented 3 years ago

Can we do the same for multi-dimensional inputs?

devmotion commented 2 years ago

Can we do the same for multi-dimensional inputs?

It would work in exactly the same way if we made PeriodicTransform output complex numbers (using cispi, or cis on older Julia versions).
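
A rough, untested sketch of that idea (hypothetical code, not the package's API; ComplexPeriodicTransform is an illustrative name): each coordinate is mapped onto the unit circle in the complex plane, so the squared Euclidean distance between transformed points becomes a per-dimension sum of sin² terms, matching the structure of PeriodicKernel.

using KernelFunctions

# Hypothetical transform (illustration only): map each coordinate onto the
# complex unit circle. Then |cispi(2f*a) - cispi(2f*b)|² = 4 sin²(πf(a - b)),
# so a base kernel on the transformed points sums sin² terms per dimension.
struct ComplexPeriodicTransform{T<:Real} <: KernelFunctions.Transform
    f::T  # frequency = 1 / period
end

(t::ComplexPeriodicTransform)(x::AbstractVector{<:Real}) = cispi.(2 * t.f .* x)

# e.g.: SqExponentialKernel() ∘ ComplexPeriodicTransform(1 / period)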

simsurace commented 9 months ago

Fixed by #531