JuliaGaussianProcesses / Stheno.jl

Probabilistic Programming with Gaussian processes in Julia
Other
339 stars 26 forks source link

restrict observations to a specific (sub-) process #176

Open andreaskoher opened 3 years ago

andreaskoher commented 3 years ago

Hey @willtebbutt ,

again a really cool package and sorry for bothering you a lot these days. I am trying to estimate the day of week effect in a time series. For the sake of an example let's assume the data was generated from the process below:

x_₃ = x_₁ = 1. : 365.
x_₂ = zeros(length(x_₃))
x_₂[1:7:end] .= 1 # there is some effect every, say Monday

effect = 1.

f = @gppp let
    f₁ = GP( Matern52Kernel() ∘ ScaleTransform(1/100) )
    f₂ = GP( effect * LinearKernel() )
    f₃ = f₁ + f₂
end

x₁ = GPPPInput(:f₁, x_₁)
x₂ = GPPPInput(:f₂, x_₂)
x  = BlockData(x₁, x₂)

fx = f(x, 1e-3)
y₁, y₂ = split(x, rand(fx))
y₃ = y₁ + y₂
plot(x_₃, y₃)

dayofweekeffect

In order to fit hyper parameters, I have to evaluate logpdf(fx, y₃). This fails of course because it expects observations for the latent process f₂ as well as f₃. Is there some way to restrict observations to f₃ only?

willtebbutt commented 3 years ago

again a really cool package and sorry for bothering you a lot these days

Thanks. It's no bother at all -- I'm keen to be told all of the things that don't work as intended / about anything that's unclear!

Is there some way to restrict observations to f₃ only?

Yes :) If you just replace fx = f(x, 1e-6) with fx = f(x₃), that should do it.

You'll not need the split function anymore, as you're only returning samples from a single process from f. Soy = rand(fx) should be fine.

The over-arching point here is that both GPPPInputs and BlockData are acceptable input vectors for a GPPP. a GPPPInput will just get you points in a single process from f, whereas a BlockData lets you get multiple processes at once.

Would I be correct in assuming that it would be helpful if I clarified this in the example / docs?

andreaskoher commented 3 years ago

Many thanks for the quick response! and sorry for the badly explained example (I updated it slightly, especially ConstantKernel -> LinearKernel). I also think the docs are good and I feel that I understood GPPPInput and BlockData, but lets see :)

So assuming that I have only (x_₃, y₃) as an observable time series and I further assume that it can be decomposed into a smooth underlying part y₁ and a constant (unknown) effect for one particular day of the week, say Monday. As covariates, I have the day of year x_₃ = x_₁ and a boolean vector that indicates each Monday x_₂. Now, my goal is to decompose y₃ into y₁ and y₂, i.e. estimate the Monday effect and find the underlying smooth function.

Feeding only x_₃ into f without the covariates x_₂ that indicate the special day, would not give the correct posterior:

x₃ = GPPPInput(:f₃, x_₃)
fx = f(x₃, 1e-3)
f′ = posterior(fx, y₃)
plot(x_₃, rand( f′(x₃, 1e-9) ))

dayofweekeffect_posterior

willtebbutt commented 3 years ago

Ahhh I see. Sorry, I missed the detail about x_1 and x_2!

To make sure that I'm understanding properly, would it be correct to say that the function you're after is something like:

f(day_of_year, is_monday) = f_1(day_of_year) + f_2(is_monday)

?

andreaskoher commented 3 years ago

Yes that's it :) (and I just realized that I haven't updated the example) So it is two input and one output dimensions in this example. It is also similar to cover example from the textbook Bayesian Data Analysis 3 (p. 505-509) where Aki Vehtari used GPStuff to decompose a time series and here x_₂ would indicate special holidays.

willtebbutt commented 3 years ago

Just taken a look at the example -- looks like he's chopping it up by multipling with another function.

The way I would think to do this would be to use the multiply-by-the-function formulation that's use in BDA, and to write something like

using AbstractGPs
using Plots
using Stheno

# The points in time under consideration.
x = 1:365

# Create is_monday function. Let's pretend these actually correspond to Mondays.
is_monday(t) = rem(t, 7) == 0 ? 1 : 0

effect_size = 0.1

f = @gppp let
    f₁ = GP(Matern52Kernel() ∘ ScaleTransform(1/100))
    f₂ = is_monday * (effect_size * GP( ConstantKernel() ))
    f₃ = f₁ + f₂
end;

# Compute the posterior given observations only of f₃.
x₃ = GPPPInput(:f₃, x)
fx = f(x₃, 0.01)
y₃ = rand(fx)
f_post = posterior(fx, y₃)

# Compute the posterior marginals over the sum of the two processes.
x3 = GPPPInput(:f₃, x)
marginals(f_post(x3))

# Compute the posterior marginals over the effect size, f₂.
x2 = GPPPInput(:f₂, x)
marginals(f_post(x2))

# Compute the posterior marginals over the other latent process.
x1 = GPPPInput(:f₁, x)
marginals(f_post(x1))

# Plot everything.
plot(x, f_post(x3, 1e-6); color=:blue, ribbon_scale=3)
plot!(x, f_post(x2, 1e-6); color=:green, ribbon_scale=3)
plot!(x, f_post(x1, 1e-6); color=:red, ribbon_scale=3)
scatter!(x, y₃; color=:black, markersize=1)

Does this do what you need?

Actually, if you're doing this, would you be up for contributing it to the examples section when you're done, with a reference to those pages of BDA?

edit: updated kernel + effect size + observation noise variance to make the visuals nicer.

andreaskoher commented 3 years ago

Yes that's it, really nice! More abstractly the solution is of the form

f(x) = f₁(x) + I(x)f₂(x) where I(x) is an indicator function and f₂ is a constant kernel GP.

This directly corresponds to the text book example. Interestingly, however, the authors implementation with GPStuff (Part b) is more like your comment above, i.e.

f(x₁,x₂) = f₁(x₁) + f₂(x₂) where f₂ is a GP with a linear kernel and x₂ is a boolean vector.

The latter approach seems somewhat more general but, in any case, your approach solves my use case and the text book example.

Thanks for that and I will be happy to add an example to Stheno.

willtebbutt commented 3 years ago

The latter approach seems somewhat more general but, in any case, your approach solves my use case and the text book example.

Excellent. Yeah, it's an interesting one. On some level, the additive implementation I previously proposed / that is used in GPStuff feels like what you would do if you could not, for some reason or another, implement it using the indicator function route. I guess they're equivalent though, 🤷

Thanks for that and I will be happy to add an example to Stheno.

Excellent, I look forward to it :)