Closed willtebbutt closed 4 years ago
Online. It turns out that you can efficiently update the Cholesky factorisation when new observations come in. @donhausk do you have a good reference for the sequential updating stuff for GPs? I'm not sure where the best place to look is.
Of course :) I have implemented this method from https://www.ucg.ac.me/skladiste/blog_10701/objava_23569/fajlovi/cholesky.pdf . If you are interested, I can send you my code, including some test functions for the implementation. I reach the desired performance increase on the CPU; however, unfortunately, you need to use lowrankupdate! from the LinearAlgebra package, which does not support CuArrays.
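For reference, the LinearAlgebra routines mentioned here are CPU-only, in-place rank-1 routines on a `Cholesky` object. A minimal standalone sketch (toy SPD matrix, illustrative sizes) of what `lowrankupdate!` and `lowrankdowndate!` do:

```julia
using LinearAlgebra, Random

Random.seed!(0)
A = randn(5, 5)
S = Symmetric(A * A' + 5I)   # a well-conditioned SPD matrix
C = cholesky(S)              # C.U is the upper Cholesky factor
v = randn(5)

# Rank-1 update: afterwards C factorises S + v * v'.
# Note both C and the vector are overwritten, hence the copy.
lowrankupdate!(C, copy(v))
@assert C.U ≈ cholesky(Symmetric(Matrix(S) + v * v')).U

# Rank-1 downdate undoes it (valid only while the result stays positive definite).
lowrankdowndate!(C, copy(v))
@assert C.U ≈ cholesky(S).U
```

Each call costs O(n²) rather than the O(n³) of a fresh factorisation, which is where the sequential-updating speedup comes from.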
If you could point us towards code, that would be fantastic :) Hopefully it'll be a quick job that way.
"""
update_chol!(chol, Kttq, Ktt0)
## Arguments
- chol::UpperTriangular: The cholesky decomposition of K_{0:t-1}+ I*Q
- Kttq::AbstractMatrix: The covariance matrix K_tt + I*Q
- Ktt0::AbstractMatrix: The covariance matrix Kt,0:t-1
Returns the updated cholesky decomposition of K_{0:t} + I*Q
"""
## Function for the update of the Cholesky decomposition - only increasing
function update_chol(chol::UpperTriangular, Kttq::AbstractMatrix, Ktt0::AbstractMatrix)
    S21 = Ktt0 / chol
    S22 = cholesky(Kttq - S21 * S21').U
    return UpperTriangular([chol S21'; S21 S22])
end
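If it helps, here is a small standalone sanity check of the growing update against a full refactorisation. The kernel, noise value, and the restated copy of the function (with a `Symmetric` wrapper added as a guard against floating-point asymmetry) are illustrative assumptions, not part of any package:

```julia
using LinearAlgebra, Random

# Restated from the snippet above so this check runs standalone;
# the Symmetric wrapper is a small added numerical guard.
function update_chol(chol::UpperTriangular, Kttq::AbstractMatrix, Ktt0::AbstractMatrix)
    S21 = Ktt0 / chol
    S22 = cholesky(Symmetric(Kttq - S21 * S21')).U
    return UpperTriangular([chol S21'; S21 S22])
end

Random.seed!(1)
k(a, b) = exp(-abs2(a - b) / 2)               # assumed toy kernel
K(a, b) = [k(ai, bj) for ai in a, bj in b]
Q = 0.1                                        # assumed scalar noise

x_old, x_new = randn(6), randn(2)
x = vcat(x_old, x_new)

chol_old = cholesky(Symmetric(K(x_old, x_old) + Q * I)).U
chol_new = update_chol(chol_old, K(x_new, x_new) + Q * I, K(x_new, x_old))

# The O(n²) update must agree with the O(n³) factorisation from scratch.
@assert chol_new ≈ cholesky(Symmetric(K(x, x) + Q * I)).U
```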
"""
cholesky_up_down_date!(chol, Kttful, Q, i)
## Arguments
- chol::UpperTriangular: The cholesky decomposition of K_{0:t-1,t:T - reference traj }+ I*Q
- Kttfull::AbstractMatrix: The covariance matrix K_t,0:T-1
- Q::AbstractMatrix: The process noise
- i::Int: i = t
- InputDim: This would only be relevant if we have a multi output gp.
Exchanges the i-th column and row of chol with Ktt + [0,...,0,1,0...0] Q
"""
function cholesky_up_down_date!(chol::UpperTriangular, Kttful::AbstractMatrix, Q::AbstractMatrix, i::Int, InputDim::Int = 1)
    @views if i == 1
        # There is no l11.
        # First absorb the old cross-covariance row into S33 via rank-1 updates.
        ## https://www.ucg.ac.me/skladiste/blog_10701/objava_23569/fajlovi/cholesky.pdf
        S33 = Cholesky(chol[InputDim+1:end, InputDim+1:end], 'U', 0)
        S23 = chol[1:InputDim, InputDim+1:end]
        for idim in 1:InputDim
            lowrankupdate!(S33, Array(S23[idim, :]))
        end
        # Now compute the new first row and remove its contribution via rank-1 downdates.
        S22 = cholesky(Kttful[:, 1:InputDim] + Q).U
        S23 = S22' \ Kttful[:, InputDim+1:end]
        for idim in 1:InputDim
            lowrankdowndate!(S33, Array(S23[idim, :]))
        end
        chol[:, :] = UpperTriangular([S22 S23; S23' S33.U])[:, :]
    else
        # First absorb the old cross-covariance row into S33 via rank-1 updates.
        S11 = chol[1:(i-1)*InputDim, 1:(i-1)*InputDim]
        S13 = chol[1:(i-1)*InputDim, i*InputDim+1:end]
        S33 = Cholesky(chol[i*InputDim+1:end, i*InputDim+1:end], 'U', 0)
        S23 = chol[(i-1)*InputDim+1:i*InputDim, i*InputDim+1:end]
        for idim in 1:InputDim
            lowrankupdate!(S33, Array(S23[idim, :]))
        end
        # Now compute the new i-th row and remove its contribution via rank-1 downdates.
        A12 = Kttful[:, 1:(i-1)*InputDim]'
        A22 = Kttful[:, (i-1)*InputDim+1:i*InputDim] + Q
        A23 = Kttful[:, i*InputDim+1:end]
        S12 = S11' \ A12
        S22 = cholesky(A22 - S12' * S12).U
        S23 = S22' \ (A23 - S12' * S13)
        for idim in 1:InputDim
            lowrankdowndate!(S33, Array(S23[idim, :]))
        end
        chol[:, :] = UpperTriangular([S11 S12 S13;
                                      S12' S22 S23;
                                      S13' S23' S33.U])[:, :]
    end
end
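To illustrate the update-then-downdate order used above, here is a minimal standalone check of the i == 1, InputDim == 1 case (toy kernel and scalar noise are assumptions): the trailing factor block S33 absorbs the old cross row via lowrankupdate! and then sheds the new one via lowrankdowndate!, since the trailing covariance block itself is unchanged.

```julia
using LinearAlgebra, Random

Random.seed!(2)
k(a, b) = exp(-abs2(a - b) / 2)                 # assumed toy kernel
K(a, b) = [k(ai, bj) for ai in a, bj in b]
Q = 0.1                                          # assumed scalar noise

x = randn(5)
U = cholesky(Symmetric(K(x, x) + Q * I)).U

# Replace the first point, i.e. swap out row/column 1 of the covariance.
x_new = copy(x); x_new[1] = randn()
a11 = k(x_new[1], x_new[1]) + Q                  # new K[1,1] (incl. noise)
a12 = [k(x_new[1], xj) for xj in x_new[2:end]]   # new K[1,2:end]

# Old blocks of the factor.
s23_old = U[1, 2:end]
S33 = Cholesky(Matrix(U[2:end, 2:end]), 'U', 0)

# New first row of the factor.
u11_new = sqrt(a11)
s23_new = a12 ./ u11_new

# K[2:end,2:end] is unchanged, so S33'S33 must gain the old row's
# contribution and lose the new one's.
lowrankupdate!(S33, copy(s23_old))
lowrankdowndate!(S33, copy(s23_new))

top = [u11_new s23_new']           # 1×5
bottom = [zeros(4) S33.U]          # 4×5
U_new = UpperTriangular([top; bottom])

@assert U_new ≈ cholesky(Symmetric(K(x_new, x_new) + Q * I)).U
```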
Fantastic -- this provides a great starting point.
It's a bit hard-coded for my use case, to be fair. Don't hesitate to ask questions in case anything is unclear.
We should be able to do this in the context of both exact and approximate inference!
In the case of exact inference, we would use this to implement posterior(fx::FiniteGP{<:PosteriorGP}, y::AbstractVector{<:Real}) efficiently. When you condition a PosteriorGP on more observations, you should just get another PosteriorGP conditioned on all of the observations! The code above should be a good starting point for this bit.
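The property being asked for here, that conditioning a posterior on a second batch agrees with conditioning the prior on everything at once, can be checked numerically without any package machinery. A standalone sketch with a toy kernel and closures standing in for the GP types (none of this is the AbstractGPs API):

```julia
using LinearAlgebra, Random

Random.seed!(3)
k(a, b) = exp(-abs2(a - b) / 2)           # assumed toy kernel
noise = 0.1                                # assumed observation-noise variance

# Condition a GP, represented by a mean function m and covariance function c,
# on observations y at inputs x; returns posterior mean/covariance functions.
function condition(m, c, x, y, noise)
    C = cholesky(Symmetric(c(x, x) + noise * I))
    α = C \ (y - m(x))
    m_post(z) = m(z) + c(z, x) * α
    c_post(z, w) = c(z, w) - c(z, x) * (C \ c(x, w))
    return m_post, c_post
end

m0(z) = zeros(length(z))
c0(z, w) = [k(zi, wj) for zi in z, wj in w]

x1, y1 = randn(4), randn(4)
x2, y2 = randn(3), randn(3)
xs = randn(5)                              # test inputs

# Batch: condition the prior on both datasets at once.
mb, cb = condition(m0, c0, vcat(x1, x2), vcat(y1, y2), noise)
# Sequential: condition on the first batch, then condition the posterior again.
m1, c1 = condition(m0, c0, x1, y1, noise)
ms, cs = condition(m1, c1, x2, y2, noise)

@assert mb(xs) ≈ ms(xs)
@assert cb(xs, xs) ≈ cs(xs, xs)
```

The sequential path is where the Cholesky update pays off: the second conditioning step only needs to grow the existing factorisation rather than rebuild it.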
I think the same makes sense for approximate conditioning; we would just need to implement approx_posterior(::VFE, fx::FiniteGP{<:ApproxPosteriorGP}, y::AbstractVector{<:Real}), which should only be a little different. @donhausk did you ever think about this with pseudo points? I may be misunderstanding the code above, but it looks like it's tailored to exact inference.
Yes, you are right, this is a good point. For my code, I am using the variational sparse GP approximation by Titsias. In that case, you don't need the Cholesky update, as any two outputs are independent given the inducing points. Unfortunately, I have not implemented the case for the FITC sparse approximation. For a reference, take a look at the end of the 4th chapter of Frigola's PhD thesis: http://www.rogerfrigola.com/doc/thesis.pdf .
Besides, for the variational sparse GP, the optimal sparse approximation is usually used. I am not sure if you need this, but I could additionally send you an implementation of the ELBO for cases where you do not use the actual optimal sparse approximation but an arbitrary multivariate normal distribution for the inducing points.
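For context, the collapsed Titsias (2009) bound being discussed fits in a few lines. A standalone sketch (toy kernel, jitter value, and all names are assumptions) that checks the standard sanity property that with inducing inputs equal to the training inputs the bound is tight, and is otherwise a lower bound:

```julia
using LinearAlgebra, Random

Random.seed!(4)
k(a, b) = exp(-abs2(a - b) / 2)                # assumed toy kernel
K(a, b) = [k(ai, bj) for ai in a, bj in b]
noise = 0.1                                     # assumed noise variance

# Log density of N(0, S) at y.
function logpdf_mvn(S, y)
    C = cholesky(Symmetric(S))
    return -0.5 * (length(y) * log(2π) + logdet(C) + dot(y, C \ y))
end

# Collapsed Titsias ELBO: log N(y; 0, Qff + σ²I) - tr(Kff - Qff) / (2σ²),
# where Qff = Kfu Kuu⁻¹ Kuf.  The jitter on Kuu is an implementation detail.
function titsias_elbo(x, y, z, noise)
    Kuu = Symmetric(K(z, z) + 1e-9 * I)
    Kfu = K(x, z)
    Qff = Kfu * (Kuu \ Kfu')
    return logpdf_mvn(Qff + noise * I, y) - tr(K(x, x) - Qff) / (2noise)
end

x, y = randn(6), randn(6)
exact = logpdf_mvn(K(x, x) + noise * I, y)

# With Z = X, Qff == Kff and the trace term vanishes: the bound is tight.
@assert isapprox(titsias_elbo(x, y, x, noise), exact; rtol=1e-6)
# With fewer inducing points it is a genuine lower bound.
@assert titsias_elbo(x, y, x[1:3], noise) <= exact
```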
@donhausk we've just released version 0.1.1 that contains sequential conditioning. If you've got a spare 10 minutes, it would be great if you could try it out and let us know how you find it, i.e. try typing ?posterior and see if it's clear what's going on. Would be keen to know if there's anything that could be clearer / improved.
While I remember: the current interface doesn't make it easy / efficient to compute both the posterior and the log marginal likelihood at the same time. This is something that would be really desirable in practice (I think?), so we should make it easy. One option is to just return both the posterior process and the log marginal likelihood from posterior. Something analogous should happen for approximate_posterior.
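Both quantities do share the expensive factorisation, so returning them together costs almost nothing extra. A standalone sketch (toy kernel and all names are assumptions, not the package API) of computing the log marginal likelihood and the posterior mean from one Cholesky:

```julia
using LinearAlgebra, Random

Random.seed!(5)
k(a, b) = exp(-abs2(a - b) / 2)            # assumed toy kernel
K(a, b) = [k(ai, bj) for ai in a, bj in b]
noise = 0.1                                 # assumed noise variance

x, y = randn(6), randn(6)
xs = randn(3)                               # test inputs

# One factorisation serves both quantities.
C = cholesky(Symmetric(K(x, x) + noise * I))
α = C \ y

lml = -0.5 * (dot(y, α) + logdet(C) + length(y) * log(2π))
post_mean = K(xs, x) * α

# Cross-check against independent computations.
S = K(x, x) + noise * I
@assert lml ≈ -0.5 * (dot(y, S \ y) + log(det(S)) + length(y) * log(2π))
@assert post_mean ≈ K(xs, x) * (S \ y)
```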
@willtebbutt Can we close this now that we have sequential conditioning for exact and approximate posterior?
We can indeed. Great job on this @sharanry !
Do you mean something like a mini-batch or an online learning framework for GPs?