JuliaGaussianProcesses / AbstractGPs.jl

Abstract types and methods for Gaussian Processes.
https://juliagaussianprocesses.github.io/AbstractGPs.jl/dev

Implement sequential conditioning #1

Closed willtebbutt closed 4 years ago

sharanry commented 4 years ago

Do you mean something like a mini-batch or an online learning framework for GPs?

willtebbutt commented 4 years ago

Online. It turns out that you can efficiently update the Cholesky factorisation when new observations come in. @donhausk do you have a good reference for the sequential updating stuff for GPs? I'm not sure where the best place to look is.
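
For context, the identity being exploited is the standard block extension of a Cholesky factorisation: if

$$
K_{0:t} = \begin{pmatrix} K_{0:t-1} & k \\ k^\top & k_{tt} \end{pmatrix}
\quad\text{and}\quad
U_{0:t-1}^\top U_{0:t-1} = K_{0:t-1},
$$

then the factor of $K_{0:t}$ is

$$
U_{0:t} = \begin{pmatrix} U_{0:t-1} & s \\ 0 & s_{tt} \end{pmatrix},
\qquad s = U_{0:t-1}^{-\top} k, \qquad s_{tt} = \operatorname{chol}\left(k_{tt} - s^\top s\right),
$$

so conditioning on a new observation only costs a triangular solve plus a small factorisation, rather than an $O(t^3)$ refactorisation from scratch.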

donhausk commented 4 years ago

Of course :) I have implemented this method from https://www.ucg.ac.me/skladiste/blog_10701/objava_23569/fajlovi/cholesky.pdf . If you are interested, I can send you my code, including some test functions for the implementation. I achieve the desired performance increase on the CPU; unfortunately, however, you need to use lowrankupdate! from the LinearAlgebra package, which does not support CuArrays.
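
For reference, lowrankupdate! mutates a Cholesky object in place so that it factorises A + v*v', and it overwrites its vector argument, hence the copy in this minimal illustration:

using LinearAlgebra

A = [4.0 2.0; 2.0 3.0]
C = cholesky(A)
v = [1.0, 0.5]
lowrankupdate!(C, copy(v))     # C now factorises A + v*v'; copy because v would be overwritten
C.U ≈ cholesky(A + v * v').U   # true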

willtebbutt commented 4 years ago

If you could point us towards code, that would be fantastic :) Hopefully it'll be a quick job that way.

donhausk commented 4 years ago
"""
     update_chol!(chol, Kttq, Ktt0)

## Arguments

     - chol::UpperTriangular:    The cholesky decomposition of K_{0:t-1}+  I*Q
     - Kttq::AbstractMatrix:     The covariance matrix K_tt + I*Q
     - Ktt0::AbstractMatrix:     The covariance matrix Kt,0:t-1

Returns the updated cholesky decomposition of K_{0:t} + I*Q

"""

## Update the Cholesky factor for newly appended observations (growing only).
function update_chol(chol::UpperTriangular, Kttq::AbstractMatrix, Ktt0::AbstractMatrix)
    # New off-diagonal block of the factor: S12 = U' \ K_{0:t-1,t}, obtained via a
    # triangular solve against the existing factor.
    S21 = Ktt0 / chol
    S12 = permutedims(S21, (2, 1))
    # Factor the Schur complement of the existing block to get the new diagonal block.
    S22 = cholesky(Kttq - S21 * S12).U
    return UpperTriangular([chol S12; S21 S22])
end
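
A quick sanity check against a fresh factorisation (the kernel and noise level below are illustrative, not part of the original code):

using LinearAlgebra

k(a, b) = exp(-abs2(a - b) / 2)      # illustrative squared-exponential kernel
x = randn(10)
K = [k(a, b) for a in x, b in x]
U = cholesky(K[1:8, 1:8] + 0.1I).U   # factor for the first 8 points (Q = 0.1I)
Kttq = K[9:10, 9:10] + 0.1I          # new diagonal block plus noise
Ktt0 = K[9:10, 1:8]                  # cross-covariance with the old points
U_new = update_chol(U, Kttq, Ktt0)
U_new ≈ cholesky(K + 0.1I).U         # true, up to floating-point error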

"""
     cholesky_up_down_date!(chol, Kttful, Q, i)

## Arguments

     - chol::UpperTriangular:       The cholesky decomposition of K_{0:t-1,t:T - reference traj }+  I*Q
     - Kttfull::AbstractMatrix:     The covariance matrix K_t,0:T-1
     - Q::AbstractMatrix:           The process noise
     - i::Int:                      i = t
     - InputDim:                    This would only be relevant if we have a multi output gp.

Exchanges the i-th column and row of chol with Ktt + [0,...,0,1,0...0] Q

"""

function cholesky_up_down_date!(chol::UpperTriangular, Kttful::AbstractMatrix, Q::AbstractMatrix, i::Int, InputDim::Int = 1)
    @views if i == 1
        # There is no S11 block when the first row/column is replaced.
        # Remove the old row's contribution from the trailing factor: since
        # U33'U33 = K33 - S23'S23, this amounts to rank-one updates of S33.
        # Reference: https://www.ucg.ac.me/skladiste/blog_10701/objava_23569/fajlovi/cholesky.pdf
        S33 = Cholesky(chol[InputDim+1:end, InputDim+1:end], 'U', 0)
        S23 = chol[1:InputDim, InputDim+1:end]
        for idim in 1:InputDim
            lowrankupdate!(S33, Array(S23[idim, :]))
        end

        # Insert the new first row/column; its cross term is subtracted from the
        # trailing factor via rank-one downdates.
        S22 = cholesky(Kttful[:, 1:InputDim] + Q).U
        S23 = permutedims(S22, (2, 1)) \ Kttful[:, InputDim+1:end]
        for idim in 1:InputDim
            lowrankdowndate!(S33, Array(S23[idim, :]))
        end

        chol[:, :] = UpperTriangular([S22 S23; permutedims(S23, (2, 1)) S33.U])[:, :]
    else
        # Remove the old i-th row's contribution from the trailing factor.
        S11 = chol[1:(i-1)*InputDim, 1:(i-1)*InputDim]
        S13 = chol[1:(i-1)*InputDim, i*InputDim+1:end]
        S33 = Cholesky(chol[i*InputDim+1:end, i*InputDim+1:end], 'U', 0)
        S23 = chol[(i-1)*InputDim+1:i*InputDim, i*InputDim+1:end]
        for idim in 1:InputDim
            lowrankupdate!(S33, Array(S23[idim, :]))
        end

        # Recompute the middle blocks of the factor for the new row/column, then
        # downdate the trailing factor with the new cross term.
        A12 = permutedims(Kttful[:, 1:(i-1)*InputDim], (2, 1))
        A22 = Kttful[:, (i-1)*InputDim+1:i*InputDim] + Q
        A23 = Kttful[:, i*InputDim+1:end]
        S12 = permutedims(S11, (2, 1)) \ A12
        S22 = cholesky(A22 - permutedims(S12, (2, 1)) * S12).U
        S23 = permutedims(S22, (2, 1)) \ (A23 - permutedims(S12, (2, 1)) * S13)
        for idim in 1:InputDim
            lowrankdowndate!(S33, Array(S23[idim, :]))
        end

        chol[:, :] = UpperTriangular([S11 S12 S13;
            permutedims(S12, (2, 1)) S22 S23;
            permutedims(S13, (2, 1)) permutedims(S23, (2, 1)) S33.U])[:, :]
    end
    return chol
end
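
And a similarly hypothetical end-to-end check for the up/down-date (again with an illustrative kernel; InputDim = 1):

using LinearAlgebra

k(a, b) = exp(-abs2(a - b) / 2)
x = randn(6)
Q = fill(0.1, 1, 1)                         # 1×1 process noise, InputDim = 1
U = cholesky([k(a, b) for a in x, b in x] + 0.1I).U
i = 3
x[i] = randn()                              # replace the i-th point
Kttful = [k(x[i], b) for _ in 1:1, b in x]  # 1×T row K_{t,0:T-1} at the new point
cholesky_up_down_date!(U, Kttful, Q, i)
U ≈ cholesky([k(a, b) for a in x, b in x] + 0.1I).U  # true, up to rounding
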
willtebbutt commented 4 years ago

Fantastic -- this provides a great starting point.

donhausk commented 4 years ago

It's a bit hard-coded for my use case, to be fair. Don't hesitate to ask questions in case anything is unclear.

willtebbutt commented 4 years ago

We should be able to do this in the context of both exact and approximate inference!

In the case of exact inference, we would use this to implement

posterior(fx::FiniteGP{<:PosteriorGP}, y::AbstractVector{<:Real})

efficiently. When you condition a PosteriorGP on more observations, you should just get another PosteriorGP conditioned on all of the observations! The code above should be a good starting point for this bit.
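
Concretely, the aim is usage along these lines (written with current AbstractGPs-style syntax as a sketch; the constructors around version 0.1 may have looked slightly different):

using AbstractGPs

f = GP(SqExponentialKernel())
x1, y1 = randn(10), randn(10)
p1 = posterior(f(x1, 0.1), y1)     # exact posterior after the first batch of data
x2, y2 = randn(5), randn(5)
p2 = posterior(p1(x2, 0.1), y2)    # conditioning a PosteriorGP yields another PosteriorGP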

I think the same makes sense for approximate conditioning; we would just need to implement

approx_posterior(::VFE, fx::FiniteGP{<:ApproxPosteriorGP}, y::AbstractVector{<:Real})

which should only be a little different. @donhausk did you ever think about this with pseudo points? I may be misunderstanding the code above, but it looks like it's tailored to exact inference.

donhausk commented 4 years ago

Yes, you are right, this is a good point. For my code, I am using the variational sparse GP approximation by Titsias. In that case, you don't need the Cholesky update, as any two outputs are independent given the inducing points. Unfortunately, I have not implemented the case of the FITC sparse approximation. For a reference, take a look at the end of the 4th chapter of Frigola's PhD thesis: http://www.rogerfrigola.com/doc/thesis.pdf

donhausk commented 4 years ago

Besides, for the variational sparse GP, the optimal sparse approximation is usually used. I am not sure if you need this, but I could additionally send you an implementation of the ELBO for cases where you do not use the actual optimal sparse approximation but an arbitrary multivariate normal distribution for the inducing points.

willtebbutt commented 4 years ago

@donhausk we've just released version 0.1.1, which contains sequential conditioning. If you've got a spare 10 minutes, it would be great if you could try it out and let us know how you find it, e.g. try typing ?posterior and see whether it's clear what's going on. Would be keen to know if there's anything that could be clearer / improved.

willtebbutt commented 4 years ago

While I remember: the current interface doesn't make it easy / efficient to compute both the posterior and the log marginal likelihood at the same time. This is something that would be really desirable in practice (I think?), so we should make it easy. One option is to just return both the posterior process and log marginal likelihood from posterior.

Something analogous should happen for approx_posterior.
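
One hypothetical shape for that (the name posterior_and_logml and everything about it is illustrative, not the released API; an efficient implementation would reuse the Cholesky factorisation of cov(fx) across the two computations instead of factorising twice):

using AbstractGPs

# Illustrative only: return the updated posterior together with the
# log marginal likelihood of the new observations.
function posterior_and_logml(fx, y)
    return posterior(fx, y), logpdf(fx, y)   # naive: factorises cov(fx) twice
end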

sharanry commented 4 years ago

@willtebbutt Can we close this now that we have sequential conditioning for exact and approximate posterior?

willtebbutt commented 4 years ago

We can indeed. Great job on this, @sharanry!