STOR-i / GaussianProcesses.jl

A Julia package for Gaussian Processes
https://stor-i.github.io/GaussianProcesses.jl/latest/

Remove or parameterize the positive definite retry mechanism #131

Closed Red-Portal closed 3 years ago

Red-Portal commented 4 years ago

Hi, I currently have a problem with the positive definite retry mechanism below, from GP.jl:

    for _ in 1:10 # 10 chances
        try
            # return m, cholesky(m)
            copyto!(chol_factors, m)
            chol = cholesky!(Symmetric(chol_factors, :U))
            return m, chol
        catch err
            if typeof(err)!=LinearAlgebra.PosDefException
                throw(err)
            end
            # that wasn't (numerically) positive definite,
            # so let's add some weight to the diagonal
            ϵ = 1e-6 * tr(m) / n
            @inbounds for i in 1:n
                m[i, i] += ϵ
            end
        end
    end

While having this might make the usage experience more stable in some use cases, it doesn't make sense with more sophisticated hyperparameter treatments: it changes the loss/likelihood surface by internally altering the value of the noise term, and as a result the surface becomes extremely discontinuous and ugly. I think we should simply throw when the Cholesky fails. The external optimizer or sampler can then interpret that as loss -> Inf or likelihood -> -Inf (which is the current behaviour of mcmc.jl and optimize.jl). A compromise would be to parameterize this so that the user can set the number of retries.
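
For illustration, a minimal sketch (with hypothetical names, not the package's actual API) of how an outer optimizer or sampler could catch the propagated failure and map it to likelihood -> -Inf:

    using LinearAlgebra

    # Hypothetical wrapper: `loglik` is any function of the hyperparameters θ
    # that may throw a PosDefException from an internal cholesky call.
    function safe_log_likelihood(loglik, θ)
        try
            return loglik(θ)
        catch err
            err isa LinearAlgebra.PosDefException || rethrow(err)
            return -Inf  # the outer optimizer/sampler simply rejects this θ
        end
    end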

maximerischard commented 4 years ago

Thank you for submitting this issue. I would tend to agree with you, and would lean towards removing the retry mechanism altogether, replacing it with an error message that is as helpful as possible.

Red-Portal commented 4 years ago

Hi @maximerischard! Do you mean a hard error? I think throwing, or simply setting .mll = -Inf, is the best way to go, since we can then handle this specific case algorithmically within an outer algorithm (e.g. MCMC), as sketched below.
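
As a sketch of why -Inf is easy to handle (hypothetical code, not from the package): in a generic Metropolis step, a proposal whose log posterior is -Inf gets acceptance probability zero and is rejected automatically.

    # Hedged sketch: logα = -Inf whenever the Cholesky failed at θnew,
    # so log(rand()) < logα is always false and the proposal is rejected.
    function metropolis_step(logpost, θ, propose)
        θnew = propose(θ)
        logα = logpost(θnew) - logpost(θ)
        return log(rand()) < logα ? θnew : θ
    end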

maximerischard commented 4 years ago

I agree. Would you have time to create a PR?

Red-Portal commented 4 years ago

@maximerischard Sure. I'll do so as soon as I have some time.

owensnick commented 3 years ago

Whilst I don't disagree with @Red-Portal's original comment, there are situations where this was pretty useful. Could it return as an opt-in feature?

maximerischard commented 3 years ago

Hi @owensnick. Could you please explain the kind of situation you are encountering where dynamic retries would be helpful? Perhaps in a new issue so we can track it separately?

owensnick commented 3 years ago

Hi @maximerischard. Actually, on review it appears this is mostly cosmetic: with extra data preprocessing, the optimisation is successful despite encountering pos-def exceptions. I work in genomics, where we use Gaussian processes to analyse gene expression time series. We often fit GPs to 10,000+ different time series (one for each gene measured), and within these there are often a number of pathological cases that can trip up the optimizer; it's hard to identify these ahead of time.

There were cases in which the dynamic retries were able to save the optimisation; however, with extra filtering and some data transformation I think I've managed to avoid most of these, although I have yet to check this on the full data set. The optimizer does still encounter pos-def exceptions.

So this actually becomes a problem of stdout filling with LinearAlgebra.PosDefException messages when I fit a large number of GPs. It looks like this could be solved by an opt-out condition in get_optim_target.
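
For example, something along these lines (a hypothetical wrapper; `target` stands in for whatever get_optim_target currently builds) would turn the exception into an Inf objective value instead of an error printed to stdout:

    using LinearAlgebra

    # Hypothetical opt-out wrapper around the optimisation target: a failed
    # Cholesky becomes an Inf objective (we minimise the negative log
    # likelihood), so the optimizer just steps away from that region quietly.
    tolerant_target(target) = θ -> try
        target(θ)
    catch err
        err isa LinearAlgebra.PosDefException || rethrow(err)
        Inf
    end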