ACEsuit / ACEfit.jl

Generic Codes for Fitting ACE models
MIT License

Clarify naming of Bayesian solvers #41

Open wcwitt opened 1 year ago

wcwitt commented 1 year ago

I've experimented with a few ways of implementing Bayesian ridge (for example), and the proliferation of functions has gotten a bit confusing. Now that more people are starting to use them, it's crucial to reorganize and document them.

wcwitt commented 1 year ago

First draft of the revised API. I am proposing collapsing BRR and ARD into a single user interface, controlled by the ard_threshold parameter. Please weigh in if you don't like this or have any other feedback.

@casv2 @cortner @bernstei @jameskermode


"""
    bayesian_linear_regression(A, Y; <keyword arguments>)

Perform Bayesian linear regression, possibly with automatic relevance determination.

# Arguments

- `A::Matrix{<:AbstractFloat}`: design matrix, with observations as rows
- `Y::Vector{<:AbstractFloat}`: target vector
- `ard_threshold::AbstractFloat=0.0`: if nonzero, prune the basis with automatic relevance determination. must be greater than `min_variance_coeff`.
- `tol::Float64=1e-3`: tolerance to use for terminating the evidence maximization.
- `max_iter::Int=1000`: iteration limit for evidence maximization.
- `opt_method::String="LBFGS"`: method for evidence maximization.
- `min_var_coeff::AbstractFloat=1e-8`: lower bound for the variance on the coefficient prior.
- `min_var_noise::AbstractFloat=1e-8`: lower bound for the variance on the model noise.
- `factorization::String="cholesky"`: if "cholesky" performs poorly, try "svd" or "qr".
- `ret_covar::Bool=false`: whether to supply the covariance matrix in the results.
- `comm_size::Int=0`: if nonzero, sample from the posterior and include a committee in the results.
- `verbose::Bool=true`
"""
cortner commented 1 year ago

I like this a lot. How will I construct a solver object? In Julia rather than JSON style?

wcwitt commented 1 year ago

See this, with the caveat that the name will change (to BLR, it sounds like):

https://github.com/ACEsuit/ACE1pack.jl/pull/105/commits/88b6efd0f2bb3e80de5c41ad0d53dc6a7cedbe5f
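
In the meantime, a rough sketch of what the Julia-style construction could look like. The struct name and keyword spellings here are assumptions (the name is still in flux, per the above), and `ACEfit.linear_solve` is an assumed entry point:

```julia
using ACEfit

# hypothetical solver object; fields assumed to mirror the keyword arguments
# of bayesian_linear_regression above
solver = ACEfit.BLR(ard_threshold = 1e-6,
                    factorization = "cholesky",
                    comm_size = 10)

# assumed entry point: dispatch on the solver type rather than
# passing JSON-style options
results = ACEfit.linear_solve(solver, A, Y)
```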

jameskermode commented 1 year ago

Looks good. Is there a reason you’ve parametrised in terms of variances rather than precisions as is more common?

bernstei commented 1 year ago

> Looks good. Is there a reason you’ve parametrised in terms of variances rather than precisions as is more common?

We should be quite careful with this. When I was messing around with the Python re-implementation, I got quite different performance when transforming those parameters to 1/x. That was with very ill-conditioned matrices, which may be a lot better with the purified ACE1x basis, but it's not just a matter of notation.

jameskermode commented 1 year ago

I agree with this point, we’ve made similar observations here. Precision seems better behaved generally.



wcwitt commented 1 year ago

My personal hyperprior on the noise is:

I basically have no idea. Maybe something roughly equivalent to 1 meV/atom? I will definitely suspect overfitting if the LML maximization returns something less than, say, 0.01 meV/atom.

When I went to translate that prior into code, it was simplest to enforce a lower bound on the variances - that's the entire explanation. Of course, we could convert the same intuition into an upper bound on the precisions, and I'm definitely open to that if it's better behaved.

So I would argue that the interface should use variances - because I'd be surprised if any of us reason in terms of precisions - but for the actual optimization we should do whatever works best numerically.
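
For the record, the two parametrizations encode the same constraint; only the numerics differ. A minimal sketch (all names illustrative, not part of the draft API):

```julia
# illustrative only: the variance lower bound from the draft API and the
# equivalent precision upper bound (precision = 1/variance)
min_var_noise  = 1e-8                  # draft default: lower bound on noise variance
max_prec_noise = 1 / min_var_noise     # same constraint, stated as a precision bound

# a common numerical trick: optimize over the log of either quantity, which
# removes the positivity constraint and can tame the ill-conditioning
# introduced by the 1/x transform
logprec  = -log(min_var_noise)         # unconstrained optimization variable
variance = exp(-logprec)               # recover the variance when needed
```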

wcwitt commented 1 year ago

TODO: need to handle the transition between the overconstrained and underconstrained regimes more gracefully - should do that as part of this refactoring.
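
To make the TODO concrete, the two regimes generally need different treatment. The sketch below is only an illustration of making the branch explicit, not a proposal for how the refactor should handle it:

```julia
using LinearAlgebra

# illustrative sketch of the overconstrained/underconstrained branch
function solve_lsq(A::Matrix{Float64}, Y::Vector{Float64})
    m, n = size(A)
    if m >= n
        return A \ Y        # overconstrained: ordinary least squares via QR
    else
        return pinv(A) * Y  # underconstrained: minimum-norm solution via the pseudoinverse
    end
end
```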

wcwitt commented 1 year ago

Recording from discussion elsewhere: "They should all get [a maxiter parameter] please, and then they should fail with a nice user-friendly message, something along the lines of 'even when the solver hasn't converged the quality of the solution may be good, please test this before changing solver parameters'"
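
A sketch of what that could look like in code; the message text comes from the quote above, while the function and variable names are illustrative:

```julia
# illustrative sketch of the requested max_iter behavior
function check_convergence(iterations::Int, max_iter::Int)
    if iterations >= max_iter
        @warn "Evidence maximization reached max_iter = $max_iter without converging. " *
              "Even when the solver hasn't converged, the quality of the solution " *
              "may be good; please test this before changing solver parameters."
    end
end
```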