Open wcwitt opened 1 year ago
First draft of the revised API. I am proposing collapsing BRR and ARD into a single user interface, controlled by the ard_threshold
parameter. Please weigh in if you don't like this or have any other feedback.
@casv2 @cortner @bernstei @jameskermode
"""
bayesian_linear_regression(A, Y; <keyword arguments>)
Perform Bayesian linear regression, possibly with automatic relevance determination.
# Arguments
- `A::Matrix{<:AbstractFloat}`: design matrix, with observations as rows
- `Y::Vector{<:AbstractFloat}`: target vector
- `ard_threshold::AbstractFloat=0.0`: if nonzero, prune the basis with automatic relevance determination. Must be greater than `min_var_coeff`.
- `tol::AbstractFloat=1e-3`: tolerance for terminating the evidence maximization.
- `max_iter::Int=1000`: iteration limit for evidence maximization.
- `opt_method::String="LBFGS"`: method for evidence maximization.
- `min_var_coeff::AbstractFloat=1e-8`: lower bound for the variance on the coefficient prior.
- `min_var_noise::AbstractFloat=1e-8`: lower bound for the variance on the model noise.
- `factorization::String="cholesky"`: if "cholesky" performs poorly, try "svd" or "qr".
- `ret_covar::Bool=false`: whether to supply the covariance matrix in the results.
- `comm_size::Int=0`: if nonzero, sample from the posterior and include a committee in the results.
- `verbose::Bool=true`: whether to print progress during fitting.
"""
I like this a lot. How will I construct a solver object? In Julia rather than in JSON style?
See this, with the caveat that the name will change, it sounds like, to BLR:
https://github.com/ACEsuit/ACE1pack.jl/pull/105/commits/88b6efd0f2bb3e80de5c41ad0d53dc6a7cedbe5f
Looks good. Is there a reason you’ve parametrised in terms of variances rather than precisions as is more common?
We should be quite careful with this. When I was messing around with the python re-implementation I got quite different performance when transforming those parameters to 1/x. That was with very ill conditioned matrices, which may be a lot better with the purified ACE1x basis, but it's not just a matter of notation.
I agree with this point, we’ve made similar observations here. Precision seems better behaved generally.
My personal hyperprior on the noise is:
I basically have no idea. Maybe something roughly equivalent to 1 meV/atom? I will definitely suspect overfitting if the LML maximization returns something less than say 0.01 meV/atom.
When I went to translate that prior into code, it was simplest to enforce a lower bound on the variances - that's the entire explanation. Of course, we could convert the same intuition into an upper bound on the precisions, and I'm definitely open to that if it's better behaved.
So I would argue that the interface should use variances - because I'd be surprised if any of us reason in terms of precisions - but for the actual optimization we should do whatever works best numerically.
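To make the "variances in the interface, precisions (or similar) in the optimizer" split concrete, here is a small sketch. The helper names and the log-precision choice are illustrative assumptions; the point is only that the two parametrizations are interconvertible, and that a lower bound on a variance is exactly an upper bound on the corresponding precision.

```julia
# Interface side works in variances; the optimizer can work in log-precision,
# which is often better conditioned for ill-posed problems.
var_to_logprec(v) = -log(v)     # variance -> optimizer coordinate
logprec_to_var(θ) = exp(-θ)     # optimizer coordinate -> variance

# The variance lower bounds from the draft docstring translate directly
# into precision upper bounds on the optimizer side:
min_var_noise = 1e-8
max_prec_noise = 1 / min_var_noise
```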
TODO: need to handle the transition between overconstrained/underconstrained more gracefully - should do that as part of this refactoring.
Recording from discussion elsewhere: "They should all get [a maxiter parameter] please, and then they should fail with a nice user-friendly message, something along the lines of "even when the solver hasn't converged the quality of the solution may be good, please test this before changing solver parameters""
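A hedged sketch of that requested behavior; the helper name and exact message wording are illustrative, not part of the proposal:

```julia
# Hypothetical helper: every iterative solver takes `max_iter`, and hitting
# the limit produces a friendly warning rather than a hard failure.
function warn_if_unconverged(converged::Bool, max_iter::Int)
    if !converged
        @warn "Solver reached max_iter = $max_iter without converging. " *
              "Even when the solver hasn't converged the quality of the " *
              "solution may be good; please test this before changing " *
              "solver parameters."
    end
    return converged
end
```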
I've experimented with several ways of implementing Bayesian ridge (for example), and the resulting proliferation of functions has become confusing. Now that more people are starting to use them, it's important to reorganize and document them.