JuliaAI / MLJLinearModels.jl

Generalized Linear Regressions Models (penalized regressions, robust regressions, ...)
MIT License

box (or just positive) constraints on enet OLS #152

Open adienes opened 9 months ago

adienes commented 9 months ago

what would it take to support box / positive constraints on the Lasso / ElasticNet solvers? is this compatible with the existing API, and if so where could I get started on contributing to the implementation?

tlienart commented 9 months ago

So let's say you wanted to do a linear regression with positive coefficients. You could either project onto R+ at every step (easy to implement with the current code, but likely not guaranteed to converge and possibly slow), or start in R+ and add a penalty that explodes (e.g. a negative logarithm) as coefficients get close to 0. The extension to box constraints is the same.

Note also that there is an L-BFGS variant that supports box constraints (L-BFGS-B), but I don't think Optim.jl implements it. It might implement something similar though (e.g. this PR: https://github.com/JuliaNLSolvers/Optim.jl/pull/584, I haven't dug into it).

So my suggestion would be to start with what Optim.jl has in store and see if it can be made to work. Also check whether there are existing "standard" implementations of regression with positivity constraints in Python or elsewhere that could be used as a baseline.

Note: it looks like fminbox implements a primal barrier, which would amount to the second option (the exploding penalty) from my first paragraph: https://github.com/JuliaNLSolvers/Optim.jl/blob/5fa5d61a9f2ba9fc534b4e74b5659df45afd59c4/src/multivariate/solvers/constrained/fminbox.jl#L182-L206
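To make the barrier idea concrete, here is a minimal sketch on the simplest possible case. It is an illustration under stated assumptions, not Optim.jl's fminbox code: a single coefficient whose unconstrained least-squares optimum is `c`, with the barrier objective 0.5 * (w - c)^2 - mu * log(w). The names `barrier_minimizer`, `c` and `mu` are chosen for this sketch only. In this case the minimizer has a closed form, stays strictly positive, and approaches max(c, 0) as `mu` shrinks.

```python
import math


def barrier_minimizer(c, mu):
    """Minimizer over w > 0 of 0.5 * (w - c)**2 - mu * log(w).

    Setting the derivative w - c - mu / w to zero gives
    w**2 - c * w - mu = 0; the positive root is returned.
    """
    return 0.5 * (c + math.sqrt(c * c + 4.0 * mu))


# Hypothetical values: c = -1.0 has an infeasible unconstrained optimum,
# c = 2.0 has a feasible one. The barrier weight mu is driven toward 0.
for mu in (1.0, 1e-2, 1e-4, 1e-6):
    print(mu, barrier_minimizer(-1.0, mu), barrier_minimizer(2.0, mu))
```

A real solver would minimize the barrier objective numerically for a decreasing sequence of `mu`, warm-starting each solve from the previous one; the closed form here only shows where those solves end up.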

adienes commented 9 months ago

is projecting to R+ at each step what sklearn does here ? https://github.com/scikit-learn/scikit-learn/blob/286f0c9d17019e52f532d63b5ace9f8e1beb5fe5/sklearn/linear_model/_cd_fast.pyx#L568C5-L568C33 it looks like it but I'm not entirely sure. I guess they are doing coordinate descent rather than gradient descent anyway
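For reference, coordinate descent with a positivity constraint can be sketched as below. This is a simplified illustration written for this thread, not a transcription of the linked sklearn code, and whether it matches that line exactly is not confirmed here. It assumes the unscaled objective 0.5 * ||y - Xw||^2 + alpha * ||w||_1 + 0.5 * beta * ||w||^2 with w >= 0; the function name and arguments are hypothetical. Each coordinate update is the exact one-dimensional minimizer, which is a shifted value clipped at zero.

```python
def positive_enet_cd(X, y, alpha, beta, n_sweeps=100):
    """Cyclic coordinate descent for the elastic net with w >= 0 (sketch)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    residual = list(y)  # equals y - X w for the starting point w = 0
    col_sq_norm = [sum(row[j] ** 2 for row in X) for j in range(n_features)]
    for _ in range(n_sweeps):
        for j in range(n_features):
            if col_sq_norm[j] == 0.0:
                continue  # an all-zero column never gets a non-zero weight
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(row[j] * r for row, r in zip(X, residual))
            rho += col_sq_norm[j] * w[j]
            # Exact 1-D minimizer over w_j >= 0: shift by alpha, clip at zero.
            w_new = max(rho - alpha, 0.0) / (col_sq_norm[j] + beta)
            delta = w_new - w[j]
            residual = [r - row[j] * delta for row, r in zip(X, residual)]
            w[j] = w_new
    return w


# Toy data (made up for this sketch); plain least squares would give
# a negative second coefficient here.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 0.0]
w_cd = positive_enet_cd(X, y, alpha=0.5, beta=0.0)
print(w_cd)
```

Because each coordinate is minimized exactly subject to its own bound, the clipping here is part of the coordinate update itself rather than a separate projection after a gradient step.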

in particular I have this algorithm in mind for the constrained lasso https://arxiv.org/pdf/1611.01511.pdf

tlienart commented 9 months ago

> is projecting to R+ at each step what sklearn does here ? https://github.com/scikit-learn/scikit-learn/blob/286f0c9d17019e52f532d63b5ace9f8e1beb5fe5/sklearn/linear_model/_cd_fast.pyx#L568C5-L568C33 it looks like it but I'm not entirely sure. I guess they are doing coordinate descent rather than gradient descent anyway

oof that's not an easy one to read, could be used as an example of why people should move to Julia... yeah it's coordinate descent, I don't fully understand what they're doing in there so would rather not comment too much. Projected gradient descent is more like:

  1. take a normal GD step (for which we have code); any admissible step would do, but GD is the natural choice and has a better chance of leading where you want
  2. project orthogonally onto R+ (for the non-negative orthant this is just an elementwise max(x, 0); for a box it is an elementwise clamp to the bounds)
  3. repeat from step 1

I think this will be OK in simple cases but might be pretty bad in others (lots of non-admissible steps that get projected back, with the next step looking very similar), and it is not guaranteed to converge afaik. Might still be good to implement as a comparison.
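The three steps above can be sketched as follows. This is a minimal pure-Python illustration for a least-squares loss with box constraints, not MLJLinearModels code; the function names, the toy data and the fixed step size are assumptions made for the example.

```python
def gradient(X, y, w):
    """Gradient of 0.5 * ||X w - y||^2, i.e. X^T (X w - y)."""
    residual = [sum(xij * wj for xij, wj in zip(row, w)) - yi
                for row, yi in zip(X, y)]
    return [sum(row[j] * r for row, r in zip(X, residual))
            for j in range(len(w))]


def projected_gradient_descent(X, y, lower, upper, step, n_iter=500):
    """Repeat: plain gradient step, then clamp each coordinate to its bounds."""
    # Start from 0 clamped into the box, so the first iterate is feasible.
    w = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(n_iter):
        g = gradient(X, y, w)
        w = [min(max(wj - step * gj, lo), hi)
             for wj, gj, lo, hi in zip(w, g, lower, upper)]
    return w


# Toy data (made up for this sketch); plain least squares would give
# a negative second coefficient here.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 0.0]
inf = float("inf")
# Step 1/3 is the reciprocal of the largest eigenvalue of X^T X for this X.
w_pgd = projected_gradient_descent(X, y, [0.0, 0.0], [inf, inf], step=1.0 / 3.0)
print(w_pgd)
```

Positivity is the special case `lower = 0`, `upper = inf`; a general box only changes the bounds passed in. The fixed step is chosen by hand for this toy problem, and a practical version would need a step-size rule or line search.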

> in particular I have this algorithm in mind for the constrained lasso https://arxiv.org/pdf/1611.01511.pdf

Right, so just as a note: ADMM tends (in my experience) to be a pretty poor algorithm. It's hard to set up, and even when set up right it can be beaten pretty handily by simpler methods. People thought it was cool because it is somewhat parallelisable, and lots of papers followed up on it, but I don't think it's as good as its fame suggests.

QP might be interesting but typically needs a dedicated QP solver. From a quick search there are a few in Julia, but it would come at the cost of an additional dependency, which would need to be justified (e.g. if the plan is to only use the dependency for CLasso then I'm not hugely in favour).

To me it would be quite interesting to just try fminbox on the algorithms that are here and see if that works well, then compare with standard-ish implementations in Python / R to see if we have something that's as good or better, and move on from there. It might lead to something vastly superior to the article you quote (which wouldn't surprise me).

adienes commented 9 months ago

> if the plan is to only use the dependency for CLasso then I'm not super highly in favour

fair enough --- although if eventually going that route, I will say I've had good experiences using Clarabel.jl

I'll start to play with fminbox a bit. with the gramian training ready, box constraints are definitely the next most important thing for me!