JuliaAI / MLJLinearModels.jl

Generalized Linear Regressions Models (penalized regressions, robust regressions, ...)
MIT License

Help with documentation review #136

Closed ablaom closed 1 year ago

ablaom commented 1 year ago

I'm considering having a stab at some of #135 but could do with some help.

  1. What does "EN" mean here?

(screenshot of the table entry containing "EN")

This appears in this doc page.

  2. The same doc page gives nice tables of model algorithms, but the corresponding MLJ model types are not listed. It would be good to have this, to save some detective work (and the user certainly wants it anyway). To make it easier, I'm copying the lists below:
| Regressors | Formulation¹ | Available solvers | Comments | Model type |
| --- | --- | --- | --- | --- |
| OLS & Ridge | L2Loss + 0/L2 | Analytical² or CG³ | | ? |
| Lasso & Elastic-Net | L2Loss + 0/L2 + L1 | (F)ISTA⁴ | | ? |
| Robust 0/L2 | RobustLoss⁵ + 0/L2 | Newton, NewtonCG, LBFGS, IWLS-CG⁶ | no scale⁷ | ? |
| Robust L1/EN | RobustLoss + 0/L2 + L1 | (F)ISTA | | ? |
| Quantile⁸ + 0/L2 | RobustLoss + 0/L2 | LBFGS, IWLS-CG | | ? |
| Quantile L1/EN | RobustLoss + 0/L2 + L1 | (F)ISTA | | ? |

| Classifiers | Formulation | Available solvers | Comments | Model type |
| --- | --- | --- | --- | --- |
| Logistic 0/L2 | LogisticLoss + 0/L2 | Newton, Newton-CG, LBFGS | yᵢ∈{±1} | ? |
| Logistic L1/EN | LogisticLoss + 0/L2 + L1 | (F)ISTA | yᵢ∈{±1} | ? |
| Multinomial 0/L2 | MultinomialLoss + 0/L2 | Newton-CG, LBFGS | yᵢ∈{1,...,c} | ? |
| Multinomial L1/EN | MultinomialLoss + 0/L2 + L1 | ISTA, FISTA | yᵢ∈{1,...,c} | ? |
  3. Also, could we have a mapping from the human name of each solver (appearing in the table) to the Julia object to set as the value in the model struct?

@tlienart @jbrea

tlienart commented 1 year ago
  1. EN is Elastic Net, sum of L1 and L2 penalty
  2. Assuming you want input → output types:
    1. all of the regressors are continuous -> continuous
    2. logistic classifiers are continuous -> binary
    3. multinomial classifiers are continuous -> multiclass

All of them are deterministic; this repo is purely about finding what people would call the MLE or MAP estimator.

  3. I think what you want is to know how to set the solver field; a user could (though usually won't) pass one of the relevant solvers defined here: https://github.com/JuliaAI/MLJLinearModels.jl/blob/dev/src/fit/solvers.jl for the appropriate model. So, for instance, if the column says Analytical or CG, then

```julia
solver = CG(...)
solver = Analytical(...)
```

would work for that model.
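To make that concrete, here is a minimal sketch of the two construction styles; `LogisticClassifier` and `LBFGS` are names exported by MLJLinearModels, but treat the exact keywords as assumptions to be checked against the docstrings:

```julia
using MLJLinearModels

# leave solver as nothing and let the package pick a default
model_default = LogisticClassifier()

# or pass an explicit solver object from src/fit/solvers.jl
model_lbfgs = LogisticClassifier(solver=LBFGS())
```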

hope that helps, happy to review your stab at this

ablaom commented 1 year ago

@tlienart The current doc strings say something like "if solver=nothing then the default will be used" but don't say what that default is for each model. Can I get this without digging into the code? Is it always the first one in this table, with ISTA the default where it says "(F)ISTA"?

It's a bit annoying that the default isn't the default, instead of nothing if you know what I mean.

I also got confused for a while until I realised ISTA and FISTA were aliases for slow/fast ProxGrad. I was looking for ages for docstrings for ISTA and FISTA but they don't exist. Probably there are other dummies like me who didn't guess this straight away. I will try to address this in my documentation PR.

Ditto CG (alias for Analytical(iterative=true)).

tlienart commented 1 year ago

**Defaults**

**Alternative solvers a user can specify**

In general the user should not specify these alternatives, as they will be inferior to the default (there will be edge cases where this is not true, but I don't think these are very relevant for an ML practitioner).

**Solver parameters with their defaults**

**Analytical**

**Newton**

Solves the problem with a full solve of the Hessian.

**NewtonCG**

Solves the problem with a CG solve of the Hessian.

Same parameters as Newton except the naming: newtoncg_options.

**LBFGS**

LBFGS solve; optim_options and lbfgs_options as per these docs.

**ProxGrad**

A user should not call this constructor directly; the relevant flavours are ISTA (no acceleration) and FISTA (with acceleration). ProxGrad is not used for anything other than L1-penalized problems for now.

ISTA is ProxGrad for L1 with accel set to false; FISTA is the same story but with acceleration.

ISTA is not necessarily slower than FISTA, but FISTA generally has a better chance of being faster. A non-expert user should just use FISTA.
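A short sketch of the aliases described above; `ISTA` and `FISTA` are exported by MLJLinearModels, and the exact relationship to `ProxGrad` should be checked in src/fit/solvers.jl:

```julia
using MLJLinearModels

ista  = ISTA()    # proximal gradient without acceleration
fista = FISTA()   # accelerated proximal gradient; the usual choice
```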

**IWLSCG**

Iteratively reweighted least squares with a CG solve.

In general users should not use this. A bit like Newton and NewtonCG above, IWLSCG will typically be more expensive, but it's an interesting tool for people who are interested in solvers, and it provides a sanity check for the other methods.

> It's a bit annoying that the default isn't the default, instead of nothing if you know what I mean.

If you have a suggestion for a cleanup, maybe open an issue? (I'm actually not sure I know what you mean)

ablaom commented 1 year ago

> L2Loss, L2Penalty (linear regression, ridge regression) --> default is Analytical (matrix solve, possibly using an iterative solver)

What does "possibly" mean? I'm guessing iterative=false for linear and iterative=true for ridge? Is that right?

ablaom commented 1 year ago

And I suppose we can add:

> RobustLoss, with L1 + L2 Penalty (RobustRegressor, HuberRegressor) --> LBFGS

Yes?

ablaom commented 1 year ago

> L2Loss, L2Penalty (linear regression, ridge regression) --> default is Analytical (matrix solve, possibly using an iterative solver)
>
> SmoothLoss L2+L1 Penalty (lasso, elasticnet, logistic+multinomial with elastic net) --> FISTA

Looks like you are saying that the default solver for LogisticClassifier and MultinomialClassifier depends on the value of the regularisation parameters (which would explain the nothing solver default). Is the default only Analytical(...) if L1 penalty is zero, and FISTA otherwise? But now I'm confused because (F)ISTA aren't listed as possible solvers for those models in the current docs.

ablaom commented 1 year ago

I appreciate the help, but I think I must be asking the wrong questions. Here's what I want to do for each model M:

Likely all this information is contained in what you are telling me, but I feel I have to "reverse engineer" the answer.

Does this better clarify my needs?

tlienart commented 1 year ago

> L2Loss, L2Penalty (linear regression, ridge regression) --> default is Analytical (matrix solve, possibly using an iterative solver)

> What does "possibly" mean? I'm guessing iterative=false for linear and iterative=true for ridge? Is that right?

No, both iterative=true/false can be used for either Linear or Ridge. In both cases you just have to solve a positive-definite linear system of the form $Mx = b$ (in Ridge it's just perturbed by the identity, to shift the spectrum away from zero); to solve such a system you can either do a full solve $M \backslash b$ (using a Cholesky solve) or use an iterative method such as conjugate gradient or another Krylov method. The latter (iterative) can be good when the dimensionality of the problem is large. In general though, users should just use iterative=false; the full backsolve will work very well most of the time.
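A toy illustration of the full-solve strategy just described, using only the standard library (this is not the package's internal code):

```julia
using LinearAlgebra

n, p = 100, 5
X, y = randn(n, p), randn(n)
λ = 0.1

M = X' * X + λ * I   # ridge normal equations; λ = 0 gives plain OLS
b = X' * y

# full solve via Cholesky, as with Analytical(iterative=false)
θ = cholesky(Symmetric(M)) \ b

# an iterative Krylov method (e.g. conjugate gradient) would approximate
# the same θ, which can pay off when p is large
```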

> RobustLoss, with L1 + L2 Penalty (RobustRegressor, HuberRegressor) --> LBFGS

RobustLoss + L2 --> LBFGS
RobustLoss + L2 + L1 --> FISTA

> Looks like you are saying that the default solver for LogisticClassifier and MultinomialClassifier depends on the value of the regularisation parameters (which would explain the nothing solver default)

As soon as you have a non-smooth penalty such as L1, we cannot use smooth solvers and have to resort to proximal gradient. So yes, as soon as there's a non-zero coefficient in front of the L1 penalty, a FISTA solver is picked.
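The selection logic described in this thread might be sketched as follows; `default_solver` is purely illustrative, not a function the package actually defines:

```julia
using MLJLinearModels

# hypothetical sketch: l2_loss is true for OLS/Ridge-style problems
function default_solver(l2_loss::Bool, l1_strength::Real)
    l1_strength > 0 && return FISTA()      # non-smooth L1 term → proximal gradient
    l2_loss         && return Analytical() # pure L2 problem → direct matrix solve
    return LBFGS()                         # other smooth losses (logistic, robust, ...)
end
```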

> But now I'm confused because (F)ISTA aren't listed as possible solvers for those models in the current docs.

(screenshot of the solver table in the current docs)

> - state clearly in docs what values the field solver may take on, eg, "any instance of LBFGS, ProxGrad".
> - state clearly what the default value is; if this is "dynamic", ie depends on values of other parameters, then I want a concise statement of the logic needed to determine what solver will be chosen.

Isn't what I quoted in my previous answer under "Defaults" what you wanted?

Maybe to simplify (I'm aware you have limited bandwidth, and it's not helping to have a long conversation): how about we do this just for Linear+Ridge in a draft PR, get to a satisfactory point, and then progress from there?

MLJ constructors:

For both, the solver can be specified to be Analytical(...). The default is Analytical(). The difference from the default is if the user passes iterative=true, in which case they may also specify max_inner.
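As a sketch of those Linear/Ridge defaults: `RidgeRegressor` and `Analytical` are exported by MLJLinearModels, `iterative` and `max_inner` are the keywords mentioned above, and the value 200 is purely illustrative:

```julia
using MLJLinearModels

ridge_default   = RidgeRegressor()   # equivalent to solver=Analytical()
ridge_iterative = RidgeRegressor(solver=Analytical(iterative=true, max_inner=200))
```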

ablaom commented 1 year ago

@tlienart Thanks for the additional help and your patience. #138 is now ready for your review.