DS-XL / ds-intermediate-2023

Class repo for DS intermediate level class 2023 cohort

How to decide what range of alphas to try in ridge regression #5

Open muspain opened 1 year ago

muspain commented 1 year ago

How to decide what range of alphas to try in ridge regression?

lasso_cv = LassoCV(normalize=True, alphas=np.logspace(-10, 1, 400))

emma-oc commented 1 year ago

Good question. I don't think there is a single correct answer to this; it is more a subjective choice based on how much regularization/shrinkage you'd like to put on the parameters. According to the scikit-learn documentation, if you do not specify the range of alphas, LassoCV builds the grid itself: n_alphas values along the regularization path, with eps setting the ratio of the smallest alpha to the largest. It then selects the best alpha from that grid via CV, which sounds like a good default to me.
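To make the automatic grid concrete, here is a rough NumPy-only sketch of how such a grid can be built by hand. It assumes the lasso objective (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 with centered data and the documented defaults eps=1e-3 and n_alphas=100. The function name default_lasso_alpha_grid and the toy data are made up for illustration; this is not scikit-learn's own code.

```python
import numpy as np

def default_lasso_alpha_grid(X, y, eps=1e-3, n_alphas=100):
    """Log-spaced grid from alpha_max down to eps * alpha_max (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples = X.shape[0]
    # Center X and y so the unpenalized intercept drops out.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Smallest alpha at which every lasso coefficient is exactly zero.
    alpha_max = np.abs(Xc.T @ yc).max() / n_samples
    # Descending grid, evenly spaced on a log scale.
    return np.logspace(np.log10(alpha_max), np.log10(alpha_max * eps), n_alphas)

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=200)

alphas = default_lasso_alpha_grid(X, y)
print(alphas[0], alphas[-1])  # largest and smallest alpha in the grid
```

The useful part is that the upper end is tied to the data: above alpha_max the lasso solution is all zeros, so there is little reason to search higher. A hand-picked range like np.logspace(-10, 1, 400) may spend many grid points on alphas so small that they barely change the fit.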

From a general learning perspective, I found this post link where there is some discussion on this topic. I also found this chapter link pretty helpful in terms of visualizing the regularization on parameters, though the code is in R...

From a practical perspective, you can usually define a logspace range and then either do model selection using a goodness-of-fit metric (e.g. AIC, BIC, R^2) or use a more data-driven CV to choose the optimal parameter value.
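As a minimal sketch of the CV route, assuming a recent scikit-learn (where normalize has been removed, so a StandardScaler step stands in for it): fit LassoCV over a logspace grid, then check whether the selected alpha sits on the edge of the grid. The synthetic data and the specific range (-3 to 2) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem, purely illustrative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Candidate alphas, evenly spaced on a log scale.
alphas = np.logspace(-3, 2, 50)

# Scale the features, then pick alpha by 5-fold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5))
model.fit(X, y)
best_alpha = model.named_steps["lassocv"].alpha_

# If the winner is the smallest or largest candidate, the range is probably
# too narrow on that side and is worth extending before trusting the result.
at_edge = bool(np.isclose(best_alpha, alphas.min())
               or np.isclose(best_alpha, alphas.max()))
print(best_alpha, at_edge)
```

If at_edge comes out True, widening the range on that side and refitting is a cheap way to check that the grid actually brackets the optimum.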