JuliaStats / Lasso.jl

Lasso/Elastic Net linear and generalized linear models

Type safe regularization path segment selection #29

Closed AsafManela closed 5 years ago

AsafManela commented 5 years ago

The current approach selects a regularization path segment through the select::Symbol keyword argument, as in coef(path; select=:AIC) or predict(path, newX; select=:AIC). Because select=:all (the default) returns coefficients for every segment rather than for a single one, the shape of the result depends on the value of the keyword, so these methods are not type safe.
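The problem can be illustrated with a small self-contained sketch. It does not use Lasso.jl itself: ToyPath and toycoef are hypothetical stand-ins, and the assumption that "all segments" yields a matrix while a single segment yields a vector is made only for illustration.

```julia
# Hypothetical stand-in for a fitted path: one column of coefficients per segment.
struct ToyPath
    coefs::Matrix{Float64}   # predictors x segments
    aic::Vector{Float64}     # one AIC value per segment
end

# Symbol-keyword style: what is returned depends on the *value* of `select`.
function toycoef(path::ToyPath; select::Symbol=:all)
    if select == :all
        return path.coefs                        # all segments: a Matrix
    elseif select == :AIC
        return path.coefs[:, argmin(path.aic)]   # one segment: a Vector
    else
        throw(ArgumentError("unknown selector $select"))
    end
end

toypath = ToyPath([1.0 2.0; 3.0 4.0], [10.0, 5.0])

all_segments = toycoef(toypath)                # Matrix{Float64}
best_segment = toycoef(toypath; select=:AIC)   # Vector{Float64}
```

The same function returns two different types for the same argument types, which is the behavior the PR sets out to remove.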

This PR deprecates these methods, and replaces them with similar methods that take a path and a SegSelect struct, which takes care of the logic of selecting a particular segment. It implements MinAIC, MinAICc, MinBIC, MinCVmse, and MinCV1se segment selectors, and makes it easy to create new ones by defining a new SegSelect struct and implementing its segselect() method.
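A minimal sketch of the dispatch-based design described above might look as follows. All names here (ToySelect, ToyMinAIC, ToyMinBIC, ToyFixedSegment, SketchPath, toysegselect, selectedcoef) are hypothetical analogues invented for this example; the actual SegSelect and segselect() signatures in the PR may differ.

```julia
# Hypothetical analogue of SegSelect: each selector is its own type.
abstract type ToySelect end
struct ToyMinAIC <: ToySelect end
struct ToyMinBIC <: ToySelect end

# Hypothetical stand-in for a fitted path.
struct SketchPath
    coefs::Matrix{Float64}   # predictors x segments
    aic::Vector{Float64}
    bic::Vector{Float64}
end

# One method per selector returns the index of the chosen segment.
toysegselect(path::SketchPath, ::ToyMinAIC) = argmin(path.aic)
toysegselect(path::SketchPath, ::ToyMinBIC) = argmin(path.bic)

# With a selector argument, the result is always a single segment's coefficients.
selectedcoef(path::SketchPath, select::ToySelect) =
    path.coefs[:, toysegselect(path, select)]

# Adding a selector: define a struct and implement its selection method.
struct ToyFixedSegment <: ToySelect
    index::Int
end
toysegselect(path::SketchPath, s::ToyFixedSegment) = s.index

sketchpath = SketchPath([1.0 2.0 3.0; 4.0 5.0 6.0],
                        [3.0, 1.0, 2.0],    # AIC is smallest at segment 2
                        [1.0, 2.0, 3.0])    # BIC is smallest at segment 1

by_aic   = selectedcoef(sketchpath, ToyMinAIC())
by_bic   = selectedcoef(sketchpath, ToyMinBIC())
by_fixed = selectedcoef(sketchpath, ToyFixedSegment(3))
```

Because the selection rule lives in the type of the second argument, every method of selectedcoef returns a Vector{Float64}, and a new rule needs no change to existing code.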

This PR also provides a simpler interface for fitting a lasso model and selecting the segment in a single call, through a fit(RegularizedModel, X, y, dist, link; <kwargs>) method. It returns a LinearModel or GeneralizedLinearModel representing the selected segment of a regularization path.

For example,

fit(LassoModel, X, y; select=MinBIC()) # BIC minimizing LinearModel 
fit(LassoModel, X, y, Binomial(), Logit(); 
    select=MinCVmse(path, 5)) # 5-fold CV mse minimizing model

This approach has the advantage that the model can be described (with coef) and used for prediction (with predict), without rerunning the selector, which can be expensive for cross-validating selectors.
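The benefit of selecting once can be sketched with a toy example. Nothing below is Lasso.jl code: CachedPath, expensive_select, SelectedToyModel, fit_selected and toypredict are hypothetical names, and a call counter stands in for the cost of a cross-validating selector.

```julia
# Hypothetical counter standing in for the cost of a cross-validating selector.
const SELECTOR_CALLS = Ref(0)

struct CachedPath
    coefs::Matrix{Float64}    # predictors x segments
    cvloss::Vector{Float64}   # one cross-validation loss per segment
end

function expensive_select(path::CachedPath)
    SELECTOR_CALLS[] += 1             # count how often selection is run
    return argmin(path.cvloss)
end

# The selected model stores only the chosen segment's coefficients.
struct SelectedToyModel
    coef::Vector{Float64}
end

fit_selected(path::CachedPath) =
    SelectedToyModel(path.coefs[:, expensive_select(path)])

toypredict(m::SelectedToyModel, X::AbstractMatrix) = X * m.coef

cachedpath = CachedPath([1.0 2.0; 3.0 4.0], [0.9, 0.4])
model = fit_selected(cachedpath)      # selection happens here, once

Xnew = [1.0 0.0; 0.0 1.0]
yhat1 = toypredict(model, Xnew)       # no further selection needed
yhat2 = toypredict(model, Xnew)
```

After fitting, repeated prediction calls reuse the stored coefficients, so the counter stays at one.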

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 160


| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| src/segselect.jl | 29 | 32 | 90.63% |
| src/deprecated.jl | 0 | 21 | 0.0% |
| Total | 34 | 58 | 58.62% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| src/Lasso.jl | 5 | 62.91% |
| Total | 5 | |

| Totals | |
| --- | --- |
| Change from base Build 159 | 1.5% |
| Covered Lines | 541 |
| Relevant Lines | 612 |

💛 - Coveralls