JuliaStats / Lasso.jl

Lasso/Elastic Net linear and generalized linear models

Added ability to specify penalty factors specific to each coefficient as in glmnet #6

Closed AsafManela closed 7 years ago

AsafManela commented 8 years ago

Added the ability to specify per-coefficient penalty factors, as in glmnet, with corresponding tests against glmnet results.
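
For readers unfamiliar with glmnet's `penalty.factor`, the idea is that each coefficient j gets its own multiplier w_j on the λ penalty: w_j = 0 leaves that coefficient unpenalized, while a large w_j drives it to zero. A minimal Python sketch of coordinate descent with per-coefficient penalty factors (function names are illustrative, not the package's Julia API):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, penalty_factor=None, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.

    penalty_factor w_j scales the penalty per coefficient; w_j = 0 leaves
    that coefficient unpenalized, as with glmnet's penalty.factor.
    """
    n, p = X.shape
    w = np.ones(p) if penalty_factor is None else np.asarray(penalty_factor, float)
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column x_j'x_j / n
    r = y - X @ b                       # residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # remove coordinate j from residual
            z = X[:, j] @ r / n
            b[j] = soft_threshold(z, lam * w[j]) / col_sq[j]
            r -= X[:, j] * b[j]         # fold updated coordinate back in
    return b
```

With `penalty_factor=[0.0, 1.0, 100.0]`, for example, the first coefficient is fit essentially unpenalized while the third is forced to exactly zero.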

Also fixed some bugs with the :obj criterion and added tests for it (:obj was previously untested).

simonster commented 8 years ago

Thanks! Is there a reason to use a sparse matrix here to back the penalty factors, rather than just using an ordinary Vector? The issue with the sparse matrix is that accessing individual elements is O(log(n)) whereas an ordinary Vector is O(1), and I don't think the memory savings would be worth that. (A design matrix needs at least one non-zero element per predictor, so the input data size already scales at least with the number of predictors.)
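
The O(log(n)) figure comes from how compressed sparse formats store their entries: nonzero indices are kept sorted, so reading a single element means a binary search over the stored indices. A toy Python illustration of that lookup cost (the class and names are hypothetical, not Julia's actual SparseMatrixCSC API):

```python
from bisect import bisect_left

class SparseVec:
    """Toy sparse vector: sorted nonzero indices plus values, default 0.0.

    Single-element access is a binary search over the stored indices,
    i.e. O(log nnz) -- the cost being discussed -- versus O(1) direct
    indexing into a dense array.
    """
    def __init__(self, n, pairs):
        self.n = n
        self.idx = sorted(i for i, _ in pairs)
        vals = dict(pairs)
        self.val = [vals[i] for i in self.idx]

    def __getitem__(self, i):
        k = bisect_left(self.idx, i)   # O(log nnz) binary search
        if k < len(self.idx) and self.idx[k] == i:
            return self.val[k]
        return 0.0                     # implicit zero entry
```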

AsafManela commented 8 years ago

I'm trying to implement a lasso extension that requires updating the penalty factors on every outer CD iteration, where most of the weights are 1 and only nonzero coefficients have different weights. For large-p problems I think this SparseWeights type will be faster. One thing I could do to improve speed on dense problems is to use SparseWeights only when one is provided as penalty_weights in fit(); if a Vector is provided, I'll go with that. What do you think?

On Wed, Jul 6, 2016 at 5:38 PM, Simon Kornblith notifications@github.com wrote:

> Thanks! Is there a reason to use a sparse matrix here to back the penalty factors, rather than just using an ordinary Vector? The issue with the sparse matrix is that accessing individual elements is O(log(n)) whereas an ordinary Vector is O(1), and I don't think the memory savings would be worth that. (A design matrix needs at least one non-zero element per predictor, so the input data size already scales at least with the number of predictors.)

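
The extension described above — recomputing penalty factors from the current coefficients on each outer coordinate-descent pass — is the pattern behind the GammaLassoPath added later in this PR. A rough sketch of that kind of weight update, assuming the 1/(1 + γ|β_j|) form of Taddy's gamma lasso (treat the exact formula as an assumption):

```python
import numpy as np

def gamma_lasso_weights(beta, gamma):
    """Recompute per-coefficient penalty factors from the current estimate.

    A coefficient at zero keeps weight exactly 1; larger |beta_j| yields a
    smaller weight, so established coefficients are penalized less on the
    next outer pass. The 1/(1 + gamma*|b|) form is assumed here.
    """
    return 1.0 / (1.0 + gamma * np.abs(beta))
```

Note that as long as a coefficient stays at zero its weight stays exactly 1, which is the "most weights are 1" structure motivating the SparseWeights idea.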

AsafManela commented 7 years ago

I added several features and fixed a bug or two in LassoPath. My current master allows variable-specific penalty factors as in glmnet, which, as you suggested, are now implemented with a dense vector.

I also added:

  1. Gadfly-based path plots.
  2. AICc-based segment selection in coef().
  3. GammaLassoPath, a generalization of LassoPath that implements Taddy (2016).

All these changes are unit tested and pass on Julia v0.3 and v0.4. Unfortunately, Gadfly currently breaks on nightly, so the tests do not pass there.
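
On item 2: AICc selects a path segment by trading off fit against a small-sample-corrected complexity penalty. A hedged Python sketch, using the number of nonzero coefficients as the degrees of freedom (the usual choice for the lasso); the function names are illustrative, not the package's API:

```python
import numpy as np

def aicc(loglik, k, n):
    """Corrected AIC: AIC = -2*loglik + 2k, plus the small-sample
    correction term 2k(k+1)/(n-k-1)."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def select_segment(logliks, dofs, n):
    """Pick the path segment minimizing AICc, given each segment's
    log-likelihood and degrees of freedom (nonzero coefficient count)."""
    scores = [aicc(ll, k, n) for ll, k in zip(logliks, dofs)]
    return int(np.argmin(scores))
```

Sparse early segments fit poorly and dense late segments pay a large complexity penalty, so the minimizer typically sits somewhere in the middle of the path.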

simonster commented 7 years ago

Hey, sorry I haven't gotten back to you in so long. I'm busy trying to finish my PhD :smile:. This looks like great work, and I'm going to try to review it in detail over the next week.

Re: the regularization path plots, I agree that they're useful, but I'd rather not make this package depend on Gadfly. I think it's probably best to split that functionality into a separate package, and then link to it from the README.

simonster commented 7 years ago

Sorry, that was a longer "week" than I imagined. While I've made a bunch of comments, this is great work. See my comment above re: plots, and also, no need to worry about Julia 0.3 support. (Julia 0.5 is on the horizon, and it's no longer possible to register packages that support Julia 0.3 anyway.) Let me know if you have time to work on this further, or if you'd rather that I make the tweaks necessary to satisfy my pedantry.

AsafManela commented 7 years ago

Thanks for reviewing my code and giving detailed comments; I learned a lot. I did most of what you suggested, including extracting the plot functionality into a new LassoPlot.jl package. I won't be able to get to the cross-validation refactoring you suggested, or to removing the code duplication between LassoPath and GammaLassoPath (both of which make a lot of sense), until early November at the earliest. So if you can take those on, that would be great.