JuliaStats / GLMNet.jl

Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet

Update glmnet source #46

Open jolars opened 4 years ago

jolars commented 4 years ago

The glmnet source in this repository is outdated: it dates back to 2015, and the glmnet Fortran core has been updated several times since. Please consider updating to the latest version.

The source files can be found at https://github.com/cran/glmnet/tree/master/src

JackDunnNZ commented 3 years ago

I updated the binary builder repo to the latest source: https://github.com/JuliaPackaging/Yggdrasil/pull/2028

However, the new JLL version doesn't seem to work when I try to use it, so I may need help debugging that.

I looked at the diff between the source we are using and the latest copy from the glmnet repo. The good news is that the changes seem to be largely cosmetic, the biggest being the introduction of a progress meter integrated with R. From a quick look through, I couldn't find any significant changes to the actual algorithm: https://gist.github.com/JackDunnNZ/b04d15fc48fb33db9cff248582c6bc46

devmotion commented 3 years ago

It seems a major difference is that glmnet 4.0 can fit any GLM family; see, e.g., https://statisticaloddsandends.wordpress.com/2020/05/14/glmnet-v4-0-generalizing-the-family-parameter/ and https://cran.r-project.org/web/packages/glmnet/vignettes/glmnetFamily.pdf.

JackDunnNZ commented 3 years ago

Sorry, my comment referred to changes in the underlying glmnet Fortran code, which, based on the diff above, seems to be largely unchanged. It seems that all of the changes in the recent releases are in the R code instead, and could be ported into Julia without having to update the underlying libglmnet.

devmotion commented 3 years ago

From the blog post I got the impression that this generalization was only possible by generalizing the Fortran code as well:

"Before v4.0, glmnet() could only optimize the penalized likelihood for special GLM families (e.g. ordinary least squares, logistic regression, Poisson regression). For each family, which we specified via a character string for the family parameter, we had custom FORTRAN code that ran the modified IRLS algorithm above. While this was computationally efficient, it did not allow us to fit any penalized GLM of our choosing.

"From v4.0 onwards, we can do the above for any GLM family in practice. [...] Underneath the hood, instead of having custom FORTRAN code for each family, we have a FORTRAN subroutine that solves (2) efficiently."
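To make the idea in the quoted passage concrete, here is a minimal sketch of family-agnostic IRLS in plain Python. It is not glmnet's Fortran routine and not GLMNet.jl's API: the names `Family`, `weighted_lasso` and `fit_penalized_glm` are hypothetical, only the lasso penalty is handled (no elastic net mixing), and I am assuming that "(2)" in the blog post is the penalized weighted least squares subproblem. The point is just that the outer loop only needs three functions from the family, while one generic inner solver does the rest.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class Family:
    """Hypothetical stand-in for a GLM family: only the pieces IRLS needs."""
    linkinv: Callable[[float], float]   # mean from linear predictor: mu = g^-1(eta)
    mu_eta: Callable[[float], float]    # derivative d mu / d eta
    variance: Callable[[float], float]  # variance function V(mu)

# Two illustrative families; any other triple of functions plugs in the same way.
gaussian = Family(lambda eta: eta, lambda eta: 1.0, lambda mu: 1.0)
poisson = Family(math.exp, math.exp, lambda mu: mu)

def soft_threshold(a, t):
    """Shrink a toward zero by t, returning zero if |a| <= t."""
    return math.copysign(max(abs(a) - t, 0.0), a)

def weighted_lasso(X, z, w, lam, b0, beta, sweeps=100):
    """Coordinate descent on (1/2n) sum_i w_i (z_i - b0 - x_i.beta)^2 + lam * |beta|_1."""
    n, p = len(X), len(beta)
    beta = list(beta)
    # Residuals are computed once and then updated incrementally.
    r = [z[i] - b0 - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(sweeps):
        d = sum(w[i] * r[i] for i in range(n)) / sum(w)  # unpenalized intercept step
        b0 += d
        r = [ri - d for ri in r]
        for j in range(p):
            den = sum(w[i] * X[i][j] ** 2 for i in range(n)) / n
            if den == 0.0:
                continue  # all-zero column: nothing to update
            num = sum(w[i] * X[i][j] * (r[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(num, lam) / den
            delta = new - beta[j]
            r = [r[i] - X[i][j] * delta for i in range(n)]
            beta[j] = new
    return b0, beta

def fit_penalized_glm(X, y, family, lam, outer=25):
    """Outer IRLS loop: family-specific working response and weights, generic inner solve."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(outer):
        eta = [b0 + sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        mu = [family.linkinv(e) for e in eta]
        g = [family.mu_eta(e) for e in eta]
        z = [eta[i] + (y[i] - mu[i]) / g[i] for i in range(n)]       # working response
        w = [g[i] ** 2 / family.variance(mu[i]) for i in range(n)]   # working weights
        b0, beta = weighted_lasso(X, z, w, lam, b0, beta)            # warm-started
    return b0, beta

# Toy data (made up for illustration): the same solver, two different families.
X = [[0.0], [1.0], [2.0], [3.0]]
ols = fit_penalized_glm(X, [1.0, 3.0, 5.0, 7.0], gaussian, lam=0.0)
pois = fit_penalized_glm(X, [math.exp(0.5 + 0.3 * x[0]) for x in X], poisson, lam=0.0)
```

With `lam=0.0` the Gaussian call should reduce to ordinary least squares, and a large `lam` should shrink the slope to exactly zero. Fixed iteration counts stand in for proper convergence checks here, which a real implementation would need.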

JackDunnNZ commented 3 years ago

Well they probably know better than I do 😅