JuliaStats / Lasso.jl

Lasso/Elastic Net linear and generalized linear models

Lasso.jl for big-ish data #9

Open CorySimon opened 7 years ago

CorySimon commented 7 years ago

My entire design matrix cannot fit in memory. Much like SGDRegressor.partial_fit() in scikit-learn, can I use Lasso.jl to fit in epochs, feeding it one batch of data at a time? I realize that this will likely not converge to the same parameters as a fit with all of the data in memory would.

Maybe one way to train in batches would be to modify criterion in fit() to stop after a certain number of iterations?

rakeshvar commented 6 years ago

Did you solve your problem? I do not know if this helps, as it has been quite some time, but you can split the problem up using ADMM: you refit the Lasso on parts of the data iteratively.
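To make the ADMM suggestion concrete, here is a minimal sketch of consensus ADMM for the lasso over row chunks of the data. It is not part of Lasso.jl; `consensus_lasso`, `load_chunk`, `soft_threshold`, and the parameters `λ`, `ρ`, and `iters` are hypothetical names chosen for illustration. It assumes the objective 0.5*||A*x - b||^2 + λ*||x||_1, which may be scaled differently from the one Lasso.jl uses, and that each chunk can be reloaded from disk on every pass.

```julia
using LinearAlgebra, Random

# Elementwise soft-thresholding, the proximal operator of the L1 norm.
soft_threshold(v, κ) = sign.(v) .* max.(abs.(v) .- κ, 0)

# Consensus ADMM for 0.5*||A*x - b||^2 + λ*||x||_1 with the rows of A split into chunks.
# `load_chunk(i)` is a hypothetical user-supplied function returning (A_i, b_i).
function consensus_lasso(load_chunk, nchunks, p; λ=1.0, ρ=1.0, iters=200)
    z  = zeros(p)                          # shared (consensus) coefficients
    xs = [zeros(p) for _ in 1:nchunks]     # local coefficients, one per chunk
    us = [zeros(p) for _ in 1:nchunks]     # scaled dual variables, one per chunk
    for _ in 1:iters
        for i in 1:nchunks
            A, b = load_chunk(i)           # only one chunk is in memory at a time
            # Local ridge-like refit pulled toward the current consensus
            xs[i] = (A'A + ρ*I) \ (A'b + ρ .* (z .- us[i]))
        end
        # Consensus update: average, then soft-threshold
        z = soft_threshold(sum(xs .+ us) ./ nchunks, λ / (ρ * nchunks))
        for i in 1:nchunks
            us[i] .+= xs[i] .- z           # dual update
        end
    end
    return z
end

# Small synthetic check with four in-memory "chunks" standing in for files on disk
Random.seed!(1)
A = randn(200, 20)
xtrue = zeros(20); xtrue[[3, 7]] .= [2.0, -1.5]
b = A * xtrue
chunks = [(A[r, :], b[r]) for r in (1:50, 51:100, 101:150, 151:200)]
zh = consensus_lasso(i -> chunks[i], 4, 20; λ=0.1)
[xtrue zh]
```

Each pass over the chunks plays the role of one epoch. The sketch refactors `A'A + ρ*I` on every pass for simplicity; caching a factorization per chunk would trade memory for speed.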

baggepinnen commented 4 years ago

Here's an implementation of ADMM to get you started, in case you still need it. https://github.com/baggepinnen/LPVSpectral.jl/blob/724561469a483aa1ffae6fa76b73c67ed2becce7/src/lasso.jl#L118

The functions above the linked line specify the prox operators that are passed to ADMM to solve the LASSO problem.

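It is used like this:

using LPVSpectral, ProximalOperators
A = randn(70,100);                     # design matrix: 70 observations, 100 features
x = randn(100) .* (rand(100) .< 0.05); # sparse true coefficients, roughly 5% nonzero
y = A*x;                               # noise-free observations
proxF = LeastSquares(A,y)              # least-squares data-fit term
xh,zh = LPVSpectral.ADMM(randn(100), proxF, NormL1(3)) # random initial guess, L1 penalty with weight 3
[x zh]                                 # true coefficients next to the ADMM estimate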