Open CorySimon opened 7 years ago
Did you solve your problem? I do not know if this helps, as it has been quite some time, but you can parallelize using ADMM. You refit Lasso on parts of the data iteratively.
Here's an implementation of ADMM to get you started, in case you still need it: https://github.com/baggepinnen/LPVSpectral.jl/blob/724561469a483aa1ffae6fa76b73c67ed2becce7/src/lasso.jl#L118
The functions above specify the prox operators that are inputs to ADMM to solve the LASSO problem.
It is used like this:
using LPVSpectral, ProximalOperators
A = randn(70,100);
x = randn(100) .* (rand(100) .< 0.05);
y = A*x;
proxF = LeastSquares(A,y)
xh,zh = LPVSpectral.ADMM(randn(100), proxF, NormL1(3))
[x zh]
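To make the "refit on parts of the data" idea concrete, here is a minimal sketch of consensus ADMM for the lasso with the rows of the design matrix split into blocks, so only one block needs to be in memory at a time. It is plain Julia using only LinearAlgebra, not the LPVSpectral implementation linked above. The names `consensus_lasso` and `load_block` and the defaults for `λ`, `ρ`, and `iters` are assumptions for illustration. Note that in this particular splitting the per-block step is a ridge-like solve, and the L1 penalty is applied in the averaging step.

```julia
using LinearAlgebra, Random

# Soft-thresholding: the proximal operator of t * ||.||_1
soft_threshold(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

# Consensus ADMM sketch for: minimize 0.5*||A*x - y||^2 + λ*||x||_1,
# with the rows of (A, y) split into `nblocks` blocks.
# `load_block(i)` is a hypothetical callback returning the i-th block (A_i, y_i),
# e.g. by reading it from disk.
function consensus_lasso(load_block, nblocks, p; λ=1.0, ρ=1.0, iters=200)
    z  = zeros(p)                             # shared (consensus) coefficients
    xs = [zeros(p) for _ in 1:nblocks]        # per-block coefficients
    us = [zeros(p) for _ in 1:nblocks]        # per-block scaled dual variables
    for _ in 1:iters
        for i in 1:nblocks                    # blocks are independent, so this loop can run in parallel
            Ai, yi = load_block(i)
            xs[i] = (Ai' * Ai + ρ * I) \ (Ai' * yi + ρ * (z - us[i]))
        end
        # Average the block estimates, then apply the L1 prox
        z = soft_threshold(sum(xs[i] + us[i] for i in 1:nblocks) / nblocks, λ / (ρ * nblocks))
        for i in 1:nblocks
            us[i] += xs[i] - z                # dual update
        end
    end
    return z
end

# Small synthetic example; the blocks are held in memory here only for brevity
Random.seed!(0)
A = randn(200, 20)
x_true = zeros(20); x_true[[3, 11]] .= [2.0, -1.5]
y = A * x_true
blocks = [(A[r, :], y[r]) for r in (1:50, 51:100, 101:150, 151:200)]
zh = consensus_lasso(i -> blocks[i], 4, 20; λ=0.1)
[x_true zh]
```

Each block is reloaded on every iteration in this sketch. If the p-by-p matrices `Ai' * Ai` and the vectors `Ai' * yi` fit in memory, they could be computed once and cached instead.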
My entire design matrix cannot fit in memory. Much like SGDRegressor.partial_fit() in scikit-learn (see here), can I use Lasso.jl to fit in epochs, feeding batches of data at a time? I realize that this will likely not converge to the same parameters as if the data could all fit in memory. Maybe one way to train in batches would be to modify criterion in fit() to stop after a certain number of iterations?
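For reference, batch-at-a-time fitting in the spirit of partial_fit() could look roughly like the sketch below. This is not part of Lasso.jl and does not touch its fit() or criterion. It swaps in a different technique, mini-batch proximal gradient descent, and the function name `partial_fit!`, the step size, and the penalty `λ` are all assumptions for illustration. As noted above, it will generally not reach exactly the same coefficients as a full in-memory fit.

```julia
using LinearAlgebra, Random

# Soft-thresholding: the proximal operator of t * ||.||_1
soft_threshold(v, t) = sign.(v) .* max.(abs.(v) .- t, 0)

# One proximal-gradient step on a single batch (Ab, yb), updating β in place.
# Per-batch objective: (1/(2n)) * ||Ab*β - yb||^2 + λ * ||β||_1, with n = number of rows in the batch.
# `partial_fit!` is a hypothetical helper, not a Lasso.jl function.
function partial_fit!(β, Ab, yb; λ=0.01, step=0.1)
    grad = Ab' * (Ab * β - yb) / size(Ab, 1)        # gradient of the least-squares part
    β .= soft_threshold(β - step * grad, step * λ)  # gradient step followed by the L1 prox
    return β
end

# Synthetic example: batches are generated on the fly as a stand-in for reading from disk
Random.seed!(1)
p = 20
β_true = zeros(p); β_true[[2, 7]] .= [1.5, -2.0]
β = zeros(p)
for epoch in 1:50, _ in 1:20                        # 50 epochs of 20 batches each
    Ab = randn(64, p)
    partial_fit!(β, Ab, Ab * β_true)
end
[β_true β]
```

The step size has to be small relative to the largest eigenvalue of `Ab' * Ab / n`. A decaying step size is a common choice when the data are noisy.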