If we ignore acceleration for now, the core of the proximal gradient step for the Lasso is as follows. The gradient is the usual `X'(Xb - y)`, which in your case, according to what you said, should be computed as `Cb - z`, where `C = X'X` and `z = X'y` have been pre-computed. The prox step is just a shrinkage of the current candidate of size `p`, the dimensionality of your problem, which should be manageable given what you describe.
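A minimal sketch of that step, assuming `C = X'X` and `z = X'y` are already available (the names and the step-size choice here are illustrative, not the package's implementation):

```julia
using LinearAlgebra

# Soft-thresholding: the prox operator of t*‖·‖₁, applied elementwise.
soft_threshold(x, t) = sign(x) * max(abs(x) - t, 0)

# Unaccelerated proximal gradient for ½‖Xb - y‖² + λ‖b‖₁, written purely
# in terms of C = X'X and z = X'y.
function lasso_proxgrad(C::AbstractMatrix, z::AbstractVector, λ::Real;
                        η::Real = 1 / opnorm(C), niter::Int = 1_000)
    b = zeros(length(z))
    for _ in 1:niter
        g = C * b - z                            # gradient X'(Xb - y) = Cb - z
        b = soft_threshold.(b .- η .* g, η * λ)  # shrinkage step over p coords
    end
    return b
end
```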
The steps to get this done in this package, if that's something you'd like to try, would be:

1. a `fit` method here which takes whatever object you use to represent the map `Cx` (e.g. https://github.com/JuliaLinearAlgebra/LinearMaps.jl); so instead of passing `X`, `y`, you'd pass `C`, `z`;
2. use `proxgrad.jl` as a template; you can leave everything the same except the computation of `_f` and `_fg!` (https://github.com/JuliaAI/MLJLinearModels.jl/blob/355edad72978b01586567c78131ce276b970632e/src/fit/proxgrad.jl#L21-L23), which would need to make use of `C` and `z`. To help you with that, look at `smooth_fg!` for the `L2Loss`: you'd have to add one specialising over the type you use for `C` (see the sketch below).
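A hypothetical sketch of what that specialisation could look like; the `Gramian` name and the exact signature are assumptions for illustration, not the package's actual API:

```julia
using LinearAlgebra

# Hypothetical wrapper for the precomputed quantities. `C` can be a Matrix
# or anything supporting `mul!` (e.g. a LinearMap); `z = X'y`.
struct Gramian{TC,Tz}
    C::TC
    z::Tz
end

# Smooth part of the objective and its gradient, via C and z:
#   f(b)  = ½ b'Cb - z'b   (equals ½‖Xb - y‖² up to a constant in b)
#   ∇f(b) = Cb - z
function smooth_fg!(g::AbstractVector, b::AbstractVector, G::Gramian)
    mul!(g, G.C, b)                  # g = Cb
    f = dot(b, g) / 2 - dot(G.z, b)
    g .-= G.z                        # g = Cb - z
    return f
end
```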
Thank you for the outline; this would be a pretty great feature for me, so I will give it a shot.

One thing I don't fully understand: why make a new type for `X'X`? Shouldn't it just be a `Matrix`, or whatever the same type that `X` is? I guess this is just to control dispatch, but then doesn't that mean I'd have to manually forward all methods of `Gramian` (or whatever the type would be) to `Matrix`? It seems a little simpler to have some kind of flag in the function like `gram=true` or `data=:gramian`.
I didn't make the assumption that you could get the actual `Z = X'X` matrix, rather that you could compute the application of it, i.e. `Zx` for a given `x`. This is all you need. Anyway, you could very well just do all I've said with the assumption that you have `Z` explicitly, of course.
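For instance, a minimal sketch with LinearMaps.jl where applying `Z` computes `X'(Xv)` without ever forming `X'X` (the data `X` here is just a stand-in):

```julia
using LinearMaps, LinearAlgebra

X = randn(10_000, 50)   # stand-in for data that may not fit in memory
y = randn(10_000)

# Matrix-free operator for Z = X'X: applying it computes X'(X*v),
# so X'X itself is never materialised.
Z = LinearMap(v -> X' * (X * v), 50; issymmetric=true, isposdef=true)
z = X' * y              # X'y

Z * ones(50)            # one application of Z
```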
When `X` cannot fit in memory but `X'X` and `X'y` can be constructed in a distributed fashion, it would be incredibly useful to have solvers that can train solely on the covariances. Ridge can be done by solving a quadratic program over `X'X + c*I`, but as far as I know there are no capabilities in Julia to do this with Lasso or ElasticNet besides passing the problem to a generic optimizer.

As a bonus, I believe this is also pretty hard to do in Python. The best I could find was `sklearn.linear_model.lars_path_gram`, which frequently gives me convergence issues, so there are bragging rights up for grabs :)

This is something I am willing to help work on if it is feasible given the current framework.
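For reference, the ridge case from the covariances alone reduces to a linear solve; a minimal sketch, assuming `C = X'X` and `z = X'y` are given:

```julia
using LinearAlgebra

# Ridge from the covariances alone: b solves (C + c*I) b = z,
# the normal equations of ½‖Xb - y‖² + ½c‖b‖².
ridge_from_gram(C::AbstractMatrix, z::AbstractVector, c::Real) = (C + c * I) \ z
```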