JuliaAI / MLJLinearModels.jl

Generalized Linear Regression Models (penalized regressions, robust regressions, ...)
MIT License

cov-only training for L2Loss #145

Closed: adienes closed this issue 9 months ago

adienes commented 1 year ago

When X cannot fit in memory but X'X and X'y can be constructed in a distributed fashion, it would be incredibly useful to have solvers that can train solely on the covariances. Ridge can be done by solving a quadratic problem involving X'X + λI, i.e. the linear system (X'X + λI) b = X'y, but as far as I know there are no capabilities in Julia to do this with Lasso or ElasticNet besides passing the problem to a generic optimizer.
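As a sketch of what I mean for the ridge case (`ridge_from_gram` is just an illustrative name, not an existing function in this package):

```julia
using LinearAlgebra

# Hypothetical helper: ridge coefficients from precomputed C = X'X and
# z = X'y with penalty λ; solves (X'X + λI) b = X'y directly.
ridge_from_gram(C::AbstractMatrix, z::AbstractVector, λ::Real) = (C + λ * I) \ z
```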

As a bonus, I believe this is also pretty hard to do in Python. The best I could find was sklearn.linear_model.lars_path_gram, which frequently gives me convergence issues, so there are bragging rights up for grabs :)

This is something I am willing to help work on if it is feasible given the current framework.

tlienart commented 1 year ago

If we ignore acceleration for now, the proximal gradient step for Lasso, which is the core of the solver, works like this:

The gradient is the usual X'(Xb - y), which in your case, given what you said, should be computed as Cb - z, where C = X'X and z = X'y have been precomputed.

The prox step is just a soft-thresholding (shrinkage) of the current candidate, a vector of size p, the dimensionality of your problem, which should be manageable given what you describe.
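Putting those two pieces together, an unaccelerated loop could look roughly like this; names, defaults, and the stopping rule are illustrative, not what the package does internally:

```julia
using LinearAlgebra

# Illustrative ISTA loop for the Lasso using only C = X'X and z = X'y.
# η is the step size (1/L with L = opnorm(C) guarantees convergence);
# λ is the l1 penalty.
soft_threshold(x, t) = sign(x) * max(abs(x) - t, 0)

function lasso_ista_gram(C::AbstractMatrix, z::AbstractVector, λ::Real;
                         η::Real = 1 / opnorm(C), maxiter::Int = 1_000)
    b = zeros(length(z))
    for _ in 1:maxiter
        g = C * b - z                            # gradient of ½‖Xb - y‖²
        b = soft_threshold.(b .- η .* g, η * λ)  # prox of λ‖·‖₁
    end
    return b
end
```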

The steps to get this done in this package, if that's something you'd like to try, would be:

adienes commented 1 year ago

Thank you for the outline; this would be a pretty great feature for me, so I will give it a shot.

One thing I don't fully understand: why make a new type for X'X? Shouldn't it just be a Matrix, or whatever type X is? I guess this is just to control dispatch, but doesn't that mean I'd have to manually forward all methods of Gramian (or whatever the type would be) to Matrix? It seems a little simpler to have some kind of flag in the function like gram=true or data=:gramian.
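For reference, I imagine the type-based version would be something like the following (`Gramian` is hypothetical); subtyping AbstractMatrix means generic matrix code keeps working through the size/getindex fallbacks, so not every method needs forwarding by hand:

```julia
# Hypothetical wrapper so solvers can dispatch on "Gram-only" inputs.
struct Gramian{T, M <: AbstractMatrix{T}} <: AbstractMatrix{T}
    C::M   # C = X'X
end

Base.size(G::Gramian) = size(G.C)
Base.getindex(G::Gramian, i::Int, j::Int) = G.C[i, j]

# an efficient multiply, bypassing the generic getindex-based fallback
Base.:*(G::Gramian, x::AbstractVector) = G.C * x
```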

tlienart commented 1 year ago

I didn't make the assumption that you could get the actual Z = X'X matrix, rather that you could compute the application of it, i.e. Zx for a vector x. That's all you need.

Anyway, you could very well do everything I've said under the assumption that you have Z explicitly, of course.
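A matrix-free sketch of the application-only idea, assuming X is available in row blocks (the `blocks` iterable is an illustrative stand-in for however the data is distributed):

```julia
# Apply Z = X'X to a vector without ever forming Z, streaming X block by block.
function apply_Z(blocks, x::AbstractVector)
    acc = zero(x)
    for Xb in blocks               # each Xb holds a chunk of the rows of X
        acc .+= Xb' * (Xb * x)     # accumulate X'(X x) block by block
    end
    return acc
end
```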