JuliaAI / MLJLinearModels.jl

Generalized Linear Regression Models (penalized regressions, robust regressions, ...)
MIT License

Bug: LassoRegressor from MLJLinearModels returns null vector but the one from ScikitLearn does not #81

Closed Sh4pe closed 2 years ago

Sh4pe commented 3 years ago

I stumbled upon a case where the LassoRegressor from MLJLinearModels apparently fails to solve a problem that the ScikitLearn version can solve.

In order to reproduce the bug, use the code from my repo MLJLinearModels_bug, especially the test DataFrame here.

In the following code, I solve the same problem twice, but the MLJLinearModels version returns the null vector.

using DataFrames
using MLJ
using CSV
using GZip

load_df() = CSV.read(GZip.open("data.csv.gz"), DataFrame)

function test_lasso(df, the_model)
    y, X = unpack(df, ==(:y), !=(:y))

    the_machine = machine(the_model, X, y)
    train, test = partition(eachindex(y), 0.7, shuffle=false)

    fit!(the_machine, rows=train)

    ŷ = MLJ.predict(the_machine, rows=test)

    unique(ŷ)
end

@load LassoRegressor pkg="MLJLinearModels" name=LassoRegressorMLJ
@load LassoRegressor pkg="ScikitLearn" name=LassoRegressorSKLearn

let
    df = load_df()

    if test_lasso(df, LassoRegressorMLJ()) == [0.0]
        println("LassoRegressor from MLJLinearModels returns the null vector...")
    end

    if test_lasso(df, LassoRegressorSKLearn()) != [0.0]
        println("whereas the LassoRegressor from ScikitLearn returns a vector != 0")
    end
end

Also note that the solver spits out this warning:

[ Info: Training Machine{LassoRegressor} @638.
┌ Warning: No appropriate stepsize found via backtracking; interrupting.
└ @ MLJLinearModels ~/.julia/packages/MLJLinearModels/EIj86/src/fit/proxgrad.jl:50
┌ Warning: Proximal GD did not converge in 1000 iterations.
└ @ MLJLinearModels ~/.julia/packages/MLJLinearModels/EIj86/src/fit/proxgrad.jl:64

I'd expect the MLJLinearModels version to return a vector != 0 as well. This appears to be a bug.

Versions (see Manifest.toml): MLJ 0.14.0, MLJLinearModels 0.5.0, ScikitLearn 0.6.2

ablaom commented 3 years ago

@Sh4pe Thanks for reporting.

@tlienart Are you able to look into this?

ablaom commented 3 years ago

@Sh4pe I know it's a lot of work, but if you can reproduce the issue with a simpler set of self-contained code, that will improve the chances this gets sorted quickly. 😄

Sh4pe commented 3 years ago

> @Sh4pe I know it's a lot of work, but if you can reproduce the issue with a simpler set of self-contained code, that will improve the chances this gets sorted quickly. 😄

Sure, I can try to boil down the DataFrame so that it does not contain 5k lines. Would that be self-contained enough?

I don't know precisely when I'll find the time, but I'll post here once I've managed to shrink the test example.

tlienart commented 3 years ago

Thanks for this; you provided everything I need to look into it, so no need to slim it down further.

Almost certainly what's happening here is that the regularisation strength (which you didn't set explicitly) is very different between the two models. In sklearn the regularisation strength is some number (I think the default is 1) divided by n, whereas in MLJLinearModels it's just 1 and you have to scale it yourself. So if you have (say) 50_000 rows there's a huge difference in effective regularisation strength, meaning that "our" lasso will just flatten everything to zero.

If that's correct, it's not a bug, just a difference in definition.
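If this diagnosis holds, the workaround on the MLJLinearModels side would be to shrink `lambda` by the sample count before fitting. A minimal sketch reusing the names from the reproduction script above; the 1/n factor is exactly the assumption under discussion here, not a confirmed fix:

```julia
# Sketch: rescale the penalty to mimic scikit-learn's per-sample scaling,
# assuming the diagnosis above. `LassoRegressorMLJ`, `df` and `test_lasso`
# come from the reproduction script earlier in this issue.
n = 5_000                        # roughly the number of rows in data.csv.gz
sk_alpha = 1.0                   # scikit-learn's default regularisation strength
lambda_rescaled = sk_alpha / n   # the hypothesised MLJLinearModels equivalent

# test_lasso(df, LassoRegressorMLJ(lambda = lambda_rescaled))
```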

Sh4pe commented 3 years ago

> Thanks for this; you provided everything I need to look into it, so no need to slim it down further.
>
> Almost certainly what's happening here is that the regularisation strength (which you didn't set explicitly) is very different between the two models. In sklearn the regularisation strength is some number (I think the default is 1) divided by n, whereas in MLJLinearModels it's just 1 and you have to scale it yourself. So if you have (say) 50_000 rows there's a huge difference in effective regularisation strength, meaning that "our" lasso will just flatten everything to zero.
>
> If that's correct, it's not a bug, just a difference in definition.

That sounds very plausible, and I'm a bit ashamed that I didn't think of this possibility myself... :/ I'll check whether this is indeed the fix as soon as I can.

tlienart commented 3 years ago

No no, I think this is very likely to happen to lots of people. I'm toying with the idea of adopting exactly the same defaults as scikit-learn, because a few people are also struggling with the no-convergence warning messages (due to a similar issue, even though in practice they don't matter); I just think it's less elegant.

In any case, the docs for that hyperparameter should be improved... I'm glad you brought this up! Please don't hesitate to open further comments or issues on MLJLinearModels; I really need to get back to it 🙈

Sh4pe commented 3 years ago

So if I understood you correctly, @tlienart, I could divide the lambda hyperparameter of the MLJLinearModels LassoRegressor by the length of the target vector, and it should then work.

I've just tested this and I get the same error, so it appears that this was not the cause.

tlienart commented 3 years ago

Let's only call it a bug once we're convinced that it is one.

scikit-learn uses this formula: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html?highlight=lasso#sklearn.linear_model.Lasso

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

MLJLM uses this formula:

||y - Xw||^2_2 + lambda * ||w||_1

Since the optimisation problem is the same up to a multiplicative constant, scikit-learn effectively amounts to using 2n * alpha. So what would be relevant here is to compare a scikit-learn model with alpha = 1.0 against an MLJLM model with lambda = 1.0 / (2n).

Beyond that it would be useful to compare the objective function you get either way.

I'll try to do this myself over the weekend.
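Comparing the two objective functions doesn't require fitting anything; a quick self-contained Julia check of the scaling argument above, on synthetic data (a sketch, not the issue's dataset):

```julia
using LinearAlgebra, Random

Random.seed!(42)
n, p = 100, 5
X = randn(n, p)
w = randn(p)                        # an arbitrary candidate coefficient vector
y = X * w + 0.1 * randn(n)
alpha = 0.5

# scikit-learn objective: (1 / (2n)) * ||y - Xw||^2_2 + alpha * ||w||_1
sk_obj(wv) = norm(y - X * wv)^2 / (2n) + alpha * norm(wv, 1)

# MLJLM objective: ||y - Xw||^2_2 + lambda * ||w||_1
mlj_obj(wv, lambda) = norm(y - X * wv)^2 + lambda * norm(wv, 1)

# Multiplying the scikit-learn objective by 2n yields the MLJLM objective
# with lambda = 2n * alpha, so both problems share the same minimiser.
@assert mlj_obj(w, 2n * alpha) ≈ 2n * sk_obj(w)
```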

ablaom commented 2 years ago

@tlienart Reviewing old issues and wondered if this one could be closed?

tlienart commented 2 years ago

I think that with the great additions from @jbrea to the Lasso code around normalisation, this can be closed now, yes.
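For readers landing here later: a sketch of how this looks after the normalisation work, assuming a newer MLJLinearModels release. The `scale_penalty_with_samples` keyword name is my recollection of that addition, so check the docstring of your installed version before relying on it:

```julia
using MLJ  # assumes MLJ and MLJLinearModels are installed

# With per-sample penalty scaling enabled, lambda behaves much closer to
# scikit-learn's alpha convention (keyword name assumed, see lead-in).
LassoRegressorMLJ = @load LassoRegressor pkg = "MLJLinearModels"
model = LassoRegressorMLJ(lambda = 1.0, scale_penalty_with_samples = true)
```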