JuliaStats / Lasso.jl

Lasso/Elastic Net linear and generalized linear models

Divergence from GLMnet when using a matrix with many variables #78

Open tiemvanderdeure opened 9 months ago

tiemvanderdeure commented 9 months ago

When fitting models with a large number of variables, Lasso.jl and GLMNet return different paths, and the difference grows as the number of variables increases.

An example to illustrate this:

using Lasso, GLMNet, Statistics

# fits identical models in Lasso and GLMNet from mock data
# and returns the mean absolute difference of the betas of both models
function lasso_glmnet_dif(nrow, ncol, n_col_contributing)
    data = rand(nrow, ncol)
    outcome = mean(data[:, 1:n_col_contributing], dims = 2)[:, 1] .> rand(nrow)
    presence_matrix = [1 .- outcome outcome]

    l = Lasso.fit(LassoPath, data, outcome, Binomial())
    g = GLMNet.glmnet(data, presence_matrix, Binomial())

    lcoefs = Vector(l.coefs[:,end])
    gcoefs = g.betas[:, end]

    mean(abs, lcoefs .- gcoefs)
end

# 1000 records, 5 variables that all contribute to outcome
lasso_glmnet_dif(1000, 5, 5) # order of magnitude 1e-9
# 1000 records, 1000 variables of which 5 contribute to the outcome
lasso_glmnet_dif(1000, 1000, 5) # around 0.05

The context for this problem is that I'm working on a Julia implementation of maxnet, where a fairly large model matrix (hundreds of columns) is generated and a lasso path is used to select the most important columns.
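For concreteness, the selection step amounts to roughly this (a sketch only; the choice of the last path segment, and the names l and data, follow the example above and are not part of maxnet itself):

# keep only the columns whose coefficient is nonzero in the chosen path segment
# (using the last segment here purely for illustration)
selected = findall(!iszero, Vector(l.coefs[:, end]))
reduced_data = data[:, selected]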

AsafManela commented 9 months ago

The packages may be generating a different path of regularization lambdas. You can get them from the lasso path as l.λ, and from GLMNet as g.lambda.
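A quick way to check this (a minimal sketch, assuming the l and g objects from the example above) is to compare the two sequences directly:

# compare the λ sequences generated by each package
n = min(length(l.λ), length(g.lambda))
maximum(abs, l.λ[1:n] .- g.lambda[1:n])   # should be ~0 if the paths agree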

Also, in your example it looks like you are picking the last coefs of the regularization path, but those are not necessarily the most interesting ones. Take a look at the docs here.
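For instance, instead of taking the last segment, one could inspect the coefficients at a particular regularization strength (a sketch; the target λ value here is an arbitrary assumption):

# pick the path segment whose λ is closest to a chosen target value
target_λ = 0.01                      # hypothetical value, for illustration only
i = argmin(abs.(l.λ .- target_λ))
betas_at_target = Vector(l.coefs[:, i])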

tiemvanderdeure commented 8 months ago

Hi, thanks for chipping in.

I don't think the regularization lambdas are where the differences are coming from. In the maxnet algorithm the lambdas are generated inside the algorithm rather than by the packages, and even if I force the lambdas to be identical, I see the same kind of behaviour.

The same goes for looking at other parts of the regularization path (maxnet always takes the last segment, but I see your point).

E.g. in the example below I take the coefficients halfway along the path and force the lambdas to be identical, and lasso_glmnet_dif(1000, 1000, 5) is still around 0.02.

That seems like too big a difference to come from floating point error, which makes me think the algorithms are somehow different?

function lasso_glmnet_dif(nrow, ncol, n_col_contributing)
    data = rand(nrow, ncol)
    outcome = mean(data[:, 1:n_col_contributing], dims = 2)[:, 1] .> rand(nrow)
    presence_matrix = [1 .- outcome outcome]

    l = Lasso.fit(LassoPath, data, outcome, Binomial())
    g = GLMNet.glmnet(data, presence_matrix, Binomial(); lambda = l.λ)

    lcoefs = Vector(l.coefs[:,floor(Int,end/2)])
    gcoefs = g.betas[:, floor(Int,end/2)]

    mean(abs, lcoefs .- gcoefs)
end