JuliaStats / Lasso.jl

Lasso/Elastic Net linear and generalized linear models

Better handling of constant variables #28

Closed cnliao closed 5 years ago

cnliao commented 5 years ago

When fitting a LassoPath with α = 1, if a column of X is all zeros, or if it is constant and centering is requested, Lasso.fit fails with a "coordinate descent failed to converge in $maxiter iterations at λ = $λ" error.

The culprit is a 0/0 case, which this PR addresses.
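For context, the 0/0 shows up because a constant predictor has no variation once centered, so the coordinate-descent update divides zero by zero, the coefficient becomes NaN, and the path never converges. A minimal sketch of the kind of guard involved (the names below are illustrative, not Lasso.jl's actual internals):

# Illustrative coordinate-descent coefficient update, not Lasso.jl's actual code.
# numer: soft-thresholded gradient term for predictor j
# denom: sum of squared (weighted, centered) values of column j; zero for a constant column
function update_coef(numer, denom)
    denom == 0 && return 0.0  # guard the 0/0 case: a constant column gets a zero coefficient
    return numer / denom
end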

However I am unsure of the following:

  1. When α < 1, is there any practical use case in which the coefficient of a constant (but non-intercept) variable should be non-zero? If so, the fix should be reconsidered.
  2. Should we just fail gracefully instead of trying to proceed for this pathological input?

Any suggestions much appreciated.

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 158

Totals coverage status:
  Change from base Build 156: 0.01%
  Covered Lines: 820
  Relevant Lines: 944
AsafManela commented 5 years ago

Thanks! Would you mind providing a minimum working example? Also, is this handled differently in GLM.jl or in glmnet in R?

cnliao commented 5 years ago

a minimum working example?

using Lasso
x = randn(20,2)
x[:, 2] .= 0; # x[:,2] .= 1 works
y = x * [1,1]
fit(LassoPath, x, y; intercept=false, standardize=false) # errors
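
For the constant-but-nonzero case described above, presumably a variant like this hits the same failure once standardization (and hence centering) is requested:

using Lasso
x = randn(20, 2)
x[:, 2] .= 1 # constant, non-zero column
y = x * [1, 1]
fit(LassoPath, x, y; standardize=true) # presumably errors once the column is centered and scaled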

Also, is this handled differently in GLM.jl or in glmnet in R?

I am not familiar enough with either of these packages to have a say.

AsafManela commented 5 years ago

I think your solution is the way to go.

using GLM
fit(LinearModel, x, y)

throws

ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.

which is a bit more informative than Lasso.jl's current cryptic message. I can't imagine a scenario where a regression model would assign a nonzero coefficient to a column of zeros. There is no variation and any coefficient would give the same objective.
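
If the graceful-failure route from the PR description were preferred instead, a pre-fit check along these lines could flag such columns up front; this is only a sketch, and constant_columns is a hypothetical helper, not part of Lasso.jl or GLM.jl:

using Statistics

# Hypothetical pre-check: indices of zero-variance (constant) predictor columns.
constant_columns(X) = [j for j in 1:size(X, 2) if std(view(X, :, j)) == 0]

cols = constant_columns(x)
isempty(cols) || @warn "Constant predictor column(s) $cols have no variation; their coefficients can only be zero, or the columns should be dropped before fitting."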