JuliaStats / Lasso.jl

Lasso/Elastic Net linear and generalized linear models

Better handling of constant variables #28

Closed cnliao closed 5 years ago

cnliao commented 5 years ago

When fitting a LassoPath with α = 1, if a column of X is all zeros, or if it is constant and centering is requested, Lasso.fit fails with a "coordinate descent failed to converge in $maxiter iterations at λ = $λ" error.

The culprit is a 0/0 case, which this PR addresses.
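For context, the 0/0 shows up because a constant predictor has no variation once centered, so the coordinate-descent update divides zero by zero, the coefficient becomes NaN, and the path never converges. A minimal sketch of the kind of guard involved (the names below are illustrative, not Lasso.jl's actual internals):

# Illustrative coordinate-descent coefficient update, not Lasso.jl's actual code.
# numer: soft-thresholded gradient term for predictor j
# denom: sum of squared (weighted, centered) values of column j; zero for a constant column
function update_coef(numer, denom)
    denom == 0 && return 0.0  # guard the 0/0 case: a constant column gets a zero coefficient
    return numer / denom
end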

However I am unsure of the following:

  1. When α < 1, is there any practical use case in which the coefficient of a constant (but non-intercept) variable should be non-zero? If so, the fix should be reconsidered.
  2. Should we just fail gracefully instead of trying to proceed for this pathological input?

Any suggestions much appreciated.

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 158

Totals coverage status:
  Change from base Build 156: 0.01%
  Covered Lines: 820
  Relevant Lines: 944
AsafManela commented 5 years ago

Thanks! Would you mind providing a minimum working example? Also, is this handled differently in GLM.jl or in glmnet in R?

cnliao commented 5 years ago

a minimum working example?

using Lasso
x = randn(20,2)
x[:, 2] .= 0; # x[:,2] .= 1 works
y = x * [1,1]
fit(LassoPath, x, y; intercept=false, standardize=false) # errors
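
For the constant-but-nonzero case described above, presumably a variant like this hits the same failure once standardization (and hence centering) is requested:

using Lasso
x = randn(20, 2)
x[:, 2] .= 1 # constant, non-zero column
y = x * [1, 1]
fit(LassoPath, x, y; standardize=true) # presumably errors once the column is centered and scaled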

Also, is this handled differently in GLM.jl or in glmnet in R?

I am not familiar enough with either of these packages to have a say.

AsafManela commented 5 years ago

I think your solution is the way to go.

using GLM
fit(LinearModel, x, y)

throws

ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.

which is a bit more informative than Lasso.jl's current cryptic message. I can't imagine a scenario where a regression model would assign a nonzero coefficient to a column of zeros. There is no variation and any coefficient would give the same objective.
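
If the graceful-failure route from the PR description were preferred instead, a pre-fit check along these lines could flag such columns up front; this is only a sketch, and constant_columns is a hypothetical helper, not part of Lasso.jl or GLM.jl:

using Statistics

# Hypothetical pre-check: indices of zero-variance (constant) predictor columns.
constant_columns(X) = [j for j in 1:size(X, 2) if std(view(X, :, j)) == 0]

cols = constant_columns(x)
isempty(cols) || @warn "Constant predictor column(s) $cols have no variation; their coefficients can only be zero, or the columns should be dropped before fitting."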