anujkhare / iregnet


problem with penalty.learning data set #46

Closed tdhock closed 7 years ago

tdhock commented 7 years ago

Hey @anujkhare I ran iregnet on the penalty.learning data set from https://github.com/tdhock/glm-optimality/blob/master/figure-FISTA-iregnet.R and I think I have discovered a bug. This PR adds that data set to the iregnet package, and adds a test case for this data set. I think there must be some bug in the lambda grid computation, since there are a lot of models returned with zero L1 arclength. Any ideas how to fix?
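For readers unfamiliar with the term, here is a minimal sketch of the check described above. It assumes that "L1 arclength" means the L1 norm of each model's coefficient vector along the regularization path. The `beta` matrix is made-up toy data, not output from iregnet.

```r
# Toy coefficient path: rows are features, columns are models (one per lambda).
# The numbers are made up for illustration; they are NOT iregnet output.
beta <- cbind(
  lambda1 = c(0, 0, 0),
  lambda2 = c(0, 0, 0),
  lambda3 = c(0.5, 0, -0.2),
  lambda4 = c(0.9, 0.1, -0.4)
)
rownames(beta) <- c("x1", "x2", "x3")

# L1 norm of each model's coefficient vector (assumed meaning of "L1 arclength").
l1_norm <- colSums(abs(beta))

# Number of models on the path whose coefficients are all exactly zero.
n_zero_models <- sum(l1_norm == 0)

print(l1_norm)
print(n_zero_models)
```

A well-spaced lambda grid would normally yield at most one all-zero model (the one at the largest lambda), so several zero entries in `l1_norm` suggest the grid starts far above the useful range.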

anujkhare commented 7 years ago

@tdhock Thanks for the tests! I am not sure why this is happening; I will look into it.

anujkhare commented 7 years ago

Columns 9 and 10 of the dataset's X.mat (bases and sum, respectively) are very large compared to the other columns.

Using iregnet with standardize=T solves the problem.

I have set standardize=T as the default option now.

I will check why lambdas are calculated incorrectly without standardization.
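A rough way to spot this kind of scale mismatch, and to see what standardization does, is sketched below using base R only. The matrix `X` is hypothetical random data with two inflated columns, standing in for the real X.mat. `scale()` is base R's centering and scaling, and iregnet's internal standardization may differ in detail.

```r
set.seed(1)

# Hypothetical design matrix: ten columns, the last two on a much larger scale.
X <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)
X[, 9:10] <- X[, 9:10] * 1e6

# Per-column standard deviations reveal the mismatch.
col_sd <- apply(X, 2, sd)
scale_ratio <- max(col_sd) / min(col_sd)
print(col_sd)
print(scale_ratio)

# Center each column and divide by its standard deviation.
X_std <- scale(X)
std_sd <- apply(X_std, 2, sd)
print(std_sd)
```

After scaling, every column has unit standard deviation, so a single penalty value affects all features comparably.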

anujkhare commented 7 years ago

I guess it makes sense that the large values in those two columns will suppress the other coefficients.

The reason we get zeros is that the default threshold is set to 1e-4. Setting threshold=1e-16 would give non-zero values for these two columns.

A similar thing happens in glmnet:

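library(glmnet)

set.seed(1)  # make the random example reproducible
x <- matrix(rnorm(100), 20, 5)
y <- rnorm(20)
x[, 1] <- x[, 1] * 10000  # put column 1 on a much larger scale

# Default: glmnet standardizes the columns before fitting
fit1 <- glmnet(x, y)
coef(fit1)

# Same fit without standardization
fit2 <- glmnet(x, y, standardize = FALSE)
coef(fit2)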
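The scaling effect itself can be seen without any penalty. The sketch below uses ordinary least squares via base R's `lm()`, not iregnet's or glmnet's solver, on made-up data. Multiplying a column by 10000 divides its fitted coefficient by 10000.

```r
set.seed(1)

# Made-up data: y depends on x1 with a true coefficient of 2.
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 * x1 + x2 + rnorm(n, sd = 0.1)

# Fit on the original scale.
fit_raw <- lm(y ~ x1 + x2)

# Fit again with x1 inflated by a factor of 10000.
x1_big <- x1 * 10000
fit_big <- lm(y ~ x1_big + x2)

coef_raw <- unname(coef(fit_raw)["x1"])
coef_big <- unname(coef(fit_big)["x1_big"])

print(coef_raw)          # close to the true value of 2
print(coef_big)          # smaller by a factor of 10000
print(coef_big * 10000)  # recovers coef_raw up to rounding
```

A coefficient shrunk this way is on the order of the 1e-4 default threshold mentioned above, which may be why it is reported as zero. How iregnet applies threshold internally is not shown in this thread.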

@tdhock Let me know if this resolves the issue.

tdhock commented 7 years ago

ok great! I should have thought of trying that. Thanks!

anujkhare commented 7 years ago

Thanks, closing this PR.