anujkhare / iregnet


large lambda values? #30

Closed tdhock closed 7 years ago

tdhock commented 8 years ago

Hey @anujkhare, there seem to be two problems with the lambda grid: the first lambda value is too large (1e35), and too many of the following lambda values yield a weight vector of all zeros. Can you double-check the lambda grid computation? It would be best if the first lambda value gives an intercept-only model, and then the second lambda value gives at least one non-zero element in the weight vector.

Right now I see

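library(survival)  # provides the ovarian data set
library(iregnet)
data(ovarian)
# Feature matrix: age plus two 0/1 indicators
X <- with(ovarian, cbind(age=age, residual.disease=resid.ds-1, treatment=rx-1))
# Interval outputs: right limit is NA (unknown) for censored observations
y_l <- ovarian$futime
y_r <- ovarian$futime
y_r[ovarian$fustat == 0] <- NA
y_surv <- log(cbind(y_l, y_r))
fit <- iregnet(X, y_surv, family="gaussian")
rownames(fit$beta) <- c("(Intercept)", colnames(X))
# One row per lambda value: lambda, scale, and coefficients for the first 14 fits
t(with(fit, rbind(lambda=lambda, scale=scale, beta))[,1:14])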
> t(with(fit, rbind(lambda=lambda, scale=scale, beta))[,1:14])
            lambda    scale (Intercept)          age residual.disease treatment
 [1,] 1.000000e+35 1.265787    6.772081  0.000000000                0         0
 [2,] 1.175100e+01 1.265758    6.772118  0.000000000                0         0
 [3,] 1.070728e+01 1.265775    6.772103  0.000000000                0         0
 [4,] 9.756176e+00 1.265768    6.772112  0.000000000                0         0
 [5,] 8.889411e+00 1.265772    6.772108  0.000000000                0         0
 [6,] 8.099720e+00 1.265770    6.772110  0.000000000                0         0
 [7,] 7.380153e+00 1.265771    6.772109  0.000000000                0         0
 [8,] 6.724524e+00 1.265771    6.772110  0.000000000                0         0
 [9,] 6.127134e+00 1.265771    6.772110  0.000000000                0         0
[10,] 5.582817e+00 1.265771    6.772110  0.000000000                0         0
[11,] 5.086855e+00 1.265771    6.772110  0.000000000                0         0
[12,] 4.634953e+00 1.265771    6.772110  0.000000000                0         0
[13,] 4.682283e+00 1.202117    7.036940 -0.005226961                0         0
[14,] 4.762457e+00 1.137780    7.319351 -0.010762496                0         0
> 
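
The symptom in the output above can be summarized as a single number: how many lambda values at the start of the path leave every non-intercept weight at zero. The sketch below is only an illustration; `count_leading_null_fits` is a hypothetical helper, not an iregnet function, and it assumes that `beta` has one column per lambda with the intercept in the first row, as in the table above.

```r
# Hypothetical helper (not part of iregnet): count the lambda values at the
# start of the path whose non-intercept coefficients are all zero.
count_leading_null_fits <- function(beta) {
  # beta: coefficients x lambdas matrix, first row is the intercept
  nonzero <- colSums(beta[-1, , drop = FALSE] != 0) > 0
  if (!any(nonzero)) return(ncol(beta))
  which(nonzero)[1] - 1
}

# Rows 11 to 14 of the output above, one column per lambda value
beta <- rbind(
  "(Intercept)" = c(6.772110, 6.772110, 7.036940, 7.319351),
  age = c(0, 0, -0.005226961, -0.010762496),
  residual.disease = 0,
  treatment = 0
)
count_leading_null_fits(beta)
```

On the full fit this would be called as `count_leading_null_fits(fit$beta)`. In the table above the first non-zero weight appears in row 13, so the count would be 12, whereas the behavior requested here corresponds to a count of 1.
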
anujkhare commented 8 years ago

@tdhock The lambda grid that we are calculating is based on a formula for uncensored observations. I need to look at the math once again to figure out how to adapt it to work properly for censored observations.

As for the first lambda, it is set to BIG=1e35 to calculate an intercept-only initial fit, which is then used to calculate the lambda grid. I will remove it from the output.

anujkhare commented 8 years ago

It turns out that the issue arises because I did not account for the intercept while calculating the lambda_max value. The formula I took from glmnet does not have it because they have already removed the intercept by mean normalization (and rescaling).
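
To illustrate the effect, here is a minimal sketch for the plain uncensored least-squares lasso only, not the formula iregnet actually uses. It assumes the objective (1/(2n)) * ||y - b0 - X b||^2 + lambda * ||b||_1, for which the smallest lambda giving all-zero weights is max_j |x_j' (y - mean(y))| / n. The simulated data and all variable names are made up for the example.

```r
set.seed(1)
n <- 50
# Uncentered features, as in a design matrix that has not been mean-normalized
X <- cbind(age = rnorm(n, mean = 55, sd = 10),
           treatment = rbinom(n, 1, 0.5))
y <- 6 + rnorm(n)

# Ignoring the intercept: gradient evaluated with the intercept fixed at 0
lambda_max_no_intercept <- max(abs(crossprod(X, y))) / n

# Accounting for the intercept: residuals are taken around the
# intercept-only fit, which is mean(y) for least squares
lambda_max_intercept <- max(abs(crossprod(X, y - mean(y)))) / n

c(no_intercept = lambda_max_no_intercept, intercept = lambda_max_intercept)
```

With uncentered features the first value is much larger than the second, so a grid that starts from it spends many lambda values on all-zero weight vectors before the first feature enters the model.
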

anujkhare commented 8 years ago

I did not remove the initial fit (lambda = 1e35) from the returned values. I found it useful for debugging.

@tdhock Would you like me to remove it? Maybe we could just modify print.iregnet to skip the first solution when printing?
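
A rough sketch of that second option, under the assumption that the fit stores `lambda`, `scale` and `beta` with the initial fit in the first position. `print_without_initial_fit` is a hypothetical name and not the package's actual `print.iregnet`; the mock object reuses the first three rows of the output reported in this issue.

```r
# Hypothetical sketch, not the package's actual print.iregnet
print_without_initial_fit <- function(fit) {
  keep <- -1  # drop the first entry, i.e. the lambda = 1e35 initial fit
  out <- t(rbind(lambda = fit$lambda[keep],
                 scale = fit$scale[keep],
                 fit$beta[, keep, drop = FALSE]))
  print(out)
  invisible(fit)  # return the full object unchanged, as print methods usually do
}

# Mock object with the field names used in this issue (lambda, scale, beta)
fit <- list(
  lambda = c(1e35, 11.751, 10.70728),
  scale = c(1.265787, 1.265758, 1.265775),
  beta = rbind("(Intercept)" = c(6.772081, 6.772118, 6.772103),
               age = c(0, 0, 0))
)
print_without_initial_fit(fit)
```

This keeps the initial fit in the returned object for debugging and only hides it from the printed summary.
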

anujkhare commented 7 years ago

Fixed in #50