boost-R / gamboostLSS

Boosting models for fitting generalized additive models for location, shape and scale (GAMLSS) to potentially high dimensional data. The current relase version can be found on CRAN (https://cran.r-project.org/package=gamboostLSS).
26 stars 11 forks source link

Fix trace for non-cyclic models #23

Closed hofnerb closed 7 years ago

hofnerb commented 8 years ago
## cyclic fitting: OK
> model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
+ control = boost_control(mstop = 100, trace = TRUE), method = "cycling")
[   1] ..................................................................... -- risk: 4625.79 
[  72] ...........................
Final risk: 4613.177 

## non-cyclic fitting: Not OK
> model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
+ control = boost_control(mstop = 100, trace = TRUE), method = "inner")
[   1]  -- risk: 4746.613 
[  4] ..................................................................... -- risk: 4654.223 
[ 75] .......................
Final risk: 4640.354 
.

> model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
+ control = boost_control(mstop = 100, trace = TRUE), method = "outer")
[   1]  -- risk: 4746.613 
[  4] ..................................................................... -- risk: 4654.22 
[ 75] .......................
Final risk: 4640.355 
.

The first risk is wrongly printed, the blanks in [ 4]and [ 75]are wrong and the last dot in the newline is wrong.

hofnerb commented 8 years ago

The problem seems to be related to #22.

Somehow, we already start with too many steps. Actually, we want to print every single update step... Furthermore, in we subtract the initial steps from mstop even if we start from scratch. See

debug(gamboostLSS:::do_trace)
model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
                     control = boost_control(mstop = 100, trace = TRUE), method = "inner")

If we call do_trace the second time, i.e. from within iBoost_outer, we get with this call

do_trace(current = length(combined_risk[combined_risk != 0]),
                             mstart = ifelse(firstRun, length(fit) + 1, 
                                             length(combined_risk[combined_risk != 0])),
                             mstop = ifelse(firstRun, niter - length(fit), niter),
                             risk = combined_risk[combined_risk != 0])

the following arguments:

Browse[2]> current
[1] 3
Browse[2]> mstart
[1] 3
Browse[2]> mstop
[1] 96
Browse[2]> risk
      mu    sigma    sigma 
4752.856 4749.489 4746.613

This is wrong!

hofnerb commented 7 years ago

@ja-thomas Is this issue fixed in devel? Do we have checks for that issue? If not, we should include some in master after I've merged master and devel.

hofnerb commented 7 years ago

Seems ok!

set.seed(1907)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
x3 <- rnorm(1000)
x4 <- rnorm(1000)
x5 <- rnorm(1000)
x6 <- rnorm(1000)
mu    <- exp(1.5 +1 * x1 +0.5 * x2 -0.5 * x3 -1 * x4)
sigma <- exp(-0.4 * x3 -0.2 * x4 +0.2 * x5 +0.4 * x6)
y <- numeric(1000)
for( i in 1:1000)
    y[i] <- rnbinom(1, size = sigma[i], mu = mu[i])
dat <- data.frame(x1, x2, x3, x4, x5, x6, y)
model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
                     control = boost_control(mstop = 100, trace = TRUE), 
                     method = "noncyclic")
## [   1] .................................................................. -- risk: 3115.225 
## [  67] .................................
## Final risk: 3074.093 

model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
                     control = boost_control(mstop = 100, trace = TRUE), 
                     method = "cyclic")
## [   1] .................................................................. -- risk: 3044.36 
## [  67] .................................
## Final risk: 2988.163