boost-R / gamboostLSS

Boosting models for fitting generalized additive models for location, shape and scale (GAMLSS) to potentially high-dimensional data. The current release version can be found on CRAN (https://cran.r-project.org/package=gamboostLSS).

Bug in mstop for non-cyclical fitting #19

Closed: hofnerb closed this issue 8 years ago.

hofnerb commented 8 years ago
require("gamboostLSS")

###negbin dist, linear###

set.seed(2611)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
x3 <- rnorm(1000)
x4 <- rnorm(1000)
x5 <- rnorm(1000)
x6 <- rnorm(1000)
mu    <- exp(1.5 + x1^2 + 0.5 * x2 - 3 * sin(x3) - 1 * x4)
sigma <- exp(-0.2 * x4 + 0.2 * x5 + 0.4 * x6)
y <- numeric(1000)
for (i in 1:1000)
  y[i] <- rnbinom(1, size = sigma[i], mu = mu[i])
dat <- data.frame(x1, x2, x3, x4, x5, x6, y)

model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
                     control = boost_control(mstop = 3), method = "inner")
cvr2 <- cvrisk(model, grid = 1:100)

mstop(cvr2)  #  a scalar
mstop(model) <- mstop(cvr2) #works
## but
mstop(model) # a vector
## and
mstop(model) <- c(mu = 10, sigma = 20) # breaks
mstop(model) ## now mstop = 10...

Does the current behavior make sense? I don't think so.

I do see why it is interesting to know how many iterations were fitted for mu and sigma, but we cannot reuse this information when setting mstop. We only care about the total number of steps.

ja-thomas commented 8 years ago

I see your point but:

1) In most cases it is more interesting to see the distribution of the iterations than the overall number of iterations (which the user specifies anyway).

2) This would break a lot of my code, since I use the mstop() function heavily in the internal fitting process.

What we could do: if mstop(model) <- c(mu = 10, sigma = 20) is called, we use sum(c(mu = 10, sigma = 20)) as the new mstop and give the user a warning?
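A minimal base-R sketch of this proposal, assuming the setter simply receives the right-hand side as `value`. `resolve_mstop()` is a hypothetical helper for illustration only, not part of gamboostLSS; a real replacement method would still need to apply the resulting total to the model.

```r
# Hypothetical helper (not part of gamboostLSS): turn the value passed to
# mstop(model) <- value into a single total number of boosting steps.
resolve_mstop <- function(value) {
  stopifnot(is.numeric(value), length(value) >= 1)
  if (length(value) > 1) {
    total <- sum(value)
    # Per-parameter counts cannot be reused directly in non-cyclical
    # fitting, so collapse them into one total and tell the user.
    warning("a vector of steps was supplied; using sum(value) = ", total,
            " as the new mstop")
    return(total)
  }
  value
}

resolve_mstop(30)                     # a scalar is passed through unchanged
resolve_mstop(c(mu = 10, sigma = 20)) # warns and returns 30
```

Summing keeps the setter consistent with the scalar returned by mstop(cvr2), while the warning makes the silent "now mstop = 10" behavior visible instead of surprising.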

hofnerb commented 8 years ago

I see. Let's discuss this on the phone. That is easier.

hofnerb commented 8 years ago

Return a scalar with one attribute holding the individual steps per distribution parameter. Perhaps add a nice print function...
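A minimal base-R sketch of that resolution, under stated assumptions: the total number of steps is stored as a plain numeric scalar, the per-parameter counts live in an attribute, and a small print method shows both. The constructor `new_mstop()`, the class `mstop_lss` and the attribute name `partitions` are hypothetical names chosen for illustration, not the gamboostLSS API.

```r
# Hypothetical constructor (illustrative names, not the gamboostLSS API):
# the object is the total number of steps, and the split per distribution
# parameter is kept in the "partitions" attribute.
new_mstop <- function(steps) {
  stopifnot(is.numeric(steps), !is.null(names(steps)))
  structure(sum(steps), partitions = steps, class = "mstop_lss")
}

# Print method showing both the total and the per-parameter split.
print.mstop_lss <- function(x, ...) {
  # as.numeric() drops all attributes, leaving only the total
  cat("Total number of boosting iterations:", as.numeric(x), "\n")
  cat("Iterations per distribution parameter:\n")
  print(attr(x, "partitions"))
  invisible(x)
}

m <- new_mstop(c(mu = 10, sigma = 20))
m                      # uses print.mstop_lss
length(m)              # 1: a scalar, so it can be reused as a total
attr(m, "partitions")  # the individual steps are still available
```

Because `length(m)` is 1 and `as.numeric(m)` strips the attributes, such a value could be handed straight back to a setter, while the distribution of iterations stays available for inspection.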