boost-R / mboost

Boosting algorithms for fitting generalized linear, additive and interaction models to potentially high-dimensional data. The current release version can be found on CRAN (http://cran.r-project.org/package=mboost).

1st Example from confint.mboost generates warnings related to cvrisk.mboost() #112

Open AdrianRichter opened 3 years ago

AdrianRichter commented 3 years ago

When running the example:

## a simple linear example
set.seed(1907)
data <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                   z = factor(sample(1:3, 100, replace = TRUE)))
data$y <- rnorm(100, mean = data$x1 - data$x2 - 1 * (data$z == 2) +
                 1 * (data$z == 3), sd = 0.1)
linmod <- glmboost(y ~ x1 + x2 + z, data = data,
                   control = boost_control(mstop = 200))

## compute confidence interval from 10 samples. Usually one should use
## at least 1000 samples.
CI <- confint(linmod, B = 10, level = 0.9)
CI

One warning is generated per bootstrap sample, so the number of warnings equals the value of B (10 here). The warning says:

In cvrisk.mboost(mod, folds = cv(model.weights(mod), B = B.mstop), : zero weights

However, model.weights() of "linmod" is a vector of ones.

hofnerb commented 3 years ago

The warning stems from the way the CIs are computed: a nested bootstrap method is used, as described in the manual under Details:

Use a nested bootstrap approach to compute pointwise confidence intervals for the predicted partial functions or regression parameters. The approach is further described in Hofner et al. (2016).

This means that a bootstrap sample is drawn, and on this sample another bootstrap is used to determine the optimal mstop value for that sample. In this inner bootstrap, there are indeed zero weights and hence the warning is correct.
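
To illustrate why zero weights appear, here is a minimal base-R sketch of the nested scheme described above. It assumes that a bootstrap sample is represented as a vector of multinomial counts (how often each observation was drawn), so every observation that was not drawn has weight zero. The variable names are purely illustrative and this is not the code used inside confint() or cvrisk().

```r
set.seed(1907)
n <- 100
w_outer <- rep(1, n)   # stands in for model.weights(linmod): all ones

# Outer bootstrap: draw n observations with replacement, stored as counts.
w_boot <- as.vector(rmultinom(1, n, w_outer / sum(w_outer)))

# Observations that were never drawn end up with weight zero.
n_zero <- sum(w_boot == 0)
n_zero / n             # about exp(-1), i.e. roughly 0.37, in expectation

# Inner bootstrap: folds are drawn with probabilities proportional to the
# outer weights, so observations with weight zero stay at zero in every fold.
inner <- rmultinom(5, sum(w_boot), w_boot / sum(w_boot))
zero_rows_stay_zero <- all(inner[w_boot == 0, , drop = FALSE] == 0)
zero_rows_stay_zero
```

So even though the weights of the original model are all one, any model refitted on a bootstrap sample carries zero weights, which would explain one warning per bootstrap sample.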

I agree, though, that this is not optimal. Either the way the bootstrap is drawn should be changed, or the warning should be suppressed. Unfortunately, I don't have time at the moment to dig deeper and find the best solution.
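
Until this is resolved in the package, a user-side workaround could be to muffle only this specific warning. The following is a sketch, not an mboost feature: muffle_zero_weights() is a hypothetical helper, and noisy() is a stand-in that emits the same warning message so that the example runs without mboost.

```r
# Hypothetical helper (not part of mboost): evaluate `expr` and muffle only
# warnings whose message contains "zero weights"; all other warnings pass
# through unchanged.
muffle_zero_weights <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("zero weights", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for a call that emits the warning and then returns a value.
noisy <- function() {
  warning("zero weights")
  42
}

res <- muffle_zero_weights(noisy())   # no warning is shown
res
```

With mboost loaded and linmod fitted as in the example, the call would be CI <- muffle_zero_weights(confint(linmod, B = 10, level = 0.9)). Matching on the message text keeps unrelated warnings visible, which a blanket suppressWarnings() would not.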