boost-R / mboost

Boosting algorithms for fitting generalized linear, additive and interaction models to potentially high-dimensional data. The current release version can be found on CRAN (http://cran.r-project.org/package=mboost).

centered bols has trouble with newdata #70

Closed carlganz closed 7 years ago

carlganz commented 7 years ago
library(mboost)

# example from docs
data("bodyfat", package = "TH.data")

mod1 <- mboost(DEXfat ~ btree(age) + bols(waistcirc, center=TRUE) + bbs(hipcirc),
              data = bodyfat)

mod2 <- mboost(DEXfat ~ btree(age) + bols(waistcirc, center=FALSE) + bbs(hipcirc),
               data = bodyfat)

predict(mod1, bodyfat)
# errors
predict(mod2, bodyfat)
# no errors
predict(mod1)
# no errors

I'm guessing that the levels of waistcirc change when it is centered, so the levels in newdata don't match the model, even though newdata is the same data that was used to build the model.

hofnerb commented 7 years ago

The problem is different from what you think: center = TRUE doesn't work for bols (anymore). However, bols can take multiple variables and computes the least squares (or penalized least squares) solution for these variables:

bols(x1, x2) is essentially equivalent to a base-learner of the form lm(u ~ x1 + x2), where u is the negative gradient. bols(x1, x2, intercept = FALSE) is essentially equivalent to a base-learner of the form lm(u ~ x1 + x2 - 1), where u is the negative gradient.
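
The equivalence described above can be checked by hand for a single boosting step. The sketch below is an illustration only: it uses base R and simulated data, the names x1, x2, y, f0 and the choice of squared-error loss are assumptions made for the example, and it does not call mboost, whose internal implementation may differ in detail.

```r
# One hand-written boosting step with a linear base-learner (base R only).
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.1)

# For squared-error loss (1/2) * (y - f)^2 the negative gradient is the residual.
f0 <- rep(mean(y), n)   # assumed offset: start from the mean of the response
u  <- y - f0            # negative gradient evaluated at the current fit

# Analogue of bols(x1, x2): least squares with an intercept.
fit_int <- lm(u ~ x1 + x2)

# Analogue of bols(x1, x2, intercept = FALSE): least squares without one.
fit_noint <- lm(u ~ x1 + x2 - 1)

coef(fit_int)    # three coefficients: (Intercept), x1, x2
coef(fit_noint)  # two coefficients: x1, x2
```

In an actual boosting run this fit would be repeated in every iteration against the updated negative gradient, and only a fraction of the fitted values would be added to the current model.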

What happens is that you accidentally specify another variable. In the first model it seems to be treated as an intercept (since as.numeric(TRUE) equals 1), while in the second model you add a constant variable equal to zero, which makes no sense at all.
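
To make the effect of a stray logical argument concrete, here is a minimal sketch in base R. It mimics the situation with lm and hand-built design matrices; the names x, u, extra_true and extra_false are made up for the example, and how mboost assembles its design matrix internally is not shown here.

```r
# Logical values coerce to 0/1 when used as numbers.
as.numeric(TRUE)   # 1
as.numeric(FALSE)  # 0

set.seed(1)
n <- 50
x <- rnorm(n)
u <- 0.5 * x + rnorm(n, sd = 0.1)

# Mimic a scalar that is recycled into a full column of the design matrix.
extra_true  <- rep(as.numeric(TRUE),  n)  # column of ones
extra_false <- rep(as.numeric(FALSE), n)  # column of zeros

X_true  <- cbind(x, extra_true)   # x plus a constant 1: acts like an intercept
X_false <- cbind(x, extra_false)  # x plus a constant 0: carries no information

# Without a separate intercept, the column of ones plays the intercept's role.
b_true <- coef(lm(u ~ X_true - 1))
b_ref  <- coef(lm(u ~ x))   # same two values, in a different order and
                            # under different names

# The column of zeros cannot be estimated; lm reports NA for it.
b_false <- coef(lm(u ~ X_false - 1))

b_true
b_ref
b_false
```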

As you are the second person running into this problem, we need to make sure that an error is thrown in that case.
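
One way such a check could look is sketched below. This is a hypothetical helper written for illustration, not mboost's actual code: the function name check_learner_args and its heuristic (flagging length-one arguments passed alongside longer vectors) are assumptions, and a real fix inside bols might work quite differently.

```r
# Hypothetical helper (not part of mboost): flag length-one arguments that are
# passed alongside longer vectors, e.g. a stray `center = TRUE`.
check_learner_args <- function(...) {
  args <- list(...)
  if (length(args) == 0L) return(invisible(TRUE))

  lens  <- vapply(args, length, integer(1))
  stray <- lens == 1L & max(lens) > 1L

  if (any(stray)) {
    nm <- names(args)
    if (is.null(nm)) nm <- rep("", length(args))
    labels <- ifelse(nm[stray] == "", "<unnamed>", nm[stray])
    stop("length-one argument(s) would be treated as variables: ",
         paste(labels, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

x <- rnorm(10)

# A proper variable passes silently.
check_learner_args(x)

# A stray logical is reported instead of being used as a variable.
res <- tryCatch(check_learner_args(x, center = TRUE),
                error = function(e) conditionMessage(e))
res
```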

hofnerb commented 7 years ago

@carlganz You stated that the example was taken from the docs. Does this include the (wrong) usage of center in bols or only the general model? I could not find any occurrence.

carlganz commented 7 years ago

It wasn't taken from the docs. Sorry if I made it seem that way. Thanks for the clarification.