gbm-developers / gbm3

Gradient boosted models

undefined columns selected #144

Open harrysouthworth opened 7 years ago

harrysouthworth commented 7 years ago

There's no way I can share the data, so this isn't necessarily reproducible.

The data are 20k x 4k, with a binomial response and 10-fold CV. The fit gets to the end, then reports "Error in `[.data.frame`(data, flag, model$variables$var_names, drop = FALSE) : undefined columns selected".
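For anyone trying to reproduce this without the real data, here is a minimal sketch of how that message arises. The data frame `df` and the vector `var_names` are hypothetical toy stand-ins, not taken from gbm: subsetting a data.frame by a column name that has picked up literal backquotes fails, because no column carries that exact name.

```r
# Hypothetical toy data standing in for the real (unshareable) data
df <- data.frame(x1 = 1:3, x2 = c(0.5, 1.5, 2.5))

# Hypothetical column names as they might look once backquotes sneak in
var_names <- c("`x1`", "x2")

# "`x1`" does not exactly match the column "x1", so `[.data.frame` errors;
# capture the error message instead of stopping
res <- tryCatch(df[1:2, var_names, drop = FALSE],
                error = function(e) conditionMessage(e))
print(res)
```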

The column names have already been run through make.names, so it's not a case of weird (non-syntactic) column names.

Also, it has sucked up all my RAM and isn't releasing it; only killing the RStudio session made it let go.

gbm 2.2, R 3.3.1, Ubuntu 16.04.2

harrysouthworth commented 7 years ago
my_data  <- data[flag, model$variables$var_names, drop=FALSE]

That's in predict.GBMCVFit in gbm-cv-predict.r.

Somewhere, some backquoted variable names have snuck into model$variables$var_names, so they no longer match the column names of data and the subset fails. The following line is a filthy fix:

model$variables$var_names <- gsub("`", "", model$variables$var_names)
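A small sketch of why stripping the backquotes helps, again using hypothetical toy objects (`df`, `var_names`, `clean_names`) rather than the real model object. `fixed = TRUE` is an optional choice here so the backquote is matched literally rather than as a regular expression.

```r
# Hypothetical toy data and backquoted names, as in the failing case
df <- data.frame(x1 = 1:3, x2 = c(0.5, 1.5, 2.5))
var_names <- c("`x1`", "x2")

# Remove every literal backquote from the stored names
clean_names <- gsub("`", "", var_names, fixed = TRUE)

# The cleaned names now match the columns exactly, so subsetting succeeds
my_data <- df[1:2, clean_names, drop = FALSE]
print(names(my_data))
```

Note that this only treats the symptom; it doesn't explain where the backquotes came from in the first place.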

A filthier fix is for the user to use make.names on the data before calling gbm.
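A minimal sketch of that user-side workaround, with a hypothetical data frame whose column names are deliberately non-syntactic. The model fit itself is left out; the point is only that the names are rewritten with make.names before the data ever reach gbm.

```r
# Hypothetical data with non-syntactic column names (kept via check.names = FALSE)
df <- data.frame(`my var` = 1:3, `2nd` = 4:6, check.names = FALSE)

# Rewrite the names into syntactic R names; unique = TRUE de-duplicates collisions
names(df) <- make.names(names(df), unique = TRUE)
print(names(df))

# df would then be passed to gbm as usual (call not shown here)
```

make.names replaces invalid characters with "." and prefixes names that start with a digit with "X", so "my var" becomes "my.var" and "2nd" becomes "X2nd".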