carolssnz / gradientboostedmodels

Automatically exported from code.google.com/p/gradientboostedmodels

predict.gbm error when cv.folds>1 #30

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run gbm with cv.folds>1 on a dataset that contains factor variables, but 
restrict the model formula to a subset of the dataset's variables that leaves 
out the factor variables.

What is the expected output? What do you see instead?

gbm should run without a problem. Instead it gives "Error in 
object$var.levels[[i]] : subscript out of bounds".

What version of the product are you using? On what operating system?
gbm 2.1 
Windows 7 64bit

Please provide any additional information below.
The attached code shows the error using the sample code from the package.

The error appears to come from the step in predict.gbm given below. Note that 
the index i of a factor variable in dataset x differs from its index in 
object$var.levels, since object$var.levels is limited to the model variables. 
In the attached code, for instance, length(object$var.levels) = 4 whereas 
cCols = 6.

for (i in 1:cCols) {
    if (is.factor(x[, i])) {
        if (length(levels(x[, i])) > length(object$var.levels[[i]])) {
            new.compare <- levels(x[, i])[1:length(object$var.levels[[i]])]
        }
        else {
            new.compare <- levels(x[, i])
        }
        if (!identical(object$var.levels[[i]], new.compare)) {
            x[, i] <- factor(x[, i], union(object$var.levels[[i]], 
                levels(x[, i])))
        }
        x[, i] <- as.numeric(x[, i]) - 1
    }
} 
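
The index mismatch described above can be shown without gbm at all. The following is only a minimal sketch in base R: the data frame, its column names, and the `var_levels` list are hypothetical stand-ins for `x` and `object$var.levels`, not the contents of the attached code.

```r
# Hypothetical stand-in for x: six columns, the fifth is a factor
# that the model formula did not include.
x <- data.frame(
  a = c(1.0, 2.0, 3.0),
  b = c(0.5, 0.1, 0.9),
  c = c(10, 20, 30),
  d = c(3, 2, 1),
  unused_factor = factor(c("u", "v", "u")),
  e = c(7, 8, 9)
)

# Hypothetical stand-in for object$var.levels: one entry per model
# variable only, so four entries rather than six.
var_levels <- list(c(1, 3), c(0.1, 0.9), c(10, 30), c(1, 3))

cCols <- ncol(x)

# Same loop shape as the quoted step: the loop runs over the columns of x,
# but indexes var_levels with the same i.
result <- tryCatch({
  for (i in 1:cCols) {
    if (is.factor(x[, i])) {
      length(var_levels[[i]])  # i is 5 here, but var_levels has 4 entries
    }
  }
  "no error"
}, error = function(e) conditionMessage(e))

print(result)
```

Here the factor sits at column 5 of `x`, so `var_levels[[5]]` is requested from a list of length 4, which is the same kind of failure as the reported error.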

Original issue reported on code.google.com by meto...@gmail.com on 26 Jun 2013 at 6:18

GoogleCodeExporter commented 9 years ago
Thanks for posting, I am experiencing this error too

Original comment by jeff.ct....@gmail.com on 24 Sep 2013 at 3:04

GoogleCodeExporter commented 9 years ago
I'm getting the same error too. It turns out that "gbmCrossVal" passes the full 
"data" data.frame instead of "x" to "gbmCrossValPredictions". "data" contains 
all the predictors, not just those in the formula. Since 
"gbmCrossValModelBuild" uses "gbm.fit", the cross validation models do not have 
a "Terms" slot and thus fail to rearrange the predictors.
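
The general idea of passing only the model's predictors, rather than the full data.frame, can be sketched in base R as follows. This is not the attached patch and does not call any gbm internals. The data, the formula, and the variable names are hypothetical, and using "term.labels" this way assumes a simple additive formula with no interactions or transformations.

```r
# Hypothetical data frame: the factor column "f" is present in the data
# but is not used by the model formula.
data <- data.frame(
  y  = c(1.2, 0.7, 3.1, 2.4),
  x1 = c(0.1, 0.4, 0.3, 0.9),
  x2 = c(5, 3, 8, 1),
  f  = factor(c("a", "b", "a", "c"))
)
model_formula <- y ~ x1 + x2

# Names of the predictors that actually appear in the formula.
predictor_names <- attr(terms(model_formula, data = data), "term.labels")

# Keep only those columns, in formula order, before predicting, so that
# column i of x lines up with entry i of the model's variable list.
x <- data[, predictor_names, drop = FALSE]

print(names(x))
```

With the columns restricted this way, the unused factor never reaches the prediction step, and the column positions match the model's variables.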

When there are unused factor variables, this produces the error. Otherwise, gbm 
runs without errors but gives wrong "cv.fitted" values (which will affect 
"print.gbm"). The attached file is a simple patch.

Original comment by bkcho...@gmail.com on 13 Nov 2013 at 1:20

GoogleCodeExporter commented 9 years ago
Thanks for posting. Sorry to have gone quiet for a while. I'll try to upload a 
fixed version soon.

Original comment by harry.southworth on 14 Nov 2013 at 3:46

GoogleCodeExporter commented 9 years ago

Original comment by harry.southworth on 14 Nov 2013 at 3:47

GoogleCodeExporter commented 9 years ago
I'm experiencing the same problem. Something to do with R 3.0? Anyway, will 
give the patch a try, thanks!

Original comment by mrbenmo...@gmail.com on 15 Nov 2013 at 8:14

GoogleCodeExporter commented 9 years ago
Patched. Thank you all!

I've moved the project to Github
https://github.com/harrysouthworth/gbm

The patch should be in the latest release there, 2.1-0.3.

Please let me know (at Github) if you have any further problems.

Thanks again,
Harry

Original comment by harry.southworth on 26 Nov 2013 at 2:45