ecpolley / SuperLearner

Current version of the SuperLearner R package

Prediction with glmnet and ksvm #127

Closed bm2609 closed 5 years ago

bm2609 commented 5 years ago

Hello,

I have the following problem: I created a SuperLearner with mean, glmnet, randomForest, xgboost, and ksvm, and I have problems with the predict.SuperLearner command when glmnet is used. I then get the error Error in cbind2(1, newx) %*% nbeta : Cholmod error 'X and/or Y have wrong dimensions'. When glmnet is not used by the SuperLearner (all its coefficients are set to zero), the error does not appear and everything works well. I use the following prediction command: predict.SuperLearner(sl_fit, newdata = X_holdout, onlySL = TRUE). Does glmnet need a different kind of prediction command?
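For reference, here is a minimal sketch of the setup described above with simulated data; the wrapper names are the standard SuperLearner wrappers, X_holdout simply stands in for the hold-out data mentioned above, and the sketch is not guaranteed to reproduce the error. It assumes the glmnet, randomForest, xgboost, and kernlab packages are installed and a recent SuperLearner version that ships SL.ksvm.

library(SuperLearner)
set.seed(1)

## simulated training and hold-out data
n <- 200; p <- 10
X <- data.frame(matrix(rnorm(n * p), nrow = n))
Y <- X[, 1] + rnorm(n)
X_holdout <- data.frame(matrix(rnorm(20 * p), nrow = 20))
colnames(X_holdout) <- colnames(X)

## ensemble with the five learners mentioned above
SL.library <- c("SL.mean", "SL.glmnet", "SL.randomForest", "SL.xgboost", "SL.ksvm")
sl_fit <- SuperLearner(Y = Y, X = X, SL.library = SL.library)

## prediction call from the question
pred <- predict(sl_fit, newdata = X_holdout, onlySL = TRUE)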

ecpolley commented 5 years ago

@bm2609 Does your data have factor variables where the levels differ between X and the new data? glmnet requires the data.frame object to be converted to a matrix object, and the SL.glmnet wrapper uses the model.matrix function to do this. If you have factor variables with different levels between the X and newX data.frames, you will end up with matrices of different dimensions; that is my guess for the error message you see.
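A small illustration of that guess, with hypothetical toy data (not from the issue): when a factor level occurs in X but not in newX, model.matrix produces matrices with different numbers of columns, which is exactly the kind of dimension mismatch the glmnet predict step complains about.

## factor with three levels in the training data
X    <- data.frame(x1 = rnorm(6), f = factor(c("a", "b", "c", "a", "b", "c")))
## same variable, but level "c" never occurs in the new data
newX <- data.frame(x1 = rnorm(4), f = factor(c("a", "b", "a", "b")))

ncol(model.matrix(~ -1 + ., X))     # 4 columns: x1 plus three dummy columns
ncol(model.matrix(~ -1 + ., newX))  # 3 columns: x1 plus two dummy columns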

bm2609 commented 5 years ago

@ecpolley all my data are numeric. I played around a bit and got different error messages; maybe this helps in finding a solution. When I pass the X_holdout data as a data.frame, I get this error message after calling pred <- predict.SuperLearner(sl_fit, newdata = X_holdout, onlySL = TRUE): Error in .local(object, ...) : test vector does not match model !. However, when I pass the X_holdout data as a matrix, I get the original error message: Error in cbind2(1, newx) %*% nbeta : Cholmod error 'X and/or Y have wrong dimensions'.

ecpolley commented 5 years ago

@bm2609 Do you have a reproducible example you can share (post here or send me an email)? Using the example from the help documentation works:

library(SuperLearner)
set.seed(23432)
## training set
n <- 500
p <- 50
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep="")
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)

## test set
m <- 1000
newX <- matrix(rnorm(m*p), nrow = m, ncol = p)
colnames(newX) <- paste("X", 1:p, sep="")
newX <- data.frame(newX)
newY <- newX[, 1] + sqrt(abs(newX[, 2] * newX[, 3])) + newX[, 2] -
  newX[, 3] + rnorm(m)

# generate Library and run Super Learner
SL.library <- c("SL.glmnet", "SL.gam", "SL.mean")
test <- SuperLearner(Y = Y, X = X, SL.library = SL.library, method = "method.NNLS")

pred <- predict(test, newdata = newX, onlySL = TRUE)
mean((pred$pred - newY)^2)
ecpolley commented 5 years ago

@bm2609 Also, you can check the dimensions of the glmnet fit within the SuperLearner to see how it may have transformed the data. In the following line, replace fitSL with the name of your fitted SuperLearner object.

coef(fitSL$fitLibrary$SL.glmnet_All$object$glmnet.fit, s = fitSL$fitLibrary$SL.glmnet_All$object$lambda.min)
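For example, one way to compare the number of predictors glmnet was trained on against the number of columns the new data would produce (a hedged sketch; fitSL and X_holdout are the object names used elsewhere in this thread, and the model.matrix call mirrors the wrapper's data.frame-to-matrix conversion):

glmnet_obj <- fitSL$fitLibrary$SL.glmnet_All$object
## rows of the coefficient matrix, minus the intercept = predictors in the fit
nrow(coef(glmnet_obj$glmnet.fit, s = glmnet_obj$lambda.min)) - 1
## columns that would be built from the new data
ncol(model.matrix(~ -1 + ., data.frame(X_holdout)))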
ecpolley commented 5 years ago

The issue was that newdata had a single observation, and the predict.SL.ksvm function converted the matrix to a vector when the '[' function was applied. I've updated the code on GitHub to include drop = FALSE.
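For anyone hitting the same thing, a minimal illustration of the underlying R behavior: subsetting a single row of a matrix with '[' drops the result to a plain vector unless drop = FALSE is supplied, which is what broke the ksvm prediction for a one-row newdata.

m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m[1, ])                # NULL -- the single row is dropped to a vector
dim(m[1, , drop = FALSE])  # 1 3  -- stays a one-row matrix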