ecpolley / SuperLearner

Current version of the SuperLearner R package

superLearner issue with dgCMatrix #123

Closed caprone closed 5 years ago

caprone commented 5 years ago

Hi, SuperLearner seems to have a problem with dgCMatrix input, even when the learners in SL.library (such as ranger or xgboost) accept it.

ecpolley commented 5 years ago

Can you elaborate on the problem? Is this where 'X' is class 'dgCMatrix' instead of a data.frame or matrix?

caprone commented 5 years ago

Yes! SuperLearner doesn't seem to accept sparse matrices.

athammad commented 5 years ago

Any news on the matter? I am trying to do some basic sentiment analysis and my "X" is a 'dgCMatrix'. Even when I convert it to a data.frame using as.data.frame(as.matrix(mydata)), I still get the following errors:

model <- SuperLearner(y,
                      x,
                      family=binomial(),
                      SL.library=list("SL.ranger"))
Error in `[.data.frame`(x, r, vars, drop = drop) : 
  undefined columns selected
Error in `[.data.frame`(x, r, vars, drop = drop) : 
  undefined columns selected
In addition: Warning message:
In FUN(X[[i]], ...) : Error in algorithm SL.ranger 
  The Algorithm will be removed from the Super Learner (i.e. given weight 0) 

ck37 commented 5 years ago

Remove list().


athammad commented 5 years ago

Hi @ck37,

I'm still getting the same error. I don't think it is related to the list() function.

ck37 commented 5 years ago

What do class(x), dim(x), and table(sapply(x, class)) return?


athammad commented 5 years ago

> class(x)
[1] "data.frame"
> dim(x)
[1] 5070 6652
> table(sapply(x, class))

numeric 
   6652 

ck37 commented 5 years ago

Hmm, that all looks good. Are you able to create a reproducible example? https://www.tidyverse.org/help/

It works fine for me when I convert a data frame to a sparse matrix and back:

library(SuperLearner)

data(Boston, package = "MASS")

y_gaus = Boston$medv
y_bin = as.numeric(Boston$medv > 23)

# Remove outcome from covariate dataframe.
x = Boston[, -14]

# Convert to a matrix.
x_mat = model.matrix(~ ., data = x)
# Remove intercept.
x_mat = x_mat[, -1]

class(x_mat)

x_mat = data.frame(x_mat)

(sl = SuperLearner(y_bin, x_mat, family = binomial(), SL.library = "SL.ranger"))

# Convert to a sparse matrix (class dgCMatrix).
dgc = Matrix::Matrix(as.matrix(x_mat), sparse = TRUE)
class(dgc)

# Convert back to a df.
dgc_df = as.data.frame(as.matrix(dgc))

class(dgc_df)
table(sapply(dgc_df, class))
head(dgc_df)

(sl = SuperLearner(y_bin, dgc_df, family = binomial(), SL.library = "SL.ranger"))

ecpolley commented 5 years ago

Hi @athammad,

My first guess is that the problem is one of the variable names in your matrix. Can you try: colnames(x) <- paste0("X_", colnames(x))

And then see if the super learner runs?
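
Before renaming anything, it can help to see which column names look risky. The following is a minimal base R sketch, not something taken from SuperLearner itself: the data frame `x_demo` and the `suspect` list are hypothetical, and which names actually cause trouble inside a given learner wrapper is an assumption here.

```r
# Hypothetical stand-in for a document-term data frame whose columns are words.
x_demo <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
colnames(x_demo) <- c("drop", "if", "great", "great")

nm <- colnames(x_demo)

# Names that R would have to alter to make them syntactically valid and unique
# (reserved words, duplicates, names starting with a digit, and so on).
not_syntactic <- nm[nm != make.names(nm, unique = TRUE)]

# Names that match common argument or variable names.
# This list is illustrative only (an assumption), not an exhaustive rule.
suspect <- c("drop", "x", "X", "Y", "data", "family", "weights")
collides <- intersect(nm, suspect)

print(not_syntactic)
print(collides)
```

If either vector is non-empty, those columns are reasonable first candidates to rename.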

athammad commented 5 years ago

@ecpolley you are right!

ecpolley commented 5 years ago

Thanks. We ran into a similar unexplained error in another sentiment analysis, where the column names were words and one of them was interpreted as an argument (e.g. "drop"). Adding the "X_" prefix prevents the problem.
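
The prefixing fix suggested above can be wrapped in a small helper so it is applied consistently to training and prediction data. This is only a sketch in base R: `prefix_colnames` and `x_demo` are hypothetical names introduced for illustration, not part of the SuperLearner package.

```r
# Hypothetical helper: prefix every column name so that none is a bare word
# that could be mistaken for an argument or a reserved word.
prefix_colnames <- function(df, prefix = "X_") {
  colnames(df) <- paste0(prefix, colnames(df))
  df
}

# Toy data whose column names are words, as in a document-term matrix.
x_demo <- data.frame(a = 1:3, b = 4:6, c = 7:9)
colnames(x_demo) <- c("drop", "function", "great")

x_safe <- prefix_colnames(x_demo)
print(colnames(x_safe))

# After prefixing, make.names() leaves the names unchanged,
# which means they are syntactically valid and unique.
stopifnot(identical(colnames(x_safe),
                    make.names(colnames(x_safe), unique = TRUE)))
```

Note that prefixing does not remove duplicate column names; if two columns share the same word, they still need to be merged or renamed separately. The same prefix should also be applied to any newX passed at prediction time so the column names match.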