ecpolley / SuperLearner

Current version of the SuperLearner R package

SL.gbm uses all cores #126

Closed by alexandergerber 5 years ago

alexandergerber commented 5 years ago

When I do this

library(SuperLearner)
y <- mtcars$mpg
X <- mtcars[, names(mtcars) != "mpg"]
SuperLearner(y, X, SL.library = list("SL.gbm"))

all of my cores are used. This happens even when I try to limit the number of cores with a customized SL.gbm wrapper:

SL.gbm.new = function(...) {
  SL.gbm(...,  cv.folds = 0,  n.cores = 1)
}

it still opens new R threads.

However, calling gbm() directly works as expected. It seems to me that this problem is somehow caused by SuperLearner().

ecpolley commented 5 years ago

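In your SL.gbm.new function, n.cores never reaches gbm(): SL.gbm accepts extra arguments through ... but does not pass them on to gbm::gbm(), so the wrapper has no effect. You could instead define a wrapper that passes n.cores explicitly: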

SL.gbm.n.cores <- function(Y, X, newX, family, obsWeights, gbm.trees = 10000,
                           interaction.depth = 2, shrinkage = 0.001,
                           n.cores = 1, ...) {
  require("gbm")
  # Build the formula Y ~ x1 + x2 + ... from the column names of X.
  gbm.model <- as.formula(paste("Y~", paste(colnames(X), collapse = "+")))
  # n.cores is passed explicitly to gbm::gbm() in both branches.
  if (family$family == "gaussian") {
    fit.gbm <- gbm::gbm(formula = gbm.model, data = X, distribution = "gaussian",
                        n.trees = gbm.trees, interaction.depth = interaction.depth,
                        shrinkage = shrinkage, cv.folds = 5, keep.data = TRUE,
                        weights = obsWeights, verbose = FALSE, n.cores = n.cores)
  }
  if (family$family == "binomial") {
    fit.gbm <- gbm::gbm(formula = gbm.model, data = X, distribution = "bernoulli",
                        n.trees = gbm.trees, interaction.depth = interaction.depth,
                        shrinkage = shrinkage, cv.folds = 5, keep.data = TRUE,
                        weights = obsWeights, verbose = FALSE, n.cores = n.cores)
  }
  # Choose the number of trees by cross-validation, then predict on newX.
  best.iter <- gbm::gbm.perf(fit.gbm, method = "cv", plot.it = FALSE)
  pred <- predict(fit.gbm, newdata = newX, best.iter, type = "response")
  fit <- list(object = fit.gbm, n.trees = best.iter)
  out <- list(pred = pred, fit = fit)
  class(out$fit) <- c("SL.gbm")
  return(out)
}
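
Why the original wrapper had no effect can be illustrated with base R alone. The sketch below is only a toy model: inner_fit, wrapper_drops_dots, and wrapper_forwards are hypothetical stand-ins (for gbm::gbm(), a wrapper that ignores ..., and a wrapper that forwards n.cores, respectively), not functions from gbm or SuperLearner.

```r
# Hypothetical stand-in for gbm::gbm(): reports how many cores it would use.
# The "all cores" default when n.cores is NULL is an assumption of this toy model.
inner_fit <- function(n.cores = NULL) {
  if (is.null(n.cores)) "all cores" else paste(n.cores, "core(s)")
}

# Hypothetical stand-in for a wrapper that accepts ... but never forwards it:
# any n.cores supplied by the caller is silently swallowed.
wrapper_drops_dots <- function(...) {
  inner_fit()
}

# Hypothetical stand-in for the fix: n.cores is a named argument
# that is passed on explicitly.
wrapper_forwards <- function(n.cores = 1, ...) {
  inner_fit(n.cores = n.cores)
}

dropped   <- wrapper_drops_dots(n.cores = 1)  # n.cores is ignored
forwarded <- wrapper_forwards(n.cores = 1)    # n.cores reaches inner_fit()

print(dropped)
print(forwarded)
```

Once SL.gbm.n.cores is defined as above, it should be usable like any other wrapper by naming it in the library, for example SL.library = list("SL.gbm.n.cores").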