ecpolley / SuperLearner

Current version of the SuperLearner R package

xgboost with multicore (but not snow) parallelization hangs #148

Open rdiaz02 opened 9 months ago

rdiaz02 commented 9 months ago

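I was trying to reproduce example 15 in "Guide to SuperLearner" (https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html#xgboost-hyperparameter-exploration), but it hangs when run with multicore parallelization. It works fine sequentially and with a snow cluster, and multicore itself works with other learners such as ranger.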

I wonder if I am doing something wrong. From searching around, it seems that xgboost (in particular the xgb.DMatrix operations) may not work well with fork clusters (e.g., https://stackoverflow.com/questions/52080209/xgb-dmatrix-hangs-in-mclapply, https://github.com/ck37/varimpact/issues/20).
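One way to check whether a given call hangs only inside a forked process is to run it in a forked child with a timeout. The following is a minimal sketch using only the base parallel and tools packages; `finishes_in_fork` is a hypothetical helper name, not part of SuperLearner or xgboost, and it only works on Unix-alikes where forking is available.

```r
library(parallel)

## Hypothetical helper: run `fun` in a forked child process and report
## whether it returns a result within `timeout` seconds.
finishes_in_fork <- function(fun, timeout = 30) {
    job <- mcparallel(fun())
    ## With wait = FALSE, mccollect() returns NULL if no result
    ## becomes available within `timeout` seconds.
    res <- mccollect(job, wait = FALSE, timeout = timeout)
    if (is.null(res)) {
        ## No result in time: treat it as a hang and kill the child
        tools::pskill(job$pid, tools::SIGKILL)
        return(FALSE)
    }
    TRUE
}

## Possible use with the data from the example in this report
## (assumes xgboost is installed and x, y are defined as there):
## finishes_in_fork(function() {
##     d <- xgboost::xgb.DMatrix(data = as.matrix(x), label = y)
##     TRUE
## })
```

If the call returns FALSE only after xgboost has already been used in the parent session, that would point at the fork interaction rather than at SuperLearner itself, although this is only a diagnostic and not a fix.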

Reproducible example, tested on two different Linux machines:

library(SuperLearner)
library(ranger)
library(xgboost)
library(parallel)

data(Boston, package = "MASS")

y <- as.numeric(Boston$medv > 22)
x <- subset(Boston, select = -medv)

options(mc.cores = 2)
getOption("mc.cores")

## Multicore works with ranger
set.seed(1, "L'Ecuyer-CMRG")

system.time({
    cv_sl <- CV.SuperLearner(Y = y, X = x, family = binomial(),
                             cvControl = list(V = 10),
                             parallel = "multicore",
                             SL.library = c("SL.mean", "SL.ranger"))
})
summary(cv_sl)

## xgboost works sequentially

tune <- list(ntrees = c(10, 20),
             max_depth = 2,
             shrinkage = c(0.01))

learners <- create.Learner("SL.xgboost",
                           tune = tune,
                           detailed_names = TRUE,
                           name_prefix = "xgb")

cv_sl2 <- CV.SuperLearner(Y = y,
                          X = x,
                          family = binomial(),
                          cvControl = list(V = 3),
                          verbose = TRUE,
                          parallel = "seq",
                          SL.library = c(learners$names, "SL.ranger"))
summary(cv_sl2)

## xgboost with multicore parallelization hangs
set.seed(1, "L'Ecuyer-CMRG")
cv_sl3 <- CV.SuperLearner(Y = y,
                          X = x,
                          family = binomial(),
                          cvControl = list(V = 3),
                          verbose = TRUE,
                          parallel = "multicore",
                          SL.library = c(learners$names, "SL.ranger"))
summary(cv_sl3)

## Snow cluster. This works

cluster <- parallel::makeCluster(2)
cluster
## Do separately, to make sure each OK
parallel::clusterEvalQ(cluster, library(SuperLearner))
parallel::clusterEvalQ(cluster, library(ranger))
parallel::clusterEvalQ(cluster, library(xgboost))

parallel::clusterExport(cluster, learners$names)

parallel::clusterSetRNGStream(cluster, 1)

cv_sl4 <- CV.SuperLearner(Y = y,
                          X = x,
                          family = binomial(),
                          cvControl = list(V = 3),
                          verbose = TRUE,
                          parallel = cluster,
                          SL.library = c(learners$names, "SL.ranger"))
summary(cv_sl4)

## Shut down the worker processes when done
parallel::stopCluster(cluster)