HenrikBengtsson / doFuture

R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

Add option to fully ignore `.export` argument to `foreach()` #14

Closed: HenrikBengtsson closed this issue 7 years ago

HenrikBengtsson commented 7 years ago

Issue

Some code and packages that use foreach(..., .export) do not export all required globals. Because such code is often tested only with forked parallel backends (e.g. doMC, or doParallel on *nix), where each worker inherits a copy of the main R session's objects, these bugs often go unnoticed by the developer and the tests, if there are any tests at all.
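To illustrate why forked backends hide such bugs, here is a minimal sketch that uses only base R's parallel package (not foreach) with a PSOCK cluster, whose workers are fresh R sessions rather than forked copies. The variable `b` is an illustrative stand-in for a global that was left out of .export; clusterExport() plays the role that listing it in .export would.

```r
library(parallel)

# A PSOCK worker is a fresh R session: it does not inherit the main session's
# objects, unlike a forked worker (mclapply(), doMC, doParallel with cores on *nix).
cl <- makeCluster(1L, type = "PSOCK")

b <- 3  # illustrative global that the worker needs

# Without exporting `b`, the worker cannot find it; capture the error message
before <- tryCatch(clusterEvalQ(cl, b + 1),
                   error = function(e) conditionMessage(e))

# Exporting `b` (the analogue of listing it in foreach's .export) fixes the call
clusterExport(cl, "b", envir = environment())
after <- clusterEvalQ(cl, b + 1)[[1]]

stopCluster(cl)
```

Here `before` holds an error message saying that object 'b' was not found, while `after` is 4. On a forked worker the first call would have succeeded, so the missing export would never surface.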

Suggestion

The current behavior of doFuture is to fully respect any .export argument, in which case the future framework does not search for globals. If .export is not specified, the future framework identifies the globals automatically. A third alternative, controlled by getOption("doFuture.globals.export", "asis"), would let a user ignore a faulty .export argument and have the future framework identify the globals automatically instead.
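The three behaviors can be sketched as a small decision function. This is a hypothetical helper, `choose_globals()`, and not doFuture's actual code. The automatic search is approximated with codetools::findGlobals() (codetools ships with R), whereas the future framework relies on the more thorough globals package.

```r
# Hypothetical sketch, NOT doFuture's implementation: choose which globals to
# send to the workers under the policy proposed in this issue.
choose_globals <- function(body_expr, iter_vars, export = NULL,
                           policy = getOption("doFuture.globals.export", "asis")) {
  # Approximate automatic search: free variables of the loop body,
  # minus the iteration variables supplied by foreach itself
  f <- function() NULL
  body(f) <- body_expr
  automatic <- setdiff(codetools::findGlobals(f, merge = FALSE)$variables,
                       iter_vars)

  if (identical(policy, "automatic")) return(automatic)  # ignore .export entirely
  if (is.null(export)) return(automatic)                 # no .export given: search
  export                                                 # "asis": trust .export
}

# Loop body of: foreach(x = 1:3, .export = "a") %dopar% { a * x + b }
expr <- quote({ a * x + b })
choose_globals(expr, iter_vars = "x", export = "a", policy = "asis")
choose_globals(expr, iter_vars = "x", export = "a", policy = "automatic")
```

The first call returns only "a", so `b` would be missing on a non-forked worker; the second ignores the faulty .export and returns c("a", "b").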

HenrikBengtsson commented 7 years ago

Added, but now we have two options related to the .export argument of foreach():

These should be merged into one option, e.g. doFuture.globals.method taking one of

HenrikBengtsson commented 7 years ago

Just to wrap up, here is an example where this new option is useful:

library("caret")
# PSOCK cluster: workers are fresh R sessions that only see what is exported to them
cl <- parallel::makeCluster(2L)
doParallel::registerDoParallel(cl)
example("confusionMatrix.train", package = "caret")
## > data(iris)
## > TrainData <- iris[,1:4]
## > TrainClasses <- iris[,5]
## > knnFit <- train(TrainData, TrainClasses,
## cnfsM.+                 method = "knn",
## cnfsM.+                 preProcess = c("center", "scale"),
## cnfsM.+                 tuneLength = 10,
## cnfsM.+                 trControl = trainControl(method = "cv"))
## Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : 
##   unable to find variable "optimismBoot"
## [...]

and similarly with:

library("caret")
cl <- parallel::makeCluster(2L)
# Same kind of PSOCK workers, now used via doFuture and the future framework
doFuture::registerDoFuture()
future::plan(future::cluster, workers = cl)
example("confusionMatrix.train", package = "caret")
## [...]
## > knnFit <- train(TrainData, TrainClasses,
## cnfsM.+                 method = "knn",
## cnfsM.+                 preProcess = c("center", "scale"),
## cnfsM.+                 tuneLength = 10,
## cnfsM.+                 trControl = trainControl(method = "cv"))
## Error in { : task 1 failed - "object 'ctrl' not found"

However, with this new option, it all works:

library("caret")
cl <- parallel::makeCluster(2L)
doFuture::registerDoFuture()
future::plan(future::cluster, workers = cl)
# Ignore foreach()'s .export and let the future framework find the globals itself
options(doFuture.foreach.export = "automatic")
example("confusionMatrix.train", package = "caret")
## [...]
## > knnFit <- train(TrainData, TrainClasses,
## cnfsM.+                 method = "knn",
## cnfsM.+                 preProcess = c("center", "scale"),
## cnfsM.+                 tuneLength = 10,
## cnfsM.+                 trControl = trainControl(method = "cv"))
## > confusionMatrix(knnFit)
## Cross-Validated (10 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##             Reference
## Prediction   setosa versicolor virginica
##   setosa       33.3        0.0       0.0
##   versicolor    0.0       32.0       2.0
##   virginica     0.0        1.3      31.3
##                             
##  Accuracy (average) : 0.9667
## [...]
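Because options() changes a session-wide setting, one may prefer to enable this behavior only around the call that needs it. A minimal base-R sketch; the wrapper name `with_automatic_globals()` is hypothetical:

```r
# Hypothetical wrapper: set doFuture.foreach.export = "automatic" only while
# `expr` is evaluated, then restore the option's previous value.
with_automatic_globals <- function(expr) {
  old <- options(doFuture.foreach.export = "automatic")
  on.exit(options(old), add = TRUE)
  expr  # lazy evaluation: `expr` is evaluated here, while the option is set
}
```

For instance, with_automatic_globals(example("confusionMatrix.train", package = "caret")) would run the example above with the option set and leave the session's options unchanged afterwards.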