HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org
84 stars 6 forks source link

doFuture conservatively choose not to treat variable as global when it is so conditionally #17

Closed HenrikBengtsson closed 6 years ago

HenrikBengtsson commented 6 years ago

Issue

library("doFuture")
registerDoFuture()
plan(multisession)
example("avNNet", package = "caret", run.dontrun = TRUE)
Loading required package: lattice
Loading required package: ggplot2

Attaching package: ‘caret’

The following object is masked from ‘package:future’:

    cluster

avNNet> data(BloodBrain)

avNNet> modelFit <- avNNet(bbbDescr, logBBB, size = 5, linout = TRUE, trace = FALSE)
Error in { : task 1 failed - "object 'ind' not found"

with

> traceback()
10: stop(simpleError(msg, call = expr))
9: e$fun(obj, substitute(ex), parent.frame(), e$data)
8: foreach(i = 1:repeats, .verbose = FALSE, .packages = "caret", 
       .errorhandling = "stop") %op% {
       if (any(names(theDots) == "trace")) {
           if (theDots$trace) 
               cat("\nFitting Repeat", i, "\n\n")
       }
       else cat("Fitting Repeat", i, "\n\n")
       set.seed(as.integer(seeds[i]))
       if (bag) 
           ind <- sample(1:nrow(x), replace = TRUE)
       thisMod <- if (is.null(classLev)) 
           nnet::nnet(x[ind, , drop = FALSE], y[ind], ...)
       else nnet::nnet(x[ind, , drop = FALSE], y[ind, ], ...)
       thisMod$lev <- classLev
       thisMod
   }
7: avNNet.default(bbbDescr, logBBB, size = 5, linout = TRUE, trace = FALSE)
6: avNNet(bbbDescr, logBBB, size = 5, linout = TRUE, trace = FALSE) at Rex55605f9776f9#11
5: eval(ei, envir)
4: eval(ei, envir)
3: withVisible(eval(ei, envir))
2: source(tf, local, echo = echo, prompt.echo = paste0(prompt.prefix, 
       getOption("prompt")), continue.echo = paste0(prompt.prefix, 
       getOption("continue")), verbose = verbose, max.deparse.length = Inf, 
       encoding = "UTF-8", skip.echo = skips, keep.source = TRUE)
1: example("avNNet", package = "caret", run.dontrun = TRUE)

Troubleshooting

The reason for doFuture missing ind as a global variable is because the foreach() call in caret is basically:

  ind <- seq(along = y)
  [...]
  foreach(i = 1:repeats, ...) %op% {
    [...]
    if (bag) 
       ind <- sample(1:nrow(x), replace = TRUE)
    [...]
  }

Note how ind is a global variable when bag == FALSE whereas a local variable when bag == TRUE. A type of ambiguity that is allowed in R.

Given that the foreach expression may be evaluated anywhere (=exported), I'd argue that the proper way to write this code would be to use:

  ind0 <- seq(along = y)
  [...]
  foreach(i = 1:repeats, ...) %op% {
    [...]
    if (bag) 
       ind <- sample(1:nrow(x), replace = TRUE)
    else
       ind <- ind0
    [...]
  }

However, that won't change the fact that above code may still exist and is expected to work also with doFuture.

Action

This needs to be solved by the globals package; I've created https://github.com/HenrikBengtsson/globals/issues/31 for this.

Now, why didn't the package tests of doFuture pick this up, especially since it runs through the examples of caret? Turns out that example("avNNet", package = "caret") is wrapped in \dontrun{}. I'll update the doFuture tests to also use run.dontrun = TRUE so this error will be detected.

/ht @hanase

HenrikBengtsson commented 6 years ago

UPDATE: I've now turned on run.dontrun = TRUE for all example():s and example("avNNet", package = "caret", run.dontrun = TRUE) is the only one that gives an error.

HenrikBengtsson commented 6 years ago

A minimal reproducible example of this issue that does not require the caret package:

reset <- FALSE
x <- 1
foreach(i = 1L) %dopar% {
  if (reset) x <- 0
  x + 1
}
## Error in { : task 1 failed - "object 'x' not found"

As a reference, same happens with:

reset <- FALSE
x <- 1
future_lapply(1L, function(i) {
  if (reset) x <- 0
  x + 1
})
HenrikBengtsson commented 6 years ago

This is now fixed in the develop branch of doFuture. Until these updates reach CRAN, it can be installed using:

remotes::install_github("HenrikBengtsson/doFuture@develop")

Thanks @hanase for bring this one to my attention.