HenrikBengtsson / Wishlist-for-R

Features and tweaks to R that I and others would love to see - feel free to add yours!
https://github.com/HenrikBengtsson/Wishlist-for-R/issues
GNU Lesser General Public License v3.0
133 stars 4 forks source link

Settings to disable forked processing in R, e.g. parallel::mclapply() #94

Open HenrikBengtsson opened 5 years ago

HenrikBengtsson commented 5 years ago

Issue

Using forks for parallel processing in R is not always safe. The parallel::mclapply() function uses forked processes to parallelize. One example where it has been confirmed that forked processing causes problems is when running R via RStudio. It is recommended to use PSOCK clusters (parallel::makeCluster()) rather than forked processes when running R from RStudio (https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011).

AFAIK, it is not straightforward to disable forked processing in R.

One could set environment variable MC_CORES=1 which will set R option mc.cores=1 when the parallel package is loaded. Since mc.cores = getOption("mc.cores", 2L) is the default for parallel::mclapply(), this will cause mclapply() to fall back to lapply() avoiding forked processing. However, this does not work when the code specifies argument mc.cores, e.g. mclapply(..., mc.cores = detectCores()).

Suggestion

Introduce environment variable R_ENABLE_FORKS and corresponding R option enable.forks that both take logical scalars. By setting R_ENABLE_FORKS=false or equivalently enable.forks=FALSE, parallel::mclapply() will fall back to lapply().

For parallel::mcparallel(), we could produce an error if forks are disabled.

HenrikBengtsson commented 5 years ago

I posted this to R-devel thread 'SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()' on 2019-04-11.

HenrikBengtsson commented 4 years ago

Patch

Here is a working first draft:

Index: src/library/parallel/R/unix/forkCluster.R
===================================================================
--- src/library/parallel/R/unix/forkCluster.R   (revision 77648)
+++ src/library/parallel/R/unix/forkCluster.R   (working copy)
@@ -30,6 +30,7 @@

 newForkNode <- function(..., options = defaultClusterOptions, rank)
 {
+    allowFork(assert = TRUE)
     options <- addClusterOptions(options, list(...))
     outfile <- getClusterOption("outfile", options)
     port <- getClusterOption("port", options)
Index: src/library/parallel/R/unix/mclapply.R
===================================================================
--- src/library/parallel/R/unix/mclapply.R  (revision 77648)
+++ src/library/parallel/R/unix/mclapply.R  (working copy)
@@ -28,7 +28,7 @@
         stop("'mc.cores' must be >= 1")
     .check_ncores(cores)

-    if (isChild() && !isTRUE(mc.allow.recursive))
+    if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
         return(lapply(X = X, FUN = FUN, ...))

     ## Follow lapply
Index: src/library/parallel/R/unix/mcparallel.R
===================================================================
--- src/library/parallel/R/unix/mcparallel.R    (revision 77648)
+++ src/library/parallel/R/unix/mcparallel.R    (working copy)
@@ -20,6 +20,7 @@

 mcparallel <- function(expr, name, mc.set.seed = TRUE, silent = FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
 {
+    allowFork(assert = TRUE)
     f <- mcfork(detached)
     env <- parent.frame()
     if (isTRUE(mc.set.seed)) mc.advance.stream()
Index: src/library/parallel/R/unix/pvec.R
===================================================================
--- src/library/parallel/R/unix/pvec.R  (revision 77648)
+++ src/library/parallel/R/unix/pvec.R  (working copy)
@@ -25,7 +25,7 @@

     cores <- as.integer(mc.cores)
     if(cores < 1L) stop("'mc.cores' must be >= 1")
-    if(cores == 1L) return(FUN(v, ...))
+    if(cores == 1L || !allowFork()) return(FUN(v, ...))
     .check_ncores(cores)

     if(mc.set.seed) mc.reset.stream()

with a new file src/library/parallel/R/unix/allowFork.R:

allowFork <- function(assert = FALSE) {
    value <- Sys.getenv("R_FORK_ALLOWED")
    if (nzchar(value)) {
        value <- switch(value,
           "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
           "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
            stop(gettextf("invalid environment variable value: %s==%s",
           "R_FORK_ALLOWED", value)))
    value <- as.logical(value)
    } else {
        value <- TRUE
    }
    value <- getOption("fork.allowed", value)
    if (is.na(value)) {
        stop(gettextf("invalid option value: %s==%s", "fork.allowed", value))
    }
    if (assert && !value) {
      stop(gettextf("Forked processing is not allowed per option %s or environment variable %s", sQuote("fork.allowed"), sQuote("R_FORK_ALLOWED")))
    }
    value
}

Examples

> options(fork.allowed = TRUE)  ## default
> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
[1] 14099 14100
> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
[1] 14101 14102
> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
[1] 14103.1 14104.2
> f <- parallel::mcparallel(Sys.getpid())
> print(parallel::mccollect(f))
$`14105`
[1] 14105
> cl <- parallel::makeForkCluster(1L)
> print(cl)
socket cluster with 1 nodes on host ‘localhost’

and

> options(fork.allowed = FALSE)
> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
[1] 14058 14058
> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
[1] 14058 14058
> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
[1] 14058.1 14058.2
> f <- parallel::mcparallel(Sys.getpid())
Error in allowFork(assert = TRUE) : 
  Forked processing is not allowed per option ‘fork.allowed’ or environment variable ‘R_FORK_ALLOWED’
> cl <- parallel::makeForkCluster(1L)
Error in allowFork(assert = TRUE) : 
  Forked processing is not allowed per option ‘fork.allowed’ or environment variable ‘R_FORK_ALLOWED’
> 
HenrikBengtsson commented 4 years ago

UPDATE 2020-01-10: I revived the R-devel thread on this again, which already have received some responses.

lionel- commented 1 month ago

We are very interested in this for Positron because the ark kernel does not support forking (https://github.com/posit-dev/positron/issues/3817).

Having both an environment variable and a global option would be ideal, because this way it would be possible to allow or disallow R subprocesses to fork by using the envvar or global option respectively.