HenrikBengtsson / doFuture

:rocket: R package: doFuture - Use Foreach to Parallelize via Future Framework
https://doFuture.futureverse.org

Filling a matrix in parallel? #31

Closed ignacio82 closed 5 years ago

ignacio82 commented 5 years ago

Is it possible to use doFuture to fill up a matrix in parallel? For example, suppose I want to do the following in parallel:

N <- 100
K <- 50
A <- matrix(nrow = N, ncol = K)

# Sequential
for(n in 1:N){
  for(k in 1:K){
    A[n,k] <- n+k
  }
}

How can I parallelize this? I think my problem is that A is not shared in memory. I tried the following and it did not work:

# Parallel
library(doFuture)
registerDoFuture()
plan(multiprocess)
A <- matrix(nrow = N, ncol = K)

foreach(n = 1:N, .export = c("A")) %dopar% {
  for(k in 1:K){
    A[n,k] <- n+k
  }
}

This achieves what I want with other packages:

library(bigstatsr)
N <- 100
K <- 50
mat3 <- FBM(N, K)
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)
tmp3 <- foreach(j = 1:K, .combine = 'c') %:%
  foreach(i = 1:N, .combine = 'c') %dopar% {
    mat3[i, j] <- i + j
    NULL
  }
parallel::stopCluster(cl)
mat3[]

Thanks for the help!

HenrikBengtsson commented 5 years ago

Sorry for the late reply.

Is it possible to use doFuture to fill up a matrix in parallel ...?

No, and it's not specific to doFuture etc.

I think my problem is that A is not shared in memory.

Correct. To update the same object in memory, each parallel "worker" needs direct access to the same memory location. Technically, this is not possible with multi-process evaluation, where each worker is a separate R process with its own memory. It is possible with multi-threaded evaluation, but that is not available in R. Updating the same object concurrently also opens up a can of worms (it is error-prone, ...) and breaks the functional nature of R. A truly functional programming approach avoids all of these issues by design. So, you do not want to do it that way. Instead, make sure to return each calculation:

y <- foreach(...) %dopar% { ... }

and then gather the results into a matrix afterward. See https://www.jottr.org/2019/01/11/parallelize-a-for-loop-by-rewriting-it-as-an-lapply-call/ - it provides an example of this.
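
Applied to the matrix from the question, the "return and gather" approach could look like the sketch below. It is only one possible way to write it: the choice of `plan(multisession, workers = 2)`, the use of `.combine = rbind`, and the decision to return one full row per iteration are assumptions made for illustration, not something prescribed in this thread.

```r
library(doFuture)   # as in the question; provides registerDoFuture() and plan()
registerDoFuture()
plan(multisession, workers = 2)   # assumption: two background R sessions

N <- 100
K <- 50

# Each iteration RETURNS one row of the result. Nothing is assigned to a
# shared object inside the loop body.
A <- foreach(n = 1:N, .combine = rbind) %dopar% {
  n + 1:K   # row n: n + 1, n + 2, ..., n + K
}

# rbind() attaches row names to the combined result; drop them so that A
# is a plain N x K matrix like the one built by the sequential loop
dimnames(A) <- NULL

plan(sequential)   # shut down the background workers
```

Returning a whole row per iteration, rather than a single cell, keeps the number of parallel tasks at N and avoids the overhead of N * K tiny tasks.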

Importantly, despite its name, foreach() is not a replacement for for(...) { } code. That is a very common mistake. Instead, treat it just as you would treat lapply(). It is very unfortunate that foreach() %do% { ... }, or foreach() %dopar% { ... } with registerDoSEQ(), emulates a for loop - it's very misleading, and that is not how it is intended to be used.
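
To make the lapply() analogy concrete, here is a small side-by-side sketch. The variable names (`rows_lapply`, `rows_foreach`, `x`) and the tiny sizes are made up for illustration, and `plan(multisession, workers = 2)` is again an assumption.

```r
library(doFuture)
registerDoFuture()
plan(multisession, workers = 2)   # assumption: two background R sessions

K <- 5

# lapply(): each call returns a value; the values are collected in a list
rows_lapply <- lapply(1:3, function(n) n + 1:K)

# foreach() %dopar%: same idea, each iteration returns a value
rows_foreach <- foreach(n = 1:3) %dopar% {
  n + 1:K
}

# Gather the returned pieces into a matrix afterward
A <- do.call(rbind, rows_foreach)

# In contrast, an assignment inside the loop body happens on the worker,
# so the variable in the calling session is left untouched
x <- 0
invisible(foreach(n = 1:3) %dopar% {
  x <- x + n   # modifies a worker-side copy only
  NULL
})

plan(sequential)   # shut down the background workers
```

With this setup, `x` in the calling session should still be 0 after the loop, which is exactly why code that relies on for-loop style side effects does not carry over to foreach().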

Now, the reason why you can use a bigstatsr::FBM() matrix is that each worker writes to a common file. This is like updating a database table from multiple locations concurrently.