famuvie / breedR

Statistical methods for forest genetic resources analysts
http://famuvie.github.io/breedR/
GNU General Public License v3.0
31 stars 24 forks source link

compatibility with parallel::mclapply() #92

Open famuvie opened 6 years ago

famuvie commented 6 years ago

Parallelization via forking does not work:

library(parallel)
library(breedR)
#> Loading required package: sp

## A trivial multivariate dataset
set.seed(1234)
ncol <- 9
nobs <- 1e3
testdat <- as.data.frame(replicate(ncol, rnorm(nobs)))
names(testdat) <- paste0("y", 1:ncol)

## Fits the multivariate model and return the point estimates
## for the intercepts
fitm <- function(k) {
  res <- remlf90(
    as.matrix(testdat + k) ~ 1,
    dat = testdat + k
  )
  return(as.numeric(fixef(res)$Intercept))
}

ncores <- 4
rl <- mclapply(1:ncores, fitm, mc.cores = ncores)
# forrtl: No such file or directory
# forrtl: severe (28): CLOSE error, unit 49, file "Unknown"
# Image              PC                Routine            Line        Source             
# airemlf90          00000000012CAF33  Unknown               Unknown  Unknown
# [...]
famuvie commented 6 years ago

I think the problem has to do with the issue of sharing the temporary directory when using forks (i.e.: single process). A refactoring of the temporary files would be needed, to ensure that temporary filenames are unique.

While it would be nice to have forks working as well, we can leverage the package doParallel as a workaround, as follows :


library(doParallel)  # requires foreach, iterators, parallel
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
library(breedR)
#> Loading required package: sp

## A trivial multivariate dataset
set.seed(1234)
ncol <- 9
nobs <- 1e3
testdat <- as.data.frame(replicate(ncol, rnorm(nobs)))
names(testdat) <- paste0("y", 1:ncol)

## Fits the multivariate model and return the point estimates
## for the intercepts
fitm <- function(k) {
  library(breedR)
  res <- remlf90(
    as.matrix(testdat + k) ~ 1,
    dat = testdat + k
  )
  return(as.numeric(fixef(res)$Intercept))
}

ncores <- 4
cl <- makeCluster(ncores)
registerDoParallel(cl)
rl <- foreach(x = seq.int(ncores)) %dopar% fitm(x)
stopCluster(cl)

## verify
library(tidyverse)
#> ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.3.4     ✔ dplyr   0.7.4
#> ✔ tidyr   0.7.2     ✔ stringr 1.2.0
#> ✔ readr   1.1.1     ✔ forcats 0.2.0
#> ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ purrr::accumulate() masks foreach::accumulate()
#> ✖ dplyr::filter()     masks stats::filter()
#> ✖ dplyr::lag()        masks stats::lag()
#> ✖ purrr::when()       masks foreach::when()
data.frame(
  idx = factor(rep(seq.int(ncores), each = ncol)),
  estimate = unlist(rl)
) %>% 
  ggplot(aes(idx, estimate)) +
  geom_violin()