bnicenboim / eeguana

A package for manipulating EEG data in R.
https://bnicenboim.github.io/eeguana/

dplyr-like functions inside doParallel calls #136

Closed: themeo closed this issue 4 years ago

themeo commented 4 years ago

There is a strange issue when eeguana is combined with the doParallel package.

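library(eeguana)
library(data.table)
library(doParallel)

# Load the example eeguana object and collect every sample index in its signal table
dat = readRDS("doParallel.RDS")
all_dp = dat$.signal[, unique(.sample)]

# Register a one-worker cluster; outfile = "" keeps worker output visible in the console
cl <- makeCluster(1, outfile="")
registerDoParallel(cl)

# One task per sample index; each task filters the data to that sample (fails under %dopar%)
ret = foreach(dp = all_dp, .packages = c("eeguana", "dplyr")) %dopar% {
  dat %>% filter(.sample == dp)
}

# Release the worker process
stopCluster(cl)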

This results in the following error message:

Error in { : task 1 failed - "Object 'dp' not found. Perhaps you intended .id, .sample, MiPf, LLPf, RLPf or 10 more"

Even a local variable defined inside the %dopar% loop is invisible to filter().

I should mention that this is probably an edge case. I checked that it occurs only with doParallel (%dopar%), not with plain foreach (%do%), and I can work around it by converting the data to a data.table before calling foreach. However, the workaround uses much more memory: it produces a long table with all the heavy columns from the .segments table attached, and that table is copied to every parallel process in the cluster. So perhaps a general solution to this issue is possible.

Dataset used in the example: https://web.tresorit.com/l/0fRc2#VnFAADBSKEbK0LwDWO2roA

bnicenboim commented 4 years ago

It has to do with some of the dark magic I use for NSE (non-standard evaluation). I'm on it; it's just a couple of critical lines of code.
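The thread does not say what exactly went wrong inside eeguana, so the following is only a generic base-R illustration of the mechanism involved, not eeguana's code. An NSE filter captures the unevaluated condition and has to decide where to look up names that are not columns of the data. The helpers nse_filter(), nse_filter_global(), pick() and pick_global() and the toy signal table are hypothetical and exist only for this sketch. The two filters differ only in that fallback: the first uses the caller's environment, so a local variable such as dp is found; the second uses the global environment, so it fails with an "object not found" error.

```r
# Hypothetical helpers for illustration only; NOT eeguana's implementation.

# Evaluate `condition` against the data's columns first, then the caller's environment.
nse_filter <- function(.data, condition) {
  cond <- substitute(condition)
  keep <- eval(cond, envir = .data, enclos = parent.frame())
  .data[keep & !is.na(keep), , drop = FALSE]
}

# Same, but non-column names fall back to the global environment instead.
nse_filter_global <- function(.data, condition) {
  cond <- substitute(condition)
  keep <- eval(cond, envir = .data, enclos = globalenv())
  .data[keep & !is.na(keep), , drop = FALSE]
}

# Toy signal table: a sample index and one channel.
signal <- data.frame(.sample = rep(1:3, each = 2),
                     MiPf = c(1.5, 2.0, 0.3, 0.7, -1.0, 4.0))

# `dp` is a local variable here, as it is inside a foreach body.
pick <- function(dp) nse_filter(signal, .sample == dp)
pick_global <- function(dp) nse_filter_global(signal, .sample == dp)

pick(2)              # the two rows with .sample == 2
try(pick_global(2))  # error: object 'dp' not found (unless a global `dp` exists)
```

The sketch does not involve foreach at all; it only shows that where an NSE function looks up non-column names decides whether a local like dp is visible to it.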

bnicenboim commented 4 years ago

Could you try devtools::install_github("bnicenboim/eeguana", ref = "experimental")?

I think I fixed this, and as a side effect also made filter() faster.

themeo commented 4 years ago

Works like a charm, many thanks!