Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

I get an error when I use BiocParallel #254

Closed fadedmeliodas closed 1 year ago

fadedmeliodas commented 1 year ago

Error in manager$availability[[as.character(result$node)]] <- TRUE : wrong args for environment subassignment

It happens when the number of cores is larger, such as 20.

mtmorgan commented 1 year ago

This usually occurs when one of the tasks quits unexpectedly

> bplapply(1:4, function(i) if (i == 2) q())
Error in reducer$value.cache[[as.character(idx)]] <- values :
  wrong args for environment subassignment
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
  1 parallel job did not deliver a result

either because you are asking for too much memory (nworkers * memory used per worker > memory available on the computer) or because of a bug in the FUN being used. It is impossible to tell without a 'reproducible example'.

How much memory does each FUN use?

fadedmeliodas commented 1 year ago

I don't know the memory used per worker, but I see that the total memory used is smaller than the total memory available. Besides, I tried 1 worker and it works well, so I have no idea why the error comes up (the code seems to have no bug).

mtmorgan commented 1 year ago

You could approximately measure memory use with

gc(reset = TRUE)
result <- FUN(<first element>)
gc()

and look at 'max used (Mb)'. Treat this as a lower bound. If you are using a laptop, then 20 workers are likely to use up all cores and all memory for moderately sized data -- usually the number of workers should be at most the number of cores available, e.g., via parallel::detectCores().
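
The measurement above can be wrapped into a rough worker-count estimate. This is only a sketch: peak_mb(), the stand-in fun, and the 16 GB figure are hypothetical and not part of BiocParallel, and the result is still a lower bound on per-worker memory.

```r
# Hypothetical helper: peak memory (Mb) seen by gc() while evaluating FUN(x).
peak_mb <- function(FUN, x) {
    invisible(gc(reset = TRUE))   # reset the 'max used' counters
    result <- FUN(x)
    g <- gc()
    # The last column of gc() is 'max used' in Mb; sum Ncells and Vcells.
    sum(g[, ncol(g)])
}

fun <- function(n) sum(numeric(n))   # stand-in for the real FUN
per_worker <- peak_mb(fun, 1e7)

total_mb <- 16 * 1024                # assumption: 16 GB of usable memory
cores <- parallel::detectCores()
if (is.na(cores)) cores <- 1L        # detectCores() can return NA

# Cap workers by both the core count and the memory estimate.
max_workers <- max(1, min(cores, floor(total_mb / per_worker)))
max_workers
```

The estimate includes memory already held by the session, so it is best run in a fresh R session with only the required data loaded.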

fadedmeliodas commented 1 year ago

I am running it on a server, and with 10 workers it still happens.

HenrikBengtsson commented 1 year ago

Until Martin is back online, I recommend trying with nworkers = 2. That'll help narrow in on the problem.
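
One way to follow this suggestion is to run the same call serially and then with two workers and compare. A minimal sketch, assuming BiocParallel is installed and using a hypothetical stand-in for the real FUN and input:

```r
library(BiocParallel)

fun <- function(i) sqrt(i)   # hypothetical stand-in for the real FUN
X <- 1:8                     # hypothetical stand-in for the real input

# Baseline: everything runs in the current R process.
serial <- bplapply(X, fun, BPPARAM = SerialParam())

# Same work with two forked workers (forking is not available on Windows).
two <- bplapply(X, fun, BPPARAM = MulticoreParam(workers = 2))

identical(serial, two)
```

If the two-worker run succeeds and larger worker counts fail, that points toward a resource limit such as memory rather than a bug in FUN.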

mtmorgan commented 1 year ago

If you are using bplapply() directly, and if you are using MulticoreParam(), then maybe some insight can come from using mclapply() instead

> fun = function(i) { message(i); if (i == 2) q(); i }
> bplapply(1:2, fun)
1

Error in reducer$value.cache[[as.character(idx)]] <- values :
  wrong args for environment subassignment
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
  1 parallel job did not deliver a result

versus (in a new session) where messages from the second process before abort are echoed to the terminal

> mclapply(1:2, fun)
1
2
Error in serialize(data, node$con) : error writing to connection
[[1]]
[1] 1

[[2]]
NULL

Warning message:
In mclapply(1:2, function(i) { :
  scheduled core 2 did not deliver a result, all values of the job will be affected
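
To tell an ordinary R error apart from a worker that dies, FUN can be wrapped in tryCatch() before handing it to mclapply(). The following is a sketch under assumptions: fun is a hypothetical stand-in in which element 2 raises an R error and element 3 kills its own process to mimic something like an out-of-memory kill, and forking requires a Unix-like system.

```r
library(parallel)

# Hypothetical stand-in for the real FUN.
fun <- function(i) {
    if (i == 2) stop("R-level failure")
    if (i == 3) tools::pskill(Sys.getpid(), tools::SIGKILL)  # simulate a crash
    i
}

# Return R errors as condition objects instead of failing the task.
safe_fun <- function(i) tryCatch(fun(i), error = function(e) e)

# One fork per element, so a dead worker only affects its own element.
res <- suppressWarnings(
    mclapply(1:4, safe_fun, mc.cores = 2, mc.preschedule = FALSE)
)

is_error <- vapply(res, inherits, logical(1), what = "error")  # R errors
is_lost <- vapply(res, is.null, logical(1))                    # no result delivered
which(is_error)
which(is_lost)
```

Elements flagged by is_error carry a message that can be read with conditionMessage(); elements flagged by is_lost delivered no result at all, which is the situation behind the 'did not deliver a result' warning above. A FUN that legitimately returns NULL would also be flagged by is_lost.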