MarioniLab / DropletUtils

Clone of the Bioconductor repository for the DropletUtils package.
https://bioconductor.org/packages/devel/bioc/html/DropletUtils.html

wrong args for environment subassignment #96

Closed hugo-cornu-one-biosciences closed 1 year ago

hugo-cornu-one-biosciences commented 1 year ago

Hi, I'm running emptyDrops in an AWS pipeline. Sometimes it works correctly, but with big datasets I often get:

[1] "loading data..."
[1]   36601 6794880

[1] "running emptyDrops..."
--
Error in env[[as.character(i)]] <- value :   wrong args for environment subassignment
Calls: <Anonymous> ... bploop -> bploop.iterate -> <Anonymous> -> add_inorder
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :  1 parallel job did not deliver a result
  |                                                                      |   0%
  |                                                                      |   1%
Execution halted

I am more of a Python guy than an R guy. Could you help me understand what is happening?

emptyDrops is running in a Docker container and was installed like this:

RUN conda install --file /tmp/dropletutils_pkgs_version.txt

LTLA commented 1 year ago

You probably don't have enough memory. The breakdown is as follows:

The error message arises from BiocParallel, which is Bioconductor's parallelization framework. As R is itself single-threaded, BiocParallel achieves its parallelization by creating new processes. This works fairly well but exhibits some interesting interactions with R's garbage collector (GC). The long and short of it, from what I remember at least, is that each child process thinks it has access to the entire memory allocation for the job, so the GC never bothers to reclaim memory within each process. The observed memory usage therefore keeps increasing until it hits the threshold for collection.

On a regular computer this isn't a problem, as the GC is eventually triggered one way or another. But if the operating system is enforcing a lower memory limit (e.g., via cgroups), it will start killing child processes once the total memory usage of the job runs over that limit, without giving the GC a chance to do its job. This is the cause of the cryptic error in bploop (one of BiocParallel's internals), which, understandably, doesn't know what to do when its children get sniped.

So the solution is probably just to increase your memory or decrease the number of workers in your BPPARAM=. It's hard to be sure, though, because you don't provide a lot of detail.
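For reference, a minimal sketch of what "decrease the number of workers" could look like. The wrapper name run_empty_drops and the default of 2 workers are made up for illustration; MulticoreParam(), SerialParam() and the BPPARAM= argument of emptyDrops() are the real BiocParallel/DropletUtils interfaces. Whether 2 workers fit in your container's memory limit is something you would have to check yourself.

```r
# Hypothetical wrapper (not part of DropletUtils): run emptyDrops with an
# explicit, small number of worker processes.
run_empty_drops <- function(m, workers = 2) {
  # Fewer workers means fewer forked child processes holding memory at once.
  # With workers <= 1, skip forking entirely and run in the main process.
  bp <- if (workers <= 1) {
    BiocParallel::SerialParam()
  } else {
    BiocParallel::MulticoreParam(workers = workers)
  }
  # m is the raw count matrix (genes x barcodes).
  DropletUtils::emptyDrops(m, BPPARAM = bp)
}
```

Calling run_empty_drops(m, workers = 1) is the slowest option, but it is a quick way to test whether the failure really comes from child processes being killed.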

In general, showing the traceback() and sessionInfo() would have been helpful.
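Since the script runs non-interactively inside a container, you can't call traceback() by hand after the failure. One possible way to get both pieces of information into the log, sketched here with base R only (put it at the top of your script; the exact handler is an illustration, not something DropletUtils requires):

```r
# Record R and package versions at the start of the log.
print(sessionInfo())

# When an error reaches the top level, print the call stack and exit with a
# non-zero status so the pipeline still registers the failure.
options(error = function() {
  traceback(2)  # print the current call stack, skipping the handler's frames
  quit(save = "no", status = 1, runLast = FALSE)
})
```

This only reports errors raised in the main R process. Output from a killed child process will not appear, because the operating system terminates it before R can react.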