Bioconductor / ShortRead

8 stars 6 forks source link

FastqSampler/FastqStreamer: Seeking clarification for OpenMP threads issues #7

Closed HenrikBengtsson closed 2 years ago

HenrikBengtsson commented 2 years ago

In ?FastqStreamer, there's:

https://github.com/Bioconductor/ShortRead/blob/cca7724c772e97d22cfa2713ceaa03c3c4a27108/man/Sampler-class.Rd#L166-L179

I'm trying to figure out when the error happens. What does "running on multiple threads" in "... create problems when a process is already running on multiple threads ..." exactly mean here? Is the term "thread" here used in the classical way, where a process can have multiple threads? Could it be that this happens in forked parallel worker, e.g. parallel::mclapply() and BiocParallel::MulticoreParam()? That is, is it that this OpenMP implementation, which in itself is multi-threaded, is not fork safe? But it works just fine if one parallelize in separate processes, e.g. a PSOCK cluster.

Was a reproducible example identified? If so, that could be a great example of illustrating issues with OpenMP and R parallelization. I'm looking for such examples.

mtmorgan commented 2 years ago

There are posts on the Bioc-Devel mailing list around the time the documentation (of internal functions) was added

https://stat.ethz.ch/pipermail/bioc-devel/2013-May/004355.html https://stat.ethz.ch/pipermail/bioc-devel/2014-November/006666.html

the code to set thread number existed before that and was used internally when parallelizing (the author being aware that over-subscription would occur otherwise). From reading the post it sounds like these were separate processes (PSOCK clusters) with an overall limit from libgomp (at the time...).

HenrikBengtsson commented 2 years ago

Thanks for clarifying and for the pointers. I've created PR #8 as a suggestion how to improve on the current phrasing. Particularly, I think the "when a process is already running on multiple threads" part is not correct, since we're talking multi-processing here, not multi-threading. (The "multiple threads" part was what confused me, because that's not supported by R)