emmanuelparadis / ape

Analysis of Phylogenetics and Evolution
https://emmanuelparadis.github.io/
GNU General Public License v2.0
53 stars 11 forks source link

Parallelization in dist.topo #72

Closed V-Z closed 10 months ago

V-Z commented 1 year ago

I asked about possibility to add parallelization to dist topo using parallel and doParallel. From the discussion I copy here some comments from Klaus Schliep:

Defaulting to "cores = detectCores()" is not a good idea. The CRAN Repository Policy states: "If running a package uses multiple threads/cores it must never use more than two simultaneously: the check farm is a shared resource and will typically be running many checks simultaneously." In short such a default can cause serious problems on a cluster and I got properly ripleyed for this years ago, but that's another story. So the default should be something like min(2L, detectCores()).

and

From my testing it works well. I'm not sure if tryCatch is really needed there.

I am not sure if it is necessary, but you don't want to have it in the inner loop ;)

The proposal for discussion is in PR https://github.com/emmanuelparadis/ape/pull/71

emmanuelparadis commented 1 year ago

Hi, I modified dist.topo following your suggestions. The code is updated on GH. The new option is now named mc.cores (like in boot.phylo) with 1 as default (see Klaus's comment on r-sig-phylo). I also improved the code for the score distance which is twice faster with a single core/CPU. Interestingly, the score distance is more sensitive to the number of cores than the default distance. For the latter, the improvement is visible with many trees (200+) with 50+ tips. Best, E.