Katsevich-Lab / sceptre

An R package for single-cell CRISPR screen data analysis emphasizing statistical rigor, massive scalability, and ease of use.
https://katsevich-lab.github.io/sceptre/
GNU General Public License v3.0
22 stars 7 forks source link

Please consider using parallelly::availableCores() instead of parallel::detectCores() #128

Open HenrikBengtsson opened 1 month ago

HenrikBengtsson commented 1 month ago

Hello, I came to you R package from an HPC user asking how to parallelize with it.

I see that n_processors = "auto" defaults to:

https://github.com/Katsevich-Lab/sceptre/blob/875f0d7c58f8636784dc8c0ee348233ffbf3d8e9/R/aux_functions.R#L85

Unfortunately, this doesn't play well on systems with multiple users. There's a great risk it will overuse the CPU resources, resulting in grumpy sysadms, and unhappy fellow users on the same machine.

At a minimum, would you mind replacing that line with the following backward compatible solution:

 if (identical(n_processors, "auto")) n_processors <- floor(parallelly::availableCores(logical = FALSE)/2) 

The advantage of that, is that it respects limits that the sysadm and job schedulers assigns to the user/process. You can read more about the problems with detectCores() in my 2022-12-05 blog post 'Please Avoid detectCores() in your R Packages' (https://www.jottr.org/2022/12/05/avoid-detectcores/). That post also explains how availableCores() makes your code play nicer.

There are actually more issues with your current floor(parallel::detectCores(logical = FALSE)/2). Note that parallel::detectCores() will return 1 on some systems (e.g. virtual machines and Posit Cloud). That means you end up with floor(1/2), which becomes 0. So, you need to do something like max(1, ...) on that. As the above blog post mentions, there are also system where detectCores() returns NA_integer_.

ekatsevi commented 1 month ago

Thanks for bringing these issues to our attention! We will look into them. In the meantime, we would recommend using the sceptre Nextflow pipeline for much more scalable application of sceptre on HPC, which avoids the issues you bring up.

ekatsevi commented 1 month ago

Hi @HenrikBengtsson, thanks again for bringing up these subtle issues. Could you please suggest a concrete line (or lines) of code to replace the line in question, which addresses all of the issues you bring up? Thank you very much!