Open MichaelChirico opened 4 years ago
Another friendly bump. All the arguments have already been given.
I don't think this is a good idea. As documented, detectCores() detects the number of physical CPUs/cores or logical CPUs. It is clear that this number is not suitable to be hardcoded as the default number of workers for parallel processing in the general case, and the documentation of ?detectCores says so:
" This is not suitable for use directly for the ‘mc.cores’ argument of ‘mclapply’ nor specifying the number of cores in ‘makeCluster’. First because it may return ‘NA’, second because it does not give the number of allowed cores, and third because on Sparc Solaris and some Windows boxes it is not reasonable to try to use all the logical CPUs at once. "
The fact that package code still does this should not be a reason to change detectCores() to do something else, i.e. to start lying.
The right number of worker threads always depends on the actual tasks, and also on the other tasks in the system (other instances of R, which may or may not originate from the same R process; threads created by OpenMP; and other applications). The optimum may change during the computation. Also, different criteria may apply (throughput or latency in the system). On some of my check servers, latency is not important but throughput is, and given the tasks it is actually best to run many more workers than the number of logical CPUs.
So I think a better way to find the optimum is to decide adaptively at runtime on increasing/decreasing the number of workers based on the current load of the system. Parallel make, for instance, has this feature; R's parallel package does not.
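To make that concrete, here is a minimal sketch (my own illustration, not anything that exists in the parallel package) of a one-shot, load-based choice of worker count; it assumes Linux, since it reads the 1-minute load average from /proc/loadavg:

```r
## Hypothetical sketch (not part of 'parallel'): choose a worker count from the
## current system load. Assumes Linux, where /proc/loadavg is available.
workersByLoad <- function(max_workers = parallel::detectCores(),
                          target_load = max_workers) {
  if (is.na(max_workers)) max_workers <- 1L
  load1 <- tryCatch(
    as.numeric(strsplit(readLines("/proc/loadavg", n = 1L), " ")[[1]][1]),
    error = function(e) NA_real_
  )
  if (is.na(load1)) return(1L)            # load unknown: stay conservative
  headroom <- floor(target_load - load1)  # capacity left before the target load
  max(1L, min(max_workers, headroom))
}
```

A one-shot decision like this is still crude; as noted above, the optimum can change during the computation, so a real mechanism would have to re-evaluate the load while workers run, which is what parallel make's load limit does and R's parallel currently does not.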
For these reasons I don't think it is helpful to have environment variables for hardcoded limits on the number of threads at all. It might be useful to check whether other language runtimes have any means for this (I am not aware that they have, and I doubt it for the reasons above). Indeed, OpenMP does, and it is not uncommon to overload a machine when multiple applications running concurrently each use OpenMP with too many threads.
On large servers used by multiple users, these issues have to be solved at the system level anyway, as R is not the only thing that could run there.
The problem, however, exists even on single-user, single-use machines, which can be overloaded when hard-coded numbers of workers add up (such as R parallel + OpenMP). The key is that all packages that run parallel tasks should export controls over the number of workers (or load goals, if they supported that) all the way to the top, to the user. In addition, the default has to be a small fixed number. It is then up to the user to tune this number according to their needs.
This is what the parallel package does, and others should do as well.
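As a purely hypothetical illustration of that convention (the package, function, and option names below are made up), exporting the control to the user while keeping a small fixed default might look like:

```r
## Hypothetical package function: the worker count is an argument with a small
## fixed default, optionally overridden by a user-settable option.
heavy_computation <- function(x, workers = getOption("mypkg.workers", 2L)) {
  cl <- parallel::makeCluster(workers)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  parallel::parLapply(cl, x, function(xi) xi^2)  # placeholder task
}

## The user, not the package, decides to scale up:
## options(mypkg.workers = 8)
## heavy_computation(1:100)
```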
Ok, thanks for the comments, Tomas. I'm pretty sure we're on the same page of what the problems are.
So, I take it that 'parallel::detectCores()' is really meant for low-level querying of the local system and nothing else. Would R Core be open to introducing a new function parallel::availableCores() that defaults to 'parallel::detectCores()' but also respects specific R options / environment variables, so that the end-user (and sysadmins) can limit the number of cores used?
I have 'future::availableCores()' that does this; it respects several known R options/env vars and HPC scheduler env vars (e.g. 'SLURM_CPUS_PER_TASK'). If none of these are set explicitly and 'R_FUTURE_AVAILABLECORES_FALLBACK' is set, that value is used as the default; the latter allows sysadmins to set R_FUTURE_AVAILABLECORES_FALLBACK=1 globally, forcing a single worker unless the HPC scheduler has allotted more cores or the user goes the extra mile to override it.
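For readers unfamiliar with it, a much simplified sketch of that lookup order follows; this is not the actual implementation of future::availableCores(), which consults more settings and validates them, but it shows the idea:

```r
## Simplified sketch of an availableCores()-style lookup (not the real one).
availableCoresSketch <- function() {
  ## 1. Explicit user setting in the current R session
  mc <- getOption("mc.cores", NA_integer_)
  if (!is.na(mc)) return(as.integer(mc))
  ## 2. HPC scheduler allocation (one example of several such variables)
  slurm <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA_character_)
  if (!is.na(slurm) && nzchar(slurm)) return(as.integer(slurm))
  ## 3. Site-wide fallback a sysadmin can set, e.g. to 1
  fb <- Sys.getenv("R_FUTURE_AVAILABLECORES_FALLBACK", unset = NA_character_)
  if (!is.na(fb) && nzchar(fb)) return(as.integer(fb))
  ## 4. Last resort: what the hardware reports
  n <- parallel::detectCores()
  if (is.na(n)) 1L else n
}
```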
Apart from changing the semantics of detectCores(), using such a function for the default number of worker threads would unfortunately have all the other problems I mentioned.
At least it would give end-users and sysadmins the option to control this misbehavior.
How can we otherwise prevent this problem from growing? The way I see it right now, we (=R) give all the kids fast race cars and let them out to play on the Monza track with a "Have fun!". I argue that we (=R) are still responsible for the damage done, and we have an option to limit that damage. Without such options, what is the second-best thing we can do? Improve documentation, sure, but that still won't fix the actual problem. The only thing I can see left to do is to have CRAN check for this in some way. An obvious test is for CRAN to have 'detectCores()' return NA on some machines - that would catch a lot of cases, but of course it would be easy to work around. Another would be to check for mc.cores > 2 etc. in several places in the 'parallel' package. That would catch a few more of these.
Do you have proposals?
I think I have answered that in my original comment: it might be useful to look at how other language runtimes do it, and specifically into load-based limits (such as in parallel make).
Some CRAN checks for misuse of detectCores() might be useful. One should not worry specifically that these tests could be circumvented; almost all package checks can be, in principle, in a system as open as R.
Created attachment 2479: Patch for src/library/parallel/R/detectCores.R
# Background
In multi-tenant compute environments, such as academic HPC clusters, it is unfortunately rather common to see software tools that default to running in parallel using all available cores on the machine. This is also true for a large number (*) of R packages that default to using parallel::detectCores() cores, despite its help page recommending against this. This behavior leads to overloading the CPUs on the machine - sometimes rendering it inaccessible.

(*) A grep of CRAN package source code reveals that 4.3% (650 out of 15206) of the packages rely on parallel::detectCores() for parallel processing. Manual inspection of a large random subset shows that almost all of them use it to set the number of workers or threads for parallel processing.

Comment: On Linux, "cgroups" is one framework for limiting the number of cores a process has access to. Unfortunately, not all Linux systems are configured to support cgroups.
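As an aside on the cgroups remark above, a process can in principle query its own cgroup CPU quota; a rough sketch, assuming a cgroups v1 CPU controller mounted at /sys/fs/cgroup/cpu (which not all systems have):

```r
## Rough sketch: effective CPU limit under a cgroups v1 CPU controller.
## Returns NA when the files are absent or no quota is imposed.
cgroupCpuLimit <- function(base = "/sys/fs/cgroup/cpu") {
  readNum <- function(f) tryCatch(
    as.numeric(readLines(file.path(base, f), n = 1L)),
    error = function(e) NA_real_, warning = function(w) NA_real_
  )
  quota  <- readNum("cpu.cfs_quota_us")
  period <- readNum("cpu.cfs_period_us")
  if (is.na(quota) || is.na(period) || quota < 0 || period <= 0) return(NA_real_)
  quota / period  # e.g. 200000 / 100000 = 2 CPUs' worth of time
}
```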
# Suggestion
Allow users and sysadmins to override the value of parallel::detectCores() by setting the environment variable R_DEFAULT_CORES. For example:

$ R_DEFAULT_CORES="" Rscript -e "parallel::detectCores()"
[1] 8

$ R_DEFAULT_CORES=1 Rscript -e "parallel::detectCores()"
[1] 1

$ R_DEFAULT_CORES=2 Rscript -e "parallel::detectCores()"
[1] 2

$ R_DEFAULT_CORES=2,4 Rscript -e "parallel::detectCores()"
[1] 4

$ R_DEFAULT_CORES=2,4 Rscript -e "parallel::detectCores(logical=FALSE)"
[1] 2
# Patch
I've attached a small patch that achieves the above.
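The patch itself is in the attachment and is not reproduced here; purely to summarize the behavior implied by the examples above, the override amounts to something like the following (my own sketch, with a made-up helper name, not the attached code):

```r
## Sketch of the override implied by the examples above (not the attached patch):
##   R_DEFAULT_CORES unset or ""            -> no override, detect as usual
##   R_DEFAULT_CORES=<n>                    -> use <n> for both counts
##   R_DEFAULT_CORES=<physical>,<logical>   -> pick one based on 'logical'
overriddenCores <- function(logical = TRUE) {
  val <- Sys.getenv("R_DEFAULT_CORES", unset = "")
  if (!nzchar(val)) return(NA_integer_)  # caller falls back to real detection
  parts <- as.integer(strsplit(val, ",", fixed = TRUE)[[1]])
  if (length(parts) == 1L) parts else if (logical) parts[2L] else parts[1L]
}
```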