Open mrocklin opened 4 years ago
Might also be worth looking at `threadpoolctl`.
Edit: This is where scikit-learn devs refactored out all this threading logic.
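For anyone who hasn't used it, here is a minimal sketch of how `threadpoolctl` could be applied per task. The decorator name `with_blas_threads` is made up for illustration, and the fallback to a no-op when `threadpoolctl` isn't installed is my own assumption, not something the library does.

```python
import contextlib
import functools

try:
    from threadpoolctl import threadpool_limits
except ImportError:  # threadpoolctl is an optional third-party package
    threadpool_limits = None


def with_blas_threads(limit):
    """Hypothetical decorator: run the wrapped task with BLAS pools capped at `limit`."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if threadpool_limits is None:
                # Assumption: silently do nothing if threadpoolctl is missing.
                ctx = contextlib.nullcontext()
            else:
                ctx = threadpool_limits(limits=limit, user_api="blas")
            with ctx:
                return fn(*args, **kwargs)

        return wrapper

    return decorator


@with_blas_threads(1)
def task(x):
    # Stand-in for a task that would call into BLAS/LAPACK.
    return x * 2
```

One caveat: as far as I understand, the limit is applied to the native thread pools of the whole process rather than to the calling thread, so in a multi-threaded worker it would also affect tasks running concurrently in other threads.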
> I think that the Joblib folks did something clever
I went looking for what they did and found "CPU over-subscription by joblib.Parallel due to BLAS", which is an elaboration of https://github.com/joblib/joblib/issues/834 and which https://github.com/joblib/joblib/pull/940 claims to fix.
Oversubscription of threads is a common problem, especially when tasks call routines that are themselves multi-threaded. The most common culprit of this is BLAS/LAPACK calls.
A couple of years ago, I think, the Joblib folks did something clever: they polled the process to see if it was oversubscribed and, if it was, slowed down work. We already check CPU metrics regularly with `psutil`. The `Worker.ensure_computing` method could choose to hold off on submitting new tasks if it notices that

This has come up several times, most recently in a post here: https://coiled.io/blog/bomb-detection-with-dask-and-machine-learning/
https://github.com/dask/distributed/blob/67a9a5963b757835d185c7b202f6895069934f97/distributed/worker.py#L2465-L2499
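To make the polling idea concrete, here is a rough sketch of what such a check might look like. It is not what Joblib or `Worker.ensure_computing` actually does. The function names, the `tolerance` factor, and the heuristic itself (process CPU usage well above `100 * nthreads` percent suggests tasks are running multi-threaded internally) are all assumptions for illustration. Only `psutil.Process().cpu_percent()` is a real API here.

```python
import time
from typing import Callable


def make_process_cpu_reader() -> Callable[[], float]:
    """Return a callable giving this process's CPU usage in percent.

    Requires psutil. Values can exceed 100 when several cores are busy.
    """
    import psutil  # imported lazily so the rest of the sketch has no dependency

    proc = psutil.Process()
    proc.cpu_percent(interval=None)  # the first call only sets the baseline
    return lambda: proc.cpu_percent(interval=None)


def is_oversubscribed(cpu_percent: float, nthreads: int, tolerance: float = 1.25) -> bool:
    """Hypothetical heuristic: more CPU in use than `nthreads`
    single-threaded tasks could account for, with some slack."""
    return cpu_percent > 100.0 * nthreads * tolerance


def wait_for_capacity(
    read_cpu: Callable[[], float],
    nthreads: int,
    tolerance: float = 1.25,
    poll_interval: float = 0.1,
    max_polls: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the process no longer looks oversubscribed.

    Returns True once capacity is observed, False after `max_polls` busy readings.
    """
    for _ in range(max_polls):
        if not is_oversubscribed(read_cpu(), nthreads, tolerance):
            return True
        sleep(poll_interval)  # back off before checking again
    return False
```

A worker could call something like `wait_for_capacity(make_process_cpu_reader(), nthreads=4)` before starting another task. Passing `read_cpu` and `sleep` as arguments keeps the logic testable without real CPU load.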