Closed orbeckst closed 6 years ago
cc @kain88-de @VOD555 @dotsdl @richardjgowers comments welcome
See also https://github.com/MDAnalysis/WorkshopHackathon2018/issues/20
I overlooked https://github.com/MDAnalysis/pmda/blob/master/pmda/parallel.py#L311 – we still default to 'multiprocessing'.
I am closing this as not super-urgent. More docs on how to set other schedulers are good, but can wait.
Well, the docs need some updating... so I re-open so that I can close an issue.
Expected behaviour
PMDA should run with "best performance" out of the box.
At a minimum, the docs should be clear about what one has to do.
The dask docs recommend distributed for GIL-bound code, so in the following I use it as the default, but we should benchmark the single-machine schedulers.
distributed
For using distributed:
import pmda ...
Actual behaviour
With PMDA now using dask's preferred way to select a scheduler (#66), we default to dask's default scheduler. For delayed(), this is the threaded scheduler, which does not work well with our Python-based code: the GIL serializes the tasks, so I expect performance to be poor out of the box.
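
To make the scheduler choice concrete, here is a minimal sketch in plain dask (no PMDA involved). `block_sum` and `run_blocks` are hypothetical names made up for illustration. `block_sum` only stands in for a pure-Python, GIL-bound per-block analysis, and this is not how PMDA itself sets things up.

```python
import dask
from dask import delayed


def block_sum(start, stop):
    """Hypothetical stand-in for a per-block, pure-Python (GIL-bound) analysis."""
    total = 0
    for i in range(start, stop):
        total += i * i
    return total


def run_blocks(scheduler, n_blocks=4, block_size=1000):
    """Build one delayed task per block and compute them with `scheduler`."""
    tasks = [
        delayed(block_sum)(k * block_size, (k + 1) * block_size)
        for k in range(n_blocks)
    ]
    # dask.config.set picks the scheduler for every compute() call inside
    # the with-block; without it, delayed objects use the threaded scheduler.
    with dask.config.set(scheduler=scheduler):
        results = dask.compute(*tasks)
    return sum(results)
```

All schedulers return the same result; only the execution model differs. `run_blocks("threads")` reproduces the out-of-the-box behaviour described above, and `run_blocks("synchronous")` is handy for debugging. `run_blocks("processes")` avoids the GIL, but it should be called from under an `if __name__ == "__main__":` guard because worker processes may re-import the calling script.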