MDAnalysis / pmda

Parallel algorithms for MDAnalysis
https://www.mdanalysis.org/pmda/

updated docs with explicit scheduler setting example #78

Closed: orbeckst closed this issue 6 years ago

orbeckst commented 6 years ago

Expected behaviour

PMDA should run with "best performance" out of the box.

At a minimum, the docs should be clear what one has to do.

The Dask docs recommend the distributed scheduler for GIL-bound code, so in the following I use it as the default. However, we should also benchmark the single-machine schedulers.

```python
import pmda
...
```

or [configure the scheduler](http://docs.dask.org/en/latest/scheduler-overview.html?highlight=config#configuring-the-schedulers):

```python
import dask
dask.config.set(scheduler='distributed')  # requires an active dask.distributed Client
```

Actual behaviour

With PMDA now using Dask's preferred way to select a scheduler (#66), we now default to Dask's default scheduler. For `delayed()`, this is the threaded scheduler, which does not work well with our Python-based code: the GIL serializes the tasks, so I expect poor performance out of the box.
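The expectation above (and the benchmark suggested in the opening comment) can be checked with a small timing sketch. It assumes only that `dask` is installed and uses Dask's real single-machine scheduler names (`"synchronous"`, `"threads"`, `"processes"`) passed to `dask.compute(..., scheduler=...)`. The helpers `gil_bound_work` and `time_scheduler` are hypothetical names for illustration, not part of PMDA, and no timings are claimed here because they depend on the machine. The distributed scheduler is left out because it needs a running `Client`.

```python
import time

import dask
from dask import delayed


def gil_bound_work(n):
    """Hypothetical pure-Python task: holds the GIL for its whole runtime."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_scheduler(scheduler, n_tasks=4, n=1_000_000):
    """Run n_tasks independent delayed tasks with the named Dask scheduler.

    Returns (results, elapsed_seconds).
    """
    tasks = [delayed(gil_bound_work)(n) for _ in range(n_tasks)]
    start = time.perf_counter()
    results = dask.compute(*tasks, scheduler=scheduler)
    elapsed = time.perf_counter() - start
    return list(results), elapsed


if __name__ == "__main__":
    # The __main__ guard matters for "processes" on platforms that spawn workers.
    for name in ("synchronous", "threads", "processes"):
        _, elapsed = time_scheduler(name)
        print(f"{name:12s} {elapsed:6.2f} s")
```

If the GIL argument holds, `"threads"` should show little or no speedup over `"synchronous"` for this kind of task, while `"processes"` can use several cores at the cost of pickling inputs and results between workers.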

orbeckst commented 6 years ago

cc @kain88-de @VOD555 @dotsdl @richardjgowers comments welcome

See also https://github.com/MDAnalysis/WorkshopHackathon2018/issues/20

orbeckst commented 6 years ago

I overlooked https://github.com/MDAnalysis/pmda/blob/master/pmda/parallel.py#L311: we still default to `'multiprocessing'`.

I am closing this as not super-urgent. More documentation on how to set other schedulers would be good, but it can wait.
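Keeping a process-based default while still honouring a scheduler the user has configured globally could look like the sketch below. `resolve_scheduler` is a hypothetical helper, not PMDA's actual code in `parallel.py`; it relies only on the real calls `dask.config.get` and `dask.config.set`, and on Dask accepting `'multiprocessing'` as an alias for its `'processes'` scheduler.

```python
import dask


def resolve_scheduler(scheduler=None, fallback="multiprocessing"):
    """Hypothetical helper: choose the scheduler for a parallel analysis.

    Precedence:
      1. an explicit ``scheduler`` argument,
      2. a scheduler set globally via dask.config.set(scheduler=...),
      3. a process-based fallback, since the threaded default is
         serialized by the GIL for pure-Python work.
    """
    if scheduler is not None:
        return scheduler
    configured = dask.config.get("scheduler", None)
    if configured is not None:
        return configured
    return fallback
```

A caller would then pass the result on, for example `dask.compute(*tasks, scheduler=resolve_scheduler(user_choice))`, so that documenting `dask.config.set` is enough for users who want a different scheduler.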

orbeckst commented 6 years ago

Well, the docs do need some updating... so I am reopening this so that I can close an issue.