updated docs with explicit scheduler setting example

orbeckst commented 6 years ago

Expected behaviour

PMDA should run with "best performance" out of the box.

At a minimum, the docs should be clear what one has to do.

The dask docs recommend distributed for GIL-bound code, so in the following I use it as the default, but we should benchmark

- the single-machine schedulers
  - threads
  - processes
- distributed
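A minimal benchmark sketch for comparing the schedulers listed above. It assumes that a pure-Python loop is a fair stand-in for PMDA's per-block work; `gil_bound_work` and `benchmark` are hypothetical names, not part of PMDA or dask. It relies only on `dask.delayed` and on `dask.compute(..., scheduler=...)` accepting the named schedulers "sync", "threads" and "processes".

```python
import time

import dask


def gil_bound_work(n):
    """Pure-Python loop that holds the GIL for its whole runtime.

    Hypothetical stand-in for the per-block work of a PMDA analysis.
    """
    total = 0
    for i in range(n):
        total += i * i
    return total


def benchmark(schedulers=("sync", "threads", "processes"), n_tasks=8, n=2_000_000):
    """Return wall-clock seconds per named dask scheduler for identical workloads."""
    timings = {}
    for scheduler in schedulers:
        # delayed() is impure by default, so the n_tasks identical calls get
        # distinct keys and all of them are executed (no deduplication)
        tasks = [dask.delayed(gil_bound_work)(n) for _ in range(n_tasks)]
        start = time.perf_counter()
        dask.compute(*tasks, scheduler=scheduler)
        timings[scheduler] = time.perf_counter() - start
    return timings
```

Run `print(benchmark())` from a script under `if __name__ == "__main__":` so that the "processes" scheduler also works on platforms where worker processes are spawned. With a `dask.distributed.Client` already running, "distributed" can be added to the tuple. For a GIL-bound loop like this one would expect "threads" to be no faster than "sync", while "processes" can use several cores.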

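To use distributed:

```python
from dask.distributed import Client

# start a local cluster; the new Client becomes dask's default scheduler
client = Client()

import pmda
# ... run PMDA analyses as usual
```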

or [configure the scheduler](http://docs.dask.org/en/latest/scheduler-overview.html?highlight=config#configuring-the-schedulers) globally (the 'distributed' scheduler still needs a running Client):

```python
import dask

dask.config.set(scheduler='distributed')
```

Actual behaviour

With PMDA now using dask's preferred way to select a scheduler (#66), we now default to Dask's default scheduler. For delayed(), this is the threads scheduler, which does not work well with our Python-based code: the GIL serializes the tasks, so I expect performance to be poor out of the box.
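To override the threads default for a single computation instead of globally, `dask.config.set` can also be used as a context manager. A minimal sketch; `square` and `run` are hypothetical stand-ins for a PMDA analysis, not PMDA API:

```python
import dask


def square(x):
    return x * x


def run(scheduler):
    # the scheduler setting applies only inside the with-block; afterwards
    # dask falls back to whatever was configured before (threads for delayed)
    with dask.config.set(scheduler=scheduler):
        tasks = [dask.delayed(square)(i) for i in range(5)]
        return dask.compute(*tasks)
```

For example, `run("processes")` (called from a script's `if __name__ == "__main__":` block) evaluates the tasks with the multiprocessing scheduler without changing the global setting.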

orbeckst commented 6 years ago

cc @kain88-de @VOD555 @dotsdl @richardjgowers comments welcome

orbeckst commented 6 years ago

I overlooked https://github.com/MDAnalysis/pmda/blob/master/pmda/parallel.py#L311 – we still default to 'multiprocessing'.
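The precedence this implies (an explicit choice wins, then dask's configuration, then a package-level fallback of 'multiprocessing') can be sketched as below. `resolve_scheduler` is a hypothetical helper for illustration, not the code at parallel.py#L311; dask accepts both 'multiprocessing' and 'processes' as names for its multiprocessing scheduler.

```python
import dask


def resolve_scheduler(scheduler=None, fallback="multiprocessing"):
    """Hypothetical helper: pick the scheduler an analysis would use.

    Precedence: explicit argument, then dask's global configuration,
    then the package-level fallback.
    """
    if scheduler is not None:
        return scheduler
    configured = dask.config.get("scheduler", None)
    return configured if configured is not None else fallback
```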

I am closing this as not super-urgent. More docs on how to set other schedulers would be good, but can wait.

orbeckst commented 6 years ago

Well, the docs need some updating... so I re-open so that I can close an issue.

updated docs with explicit scheduler setting example #78
