MDAnalysis / pmda

Parallel algorithms for MDAnalysis
https://www.mdanalysis.org/pmda/

Turn ParallelAnalysisBase into dask custom collection #135

Open yuxuanzhuang opened 4 years ago

yuxuanzhuang commented 4 years ago

Aim

Turn ParallelAnalysisBase into a custom dask collection (https://docs.dask.org/en/latest/custom-collections.html).
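For readers unfamiliar with the protocol, here is a minimal sketch of what a dask custom collection looks like. It is not the pmda implementation: `BlockAnalysis`, `_per_frame`, `_run_block` and `_combine` are hypothetical names, and the "analysis" is a toy sum of squares over frame indices. It assumes only the documented collection protocol (`__dask_graph__`, `__dask_keys__`, `__dask_postcompute__`, and friends) plus `DaskMethodsMixin`, which supplies `.compute()`.

```python
import dask
from dask.base import DaskMethodsMixin, tokenize
from dask.threaded import get as threaded_get


def _per_frame(i):
    # Hypothetical stand-in for the per-frame computation.
    return i * i


def _run_block(start, stop):
    # "Apply" step: process one contiguous block of frame indices.
    return sum(_per_frame(i) for i in range(start, stop))


def _combine(block_results):
    # "Combine" step: receives results in the order of __dask_keys__().
    return sum(block_results)


class BlockAnalysis(DaskMethodsMixin):
    """Toy collection (hypothetical): one dask task per block of frames."""

    def __init__(self, n_frames, n_blocks):
        self._n_blocks = n_blocks
        # Deterministic name so keys of different analyses do not collide.
        self._name = "block-analysis-" + tokenize(n_frames, n_blocks)
        # "Split" step: n_blocks contiguous, roughly equal index ranges.
        size, extra = divmod(n_frames, n_blocks)
        bounds = [0]
        for i in range(n_blocks):
            bounds.append(bounds[-1] + size + (1 if i < extra else 0))
        self._dsk = {
            (self._name, i): (_run_block, bounds[i], bounds[i + 1])
            for i in range(n_blocks)
        }

    def __dask_graph__(self):
        return self._dsk

    def __dask_keys__(self):
        return [(self._name, i) for i in range(self._n_blocks)]

    @staticmethod
    def __dask_optimize__(dsk, keys, **kwargs):
        return dsk  # no graph optimization in this sketch

    # Default scheduler; can be overridden with compute(scheduler=...).
    __dask_scheduler__ = staticmethod(threaded_get)

    def __dask_postcompute__(self):
        return _combine, ()

    def __dask_postpersist__(self):
        return BlockAnalysis._rebuild, (self._name, self._n_blocks)

    @staticmethod
    def _rebuild(dsk, name, n_blocks, *, rename=None):
        if rename:
            name = rename.get(name, name)
        new = object.__new__(BlockAnalysis)
        new._name, new._n_blocks, new._dsk = name, n_blocks, dsk
        return new

    def __dask_tokenize__(self):
        return self._name


a = BlockAnalysis(10, n_blocks=3)
b = BlockAnalysis(4, n_blocks=2)
print(a.compute())         # 285
print(dask.compute(a, b))  # (285, 14)
```

Because the object itself carries the task graph, several analyses can be handed to a single `dask.compute` call, which is the usage the syntax below aims for.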

Current syntax

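import dask
import MDAnalysis as mda
import pmda.density
# Assumption: TPR and XTC are the example files shipped with MDAnalysisTests.
from MDAnalysisTests.datafiles import TPR, XTC

u = mda.Universe(TPR, XTC)
ow = u.select_atoms("name OW")
D = pmda.density.DensityAnalysis(ow, delta=1.0)

# Option one: split into blocks and run in a single call
D.run(n_blocks=2, n_jobs=2)

# Option two: build the blocks first, then compute
D.prepare_jobs(n_blocks=2)
D.compute(n_jobs=2)  # or dask.compute(D)

# Furthermore, several analyses can be computed together
dask.compute(D_1, D_2, D_3, D_4, ...)  # each D_x is an individual analysis job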

Implementation

Advantage

Benchmark

TODO

Illustration

TODO

yuxuanzhuang commented 4 years ago

A few test cases: https://gist.github.com/yuxuanzhuang/73c80d5e0fe56930bc8a224973cb7903 The last missing image looks like this: [image]

orbeckst commented 4 years ago

This is pretty cool!

Is there a downside? EDIT: I mean: what are the disadvantages of this approach?

yuxuanzhuang commented 4 years ago

As far as I can tell, this approach has no limitations, at least for the (block) split-apply-combine algorithm.

The speed does not seem to suffer and may even be slightly better, although this needs further benchmarking (before: 26.77 s, after: 25.1 s).
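A single before/after pair of timings is hard to interpret, so further benchmarking could repeat each run and report the spread. The following is only a sketch using the standard library: `time_runs` and `summarize` are hypothetical helpers, and `run_before` / `run_after` are placeholders for the actual pmda calls being compared.

```python
import statistics
import time


def time_runs(func, repeats=5):
    """Return the wall-clock durations (seconds) of `repeats` calls to func()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label, durations):
    """Format the mean and sample standard deviation of the durations."""
    mean = statistics.mean(durations)
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return f"{label}: {mean:.3f} s +/- {spread:.3f} s (n={len(durations)})"


# Placeholders: replace the bodies with the old and new analysis runs.
def run_before():
    sum(i * i for i in range(200_000))


def run_after():
    sum(i * i for i in range(200_000))


print(summarize("before", time_runs(run_before)))
print(summarize("after", time_runs(run_after)))
```

If the two means differ by less than their spreads, the difference is probably noise rather than a real speed-up.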

From a developer's perspective, the code might be harder to maintain without knowledge of dask, since custom collections are something of an "advanced feature". Bits and pieces might need to be tuned or adjusted. And since the code would be deeply intertwined with dask, it would be hard to switch to other tools later.