MDAnalysis / pmda

Parallel algorithms for MDAnalysis
https://www.mdanalysis.org/pmda/
Other
31 stars 22 forks source link

Unifying _single_frame() of AnalysisBase and ParallelAnalysisBase #131

Open yuxuanzhuang opened 4 years ago

yuxuanzhuang commented 4 years ago

Note this only works after https://github.com/MDAnalysis/mdanalysis/pull/2723 is merged to MDAnalysis

The key idea is not to break user's current codes and apply multiprocessing to AnalysisBase-based methods in MDAnalysis. i.e. a deliverable like: AnalysisMethod.run(n_jobs=n_cpus).

As far as I am concerned, the key difference between _single_frame() of AnalysisBase and ParallelAnalysisBase is whether it modified the state, or it returns a value (as per discussion in #18). The workaround in the related PR is _single_frame modifies the state (e.g. self._results), while _dask_helper retrieves and returns the state value.

As detailing in the PR #128 and the following showcase: https://gist.github.com/yuxuanzhuang/937863d018621ad1fd9446c1f4a2bf3b

It is still quite preliminary, all suggestions are welcome.

yuxuanzhuang commented 4 years ago

@orbeckst @richardjgowers @IAlibay @fiona-naughton @kain88-de

kain88-de commented 4 years ago

This requires changes on the _single_frame function. It would break existing user code. Which makes this API change more complicated it will likely break downstream mdanalysis packages. Having it in pmda is fine.

yuxuanzhuang commented 4 years ago

yeah..I realize it depends on results have to be saved inside _results. I guess the analysis methods vary too much to reach consensus. What do you think of an API like:

analysis.parallel_run(n_jobs=cpu_count,
                                  result_attr='timeseries') # or maybe a list of attr?
...

so that users only need to fill in one extra thing for their code without touching their _single_frame. And the program becomes aware of what need to be transferred. Of course, it still only works for some simple workflow. For aggregations, we sure need to write an extra line to sum results up inside _conclude.

orbeckst commented 4 years ago

This requires changes on the _single_frame function. It would break existing user code. Which makes this API change more complicated it will likely break downstream mdanalysis packages.

Generally speaking, we could break _single_frame() in MDAnalysis 2.0.0 – that update will be a major break anyway. If we have to annoy our users then get it over in one break... I like the PMDA version much better than the MDA one and I would look into switching MDA over to the PMDA model as it seems more promising for the future. But that's just my opinion and perhaps it's not worthwhile breaking downstream packages and code for it.

orbeckst commented 3 years ago

We did not break _single_frame() in MDA 2.0.0.