MDAnalysis / pmda

Parallel algorithms for MDAnalysis

https://www.mdanalysis.org/pmda/

balance block sizes (#71) #75

Closed: orbeckst closed this pull request 5 years ago

orbeckst commented 5 years ago

Fixes #71

Changes made in this Pull Request:

add make_balanced_blocks() to util (implementing the algorithm from https://github.com/MDAnalysis/pmda/issues/71#issuecomment-428772302)
use make_balanced_blocks() in parallel (pmda/parallel.py)

PR Checklist

[x] Tests?
[x] Docs?
[x] CHANGELOG updated?
[x] Issue raised/referenced?

orbeckst commented 5 years ago

The make_balanced_blocks() function is not used yet and it does not yet know how to deal with step.

Help welcome on the PR.

codecov[bot] commented 5 years ago

Codecov Report

Merging #75 into master will increase coverage by 0.16%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #75      +/-   ##
==========================================
+ Coverage   97.92%   98.09%   +0.16%     
==========================================
  Files           8        8              
  Lines         386      419      +33     
  Branches       48       58      +10     
==========================================
+ Hits          378      411      +33     
  Misses          4        4              
  Partials        4        4

Impacted Files      Coverage Δ
pmda/parallel.py    100% <100%> (ø) ↑
pmda/util.py        100% <100%> (ø) ↑

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a718250...4a04788.

orbeckst commented 5 years ago

If I assume that make_balanced_blocks() receives the actual number of frames after start/stop/step have been applied,

    n_frames = len(u.trajectory[start:stop:step])
    idx = make_balanced_blocks(n_frames, n_blocks, start=start, step=step)

would the following then give the correct indices into the trajectory

    # base size for every block; the first n_frames % n_blocks blocks get one extra frame
    bsizes = np.ones(n_blocks, dtype=np.int64) * n_frames // n_blocks
    bsizes += (np.arange(n_blocks, dtype=np.int64) < n_frames % n_blocks)
    # convert frame counts into distances measured in trajectory indices
    bsizes *= step
    # n_blocks + 1 block boundaries, beginning at start
    frame_indices = np.cumsum(np.concatenate(([start], bsizes)))

so that the result can be used with

    for bstart, bstop in zip(idx[:-1], idx[1:]):
        for ts in u.trajectory[bstart:bstop:step]:
            ...  # work on frames in block

?

EDIT: yes, this is now the implementation in 5e9ddb96eb5632dc403f4569eccf709767046354
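
For readers who want to check the index arithmetic themselves, here is a minimal, self-contained sketch. It is not the code from the commit: the helper name balanced_block_edges is hypothetical, and a plain Python range stands in for u.trajectory on the assumption that slicing it with start:stop:step selects the same frame indices.

```python
import numpy as np


def balanced_block_edges(n_frames, n_blocks, start=0, step=1):
    """Hypothetical helper (not pmda's API): return n_blocks + 1 block boundaries."""
    # every block gets the floor share of the frames ...
    bsizes = np.full(n_blocks, n_frames // n_blocks, dtype=np.int64)
    # ... and the first (n_frames % n_blocks) blocks get one extra frame
    bsizes += np.arange(n_blocks, dtype=np.int64) < n_frames % n_blocks
    # convert frame counts into distances measured in trajectory indices
    bsizes *= step
    # cumulative sum, seeded with start, gives the block boundaries
    return np.cumsum(np.concatenate(([start], bsizes)))


# Stand-in for u.trajectory (assumption): a range of 25 frame indices.
trajectory = range(25)
start, stop, step = 2, 23, 3

n_frames = len(trajectory[start:stop:step])  # frames 2, 5, 8, 11, 14, 17, 20
edges = balanced_block_edges(n_frames, 3, start=start, step=step)
# edges -> [2, 11, 17, 23]

blocks = [list(trajectory[b0:b1:step]) for b0, b1 in zip(edges[:-1], edges[1:])]
# blocks -> [[2, 5, 8], [11, 14], [17, 20]]
```

With 7 frames and 3 blocks the sizes come out as 3, 2, 2, so no block differs from another by more than one frame, and concatenating the blocks reproduces trajectory[start:stop:step] exactly. The last boundary may lie beyond stop, which is harmless because slicing clips it.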

kain88-de commented 5 years ago

thanks @orbeckst