MDAnalysis / pmda

Parallel algorithms for MDAnalysis
https://www.mdanalysis.org/pmda/
Other
31 stars 22 forks source link

Fail to assign trajectory to the last block #71

Closed VOD555 closed 6 years ago

VOD555 commented 6 years ago

Expected behaviour

Each block should be assigned trajectories.

Actual behaviour

Fail to assign trajectory to the last block when the number of total frames meets some condition.

Code to reproduce the behaviour

from MDAnalysisTests.datafiles import GRO_MEMPROT, XTC_MEMPROT
import MDAnalysis as mda
from pmda import rms

u=mda.Universe(GRO_MEMPROT, XTC_MEMPROT)
protein=u.select_atoms('protein') 
rmsd = rms.RMSD(protein, protein)
rmsd.run(n_jobs = n)

Whre XTC_MEMPROT is a trajectory which has 5 frames. It works when n= 1,2,3. If n=4, it will give an error:

ValueError: all the input array dimensions except for the concatenation axis must match exactly

Which is caused by the empty list returned from the last block.

To explain why the result from the last block is empty, we should first look at parallel.py. The lengh of the trajectory assigned to each block is determined by bsize = int(np.ceil(n_frames / float(n_blocks))).

If n_frames=5 and n_blocks=4, bsize=2, and the four blocks will be assiged 2, 2 , 1, 0 frames. No frame is assigned to the last one.

This issue happens when (a-1)n_blocks < n_frames <= a(n_blocks-1) for all possitive integer a < n_blocks.

orbeckst commented 6 years ago

We should have a separate function that figures out how the trajectory is split. Then we can call this function. We can also test this function separately.

Maybe there's a simple algebraic way to do it. Otherwise I would do for

m[i] = M // N     # initial frames for block i
r = M % N         # remaining frames 0 ≤ r < N
for i in range(r):
    m[i] += 1     # distribute the remaining frames over the first r blocks

which can be written more succinctly as

m = np.ones(N) * M//N
m += (np.arange(N) < M%N)

For the M=5, N=4 example this should give m = [2, 1, 1, 1].