MDAnalysis / pmda

Parallel algorithms for MDAnalysis
https://www.mdanalysis.org/pmda/

make_balanced_slices should raise an error when n_blocks is larger than n_frames #137

Closed: yuxuanzhuang closed this issue 4 years ago

yuxuanzhuang commented 4 years ago

Expected behaviour

make_balanced_slices should either raise an error when n_blocks is larger than n_frames, or reduce n_blocks to n_frames.
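The two options above can be sketched as a small standalone check. This is only an illustration: `resolve_n_blocks` and its `strict` flag are hypothetical names and are not part of pmda.

```python
def resolve_n_blocks(n_frames, n_blocks, strict=True):
    """Return a usable block count for n_frames (hypothetical helper, not pmda API).

    strict=True  -> raise ValueError when there are more blocks than frames
    strict=False -> silently reduce n_blocks to n_frames
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    if n_blocks > n_frames:
        if strict:
            raise ValueError(
                f"n_blocks ({n_blocks}) is larger than n_frames ({n_frames})"
            )
        # Fall back to one frame per block.
        return n_frames
    return n_blocks
```

With 10 frames and 12 requested blocks, the strict variant raises, while `resolve_n_blocks(10, 12, strict=False)` returns 10.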

Actual behaviour

No frames are assigned to the last few slices, which in turn raises an error in the analysis class's _conclude().

Code to reproduce the behaviour

>>> from pmda.util import make_balanced_slices
>>> make_balanced_slices(10, 12, start=0, stop=10)
[slice(0, 1, 1),
 slice(1, 2, 1),
 slice(2, 3, 1),
 slice(3, 4, 1),
 slice(4, 5, 1),
 slice(5, 6, 1),
 slice(6, 7, 1),
 slice(7, 8, 1),
 slice(8, 9, 1),
 slice(9, 10, 1),
 slice(10, 10, 1),
 slice(10, 10, 1)]
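The last two slices in this output select no frames. A minimal sketch for spotting such slices, using only plain Python (the helper name `find_empty_slices` is made up for illustration; the slice list is rebuilt by hand to match the output above):

```python
def find_empty_slices(slices, n_frames):
    """Return the indices of slices that select no frames out of n_frames."""
    return [
        i for i, s in enumerate(slices)
        # slice.indices() normalises start/stop/step against the trajectory length
        if len(range(*s.indices(n_frames))) == 0
    ]


# The slices reported above: ten single-frame blocks plus two empty ones.
slices = [slice(i, i + 1, 1) for i in range(10)] + [slice(10, 10, 1)] * 2
empty = find_empty_slices(slices, n_frames=10)
print(empty)
```

Here `empty` holds the positions of blocks 10 and 11, the two blocks that later produce empty results.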

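import MDAnalysis as mda
import pmda.density
from MDAnalysisTests.datafiles import TPR, XTC  # assumed source of TPR and XTC

U = mda.Universe(TPR, XTC)  # 10 frames
ow = U.select_atoms("name OW")
D = pmda.density.DensityAnalysis(ow, delta=1.0)
D.run(n_blocks=12)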

/home/scottzhuang/pmda/pmda/parallel.py:362: UserWarning: run() uses more blocks than frames: decrease n_blocks
  warnings.warn("run() uses more blocks than frames: "
/home/scottzhuang/anaconda3/envs/gsoc/lib/python3.8/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-66762af15cd4> in <module>
----> 1 D.run(n_blocks=12)

~/pmda/pmda/parallel.py in run(self, start, stop, step, n_jobs, n_blocks)
    400                 # save the frame numbers for all blocks
    401                 self._blocks = _blocks
--> 402                 self._conclude()
    403         # put all time information into the timing object
    404         self.timing = Timing(

~/pmda/pmda/density.py in _conclude(self)
    303 
    304     def _conclude(self):
--> 305         self._grid = self._results[:].sum(axis=0)
    306         self._grid /= float(self.n_frames)
    307         metadata = self._metadata if self._metadata is not None else {}

~/anaconda3/envs/gsoc/lib/python3.8/site-packages/numpy/core/_methods.py in _sum(a, axis, dtype, out, keepdims, initial, where)
     45 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
     46          initial=_NoValue, where=True):
---> 47     return umr_sum(a, axis, dtype, out, keepdims, initial, where)
     48 
     49 def _prod(a, axis=None, dtype=None, out=None, keepdims=False,

ValueError: operands could not be broadcast together with shapes (124,85,62) (0,) 
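The broadcast failure can be reproduced with NumPy alone. The sketch below assumes that each block contributes one grid to an object array and that an empty block contributes an array of shape (0,), as the shapes in the traceback suggest. The grid shape and the helper `try_sum` are illustrative only.

```python
import numpy as np

# Two blocks with a real (small) grid, one block that saw no frames.
full = np.ones((4, 3, 2))
empty = np.array([])  # shape (0,)

# Fill an object array element by element to avoid the ragged-array warning.
results = np.empty(3, dtype=object)
results[0] = full
results[1] = full
results[2] = empty


def try_sum(per_block):
    """Sum per-block grids; return the exception instead of raising it."""
    try:
        return per_block.sum(axis=0)
    except ValueError as err:
        return err


outcome = try_sum(results)       # fails once the empty block is added in
good = try_sum(results[:2])      # succeeds when only non-empty blocks are summed
print(type(outcome).__name__, outcome)
```

Summing only the non-empty blocks works, which is consistent with the empty trailing slices being the cause of the error in `_conclude`.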

Current versions of MDAnalysis, pmda, and dask:
(run `python -c "import MDAnalysis as mda; print(mda.__version__)"`) 2.0.0 dev
(run `python -c "import pmda; print(pmda.__version__)"`) 0.3.0+17.g13fa3b5
(run `python -c "import dask; print(dask.__version__)"`) 2.19.0

orbeckst commented 4 years ago

Raising a ValueError would be the safe choice – I think users might not be happy with 1 task per frame anyway...