MDAnalysis / pmda

Parallel algorithms for MDAnalysis
https://www.mdanalysis.org/pmda/

Error when running `AnalysisFromFunction()` on more processes than frames #147

Open luponzo86 opened 3 years ago

luponzo86 commented 3 years ago

Expected behaviour

Successfully running AnalysisFromFunction() on all available CPUs by setting n_jobs=-1 even for very small trajectories.

Actual behaviour

Two warnings are raised:

/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/parallel.py:360: UserWarning: run() uses more blocks than frames: decrease n_blocks
  warnings.warn("run() uses more blocks than frames: "
/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)

but the code runs anyway until an error is thrown:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

<omitted>

/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/parallel.py in run(self, start, stop, step, n_jobs, n_blocks)
    398                 # save the frame numbers for all blocks
    399                 self._blocks = _blocks
--> 400                 self._conclude()
    401         # put all time information into the timing object
    402         self.timing = Timing(

/srv/home/lponzoni/anaconda3/envs/ifpe/lib/python3.7/site-packages/pmda/custom.py in _conclude(self)
    101 
    102     def _conclude(self):
--> 103         self.results = np.concatenate(self._results)
    104 
    105 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 10 has 1 dimension(s)
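The failing call can be sketched with NumPy alone. The following is a minimal illustration, assuming that blocks which received frames return 2-D arrays while a block without frames returns an empty 1-D array; the name `block_results` and the array shapes are hypothetical and are not taken from the pmda source.

```python
import numpy as np

# Hypothetical per-block results (assumption): blocks with frames yield 2-D
# arrays of shape (n_frames_in_block, n_values); an empty block yields an
# empty 1-D array.
block_results = [np.zeros((1, 3)), np.zeros((1, 3)), np.array([])]

try:
    np.concatenate(block_results)
    error_message = None
except ValueError as exc:
    # np.concatenate requires every input to have the same number of dimensions
    error_message = str(exc)

print(error_message)
```

If that assumption holds, any block left without frames is enough to trigger the error, which matches the traceback pointing at a later index (10) rather than the first block.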

Code to reproduce the behaviour

I could not find an MD trajectory to build an example with (I had problems installing MDAnalysisTests, see issue #3084), but the error occurs whenever AnalysisFromFunction() is run on a trajectory with n frames and n_jobs is set to a value greater than n, or to -1 when more than n CPUs are available.

This is not a big deal, but it was hard to debug and I wanted to report it.

Current version of MDAnalysis: 1.0.0

pmda version: 0.3.0

luponzo86 commented 3 years ago

A quick fix would be to add the following check:

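    import os
    import MDAnalysis as mda

    # import trajectory
    u = mda.Universe(pdb_file, traj_file)

    # set number of parallel processes
    # (os.sched_getaffinity is only available on some Unix platforms)
    if n_jobs == -1:
        n_jobs = len(os.sched_getaffinity(0))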
    # make sure that n_jobs is not greater than the actual number of frames
    n_total_frames = len(u.trajectory)
    n_actual_frames = len(range(
        start if start else 0,
        min(n_total_frames, stop) if stop else n_total_frames,
        step if step else 1))
    n_jobs = min(n_jobs, n_actual_frames)
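
The same check could be factored into a small helper that is easy to test in isolation. This is a sketch only: `clamp_n_jobs` is a hypothetical name, not part of pmda, and it assumes `n_jobs` has already been resolved to a positive integer (that is, `-1` has been replaced by the CPU count). It uses `slice.indices` so that `None` and negative values of `start` and `stop` are handled the way Python slicing handles them.

```python
def clamp_n_jobs(n_jobs, n_total_frames, start=None, stop=None, step=None):
    """Return n_jobs limited to the number of frames actually selected.

    Hypothetical helper (not part of pmda). Assumes n_jobs is already a
    positive integer.
    """
    # slice.indices() normalizes None/negative/out-of-range bounds
    # against the trajectory length, exactly like list slicing does.
    selected = range(*slice(start, stop, step).indices(n_total_frames))
    # Never return fewer than one job, even for an empty selection.
    return max(1, min(n_jobs, len(selected)))


print(clamp_n_jobs(16, 5))                               # 5 frames in total
print(clamp_n_jobs(16, 100, start=10, stop=50, step=10))  # frames 10, 20, 30, 40
```

Because the warning in the report mentions n_blocks rather than n_jobs, the clamped value may also need to be passed as n_blocks; that has not been verified here.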

orbeckst commented 3 years ago

Thank you.

orbeckst commented 3 years ago

@luponzo86 you could create a pull request with your check. We would review, guide you in adding tests, and you'd become an author of PMDA.

Development on PMDA is currently pretty slow because everybody is doing many other things (and in particular, there's a lot of work on MDAnalysis itself). Any help is greatly appreciated.