Open yuxuanzhuang opened 3 years ago
There are going to be some transformations that are not parallel-safe. It's great when we can make it work, but that might not always be easy and may require different algorithms.
Can we just add a boolean attribute to the TransformationBase class, parallelizable = False, and then PMDA and friends can check? If we know that the Transformation can be run in parallel, we set it explicitly to True.
Yes, I think this is precisely the case here. Position averaging is intrinsically history-dependent, and as such it'll not play nice with block parallelization.
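To make the history dependence concrete, here is a minimal sketch in plain Python. RunningAverager is a hypothetical toy stand-in, not the actual PositionAverager; it only assumes that the transformation averages over the last few frames it has seen.

```python
class RunningAverager:
    """Toy history-dependent transformation (hypothetical, for illustration)."""

    def __init__(self, window):
        self.window = window
        self.history = []  # values seen so far; empty for every new instance

    def __call__(self, value):
        self.history.append(value)
        recent = self.history[-self.window:]
        return sum(recent) / len(recent)


frames = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Serial run: a single instance sees every frame in order.
serial_averager = RunningAverager(window=3)
serial = [serial_averager(x) for x in frames]

# Block-parallel run: each block gets a fresh instance with no history.
blocks = [frames[:3], frames[3:]]
parallel = []
for block in blocks:
    block_averager = RunningAverager(window=3)
    parallel.extend(block_averager(x) for x in block)

print(serial)
print(parallel)
```

The first frames of the second block are averaged without the frames that preceded them, so the two result lists differ at the block boundary.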
@yuxuanzhuang let's add parallelizable = False as an attribute to TransformationBase and have derived classes change it if they can be parallelized with split-apply-combine/block parallelization.
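A rough sketch of what this proposal could look like follows. The classes below are simplified and hypothetical, not the MDAnalysis implementation, and check_parallelizable is an assumed helper standing in for whatever check PMDA and friends might perform.

```python
class TransformationBase:
    """Hypothetical minimal base class (not the MDAnalysis one)."""

    # Conservative default: assume a transformation is not parallel-safe.
    parallelizable = False

    def __call__(self, ts):
        return self._transform(ts)

    def _transform(self, ts):
        raise NotImplementedError


class Translate(TransformationBase):
    """Stateless: each frame is handled independently of the others."""

    parallelizable = True  # explicitly opted in

    def __init__(self, shift):
        self.shift = shift

    def _transform(self, ts):
        return [x + self.shift for x in ts]


class Averager(TransformationBase):
    """History-dependent: keeps the inherited parallelizable = False."""

    def __init__(self):
        self.history = []

    def _transform(self, ts):
        self.history.append(ts)
        n = len(self.history)
        return [sum(column) / n for column in zip(*self.history)]


def check_parallelizable(transformations):
    """Assumed helper: refuse block parallelization if any flag is False."""
    unsafe = [type(t).__name__ for t in transformations
              if not getattr(t, "parallelizable", False)]
    if unsafe:
        raise ValueError("not parallelizable: " + ", ".join(unsafe))


check_parallelizable([Translate(1.0)])  # passes silently
try:
    check_parallelizable([Translate(1.0), Averager()])
except ValueError as err:
    print(err)
```

Because the flag lives on the class, a parallel driver can inspect it (for example Translate.parallelizable) without instantiating anything.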
I am not sure I agree with the way this was implemented. parallelizable is now used as a kwarg to the Analysis __init__ to indicate parallelization compatibility. I think it'd have been much more pythonic to instead have parallelizable be a class attribute, since it should be a general characteristic of each Analysis, and not dependent on each instantiation.
Later, if the user wants to control parallelization from the instantiation/run of an Analysis, PMDA and friends will/should provide ways to force serial behavior.
What do you think? If you agree with a change, we're still in time to correct the API before 2.0.0.
I agree it is more pythonic to have it as a class attribute, but given that we don't yet have a definite API for parallel analysis, nor is this parallelizable checked anywhere yet, it still feels less defined whether it is an internal indicator or something that differs from instance to instance. For example, this parallelizable only indicates the ability to use this Transformation in block analysis, but it might be different in other parallel conditions, e.g. parallel analysis among ensemble simulations. How should we deal with that?
First of all apologies that I mistakenly exemplified with the Analysis case, not the Transformations.
Regarding parallelizable, I was assuming we were interpreting this as frame-wise 'split-apply-combine' parallelizability. It's perhaps best not to overload this single attr with other meanings.
Are there examples where the same transformation might be parallelizable or not, depending on initialization? I mean here the framewise parallelizability, but I guess we could discriminate multiple parallel possibilities if instead of a single attr we have this as a dict; i.e.: {'split-apply-combine': True, 'ensemble': True}.
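As a sketch of the dict idea, under the assumption that each parallelization scheme gets its own key: the class names, the supports helper, and the particular True/False values below are all hypothetical and chosen purely for illustration.

```python
class TransformationBase:
    """Hypothetical base class: one flag per scheme, all off by default."""

    parallelizable = {"split-apply-combine": False, "ensemble": False}


class HistoryDependentTransformation(TransformationBase):
    """Assumed example: unsafe frame-wise, fine across independent runs."""

    parallelizable = {"split-apply-combine": False, "ensemble": True}


def supports(transformation, scheme):
    """Assumed helper: unknown schemes are treated as unsupported."""
    return type(transformation).parallelizable.get(scheme, False)


t = HistoryDependentTransformation()
print(supports(t, "split-apply-combine"))
print(supports(t, "ensemble"))
print(supports(t, "some-future-scheme"))
```

Defaulting unknown keys to False keeps the check conservative if new schemes are added later.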
Expected behavior
The results are the same as serial analysis
Actual behavior
Due to the splitting approach, a new PositionAverager will be created for each block; no previous memory (self.coord_array) is saved.
Code to reproduce the behavior
Current version of MDAnalysis
MDAnalysis version (python -c "import MDAnalysis as mda; print(mda.__version__)"): 2.0.0-dev
Python version (python -V): 3.8