MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.31k stars 648 forks source link

define method calc_representative() for averages of auxiliary data in the AUXReader #3830

Open BFedder opened 2 years ago

BFedder commented 2 years ago

Is your feature request related to a problem?

The AuxReaders have a calc_representative method which returns a representative value of the auxiliary data for any given time step. It supports two modes, either returning the auxiliary data of the AuxStep closest to the trajectory time step, or an average of AuxSteps within a certain cutoff around the trajectory time step.

With the implementation of an AuxReader for EDR files in PR #3749, a new way to store auxiliary data within the readers was needed. Previously, data was stored in plain NumPy arrays (for example for the XVG reader). Now, a dictionary of NumPy arrays is also possible. These two structures require different treatment to obtain average values. This issue is opened to facilitate the discussion on where the relevant method(s) should be defined.

Possible Solutions

Current implementation: Define method in auxiliary.base.

In this solution, calc_representative handles both cases correctly via a try/except statement to distinguish between the plain NumPy array and the dictionary of arrays. It works, but it is somewhat inelegant. @orbeckst and @IAlibay have rightfully pointed out that while this solution is fine if there is only two internal data structures, the solution will quickly become unmanageable if and when more cases need to be handled.

Treat plain NumPy array as base case, override in EDRReader

In an alternative solution, the calc_representative method would be reverted to its state pre-EDRReader. It fully supports the XVGReader and other potential readers that store data in plain NumPy arrays. Other readers like the EDRReader would need to override this method to work properly.

Force implementation in each AuxReader, raise NotImplementedError in base class

Finally, the methods for calculating average values could also be a requirement in each Reader and moved out of the base class completely.

Additional context

For more context see #3749, in particular orbeckst's comment

orbeckst commented 2 years ago

The situation might become clearer once we have more aux readers. I think we should be guided by reducing code duplication, having clear code paths, and maintainability.

Currently I’m leaning towards having the simple array method in base and anything more complicated should provide its own. However, this might come with a cost for developers who might think that they can only use the simple array-type implementation.

hmacdope commented 2 years ago

My instinct is to have the default behaviour be a "NumPy-like" behaviour in a base class with an override in each AUXReader for each specific case.

However I totally agree with @orbeckst that this will become clearer as more AuxReaders are written. I am ok with it being the way it is for now.

IAlibay commented 2 years ago

Because adding new readers is probably going to be a slow process I'd just go for the NotImplementedError in the short term and review it in the long term.

This avoids semver level issues where you make an early decision and then a year in realise it's sub-optimal and have to go back and fix things in the next major version.

orbeckst commented 2 years ago

I am swayed by https://github.com/MDAnalysis/mdanalysis/issues/3830#issuecomment-1250411497 : under semantic versioning it's hard to remove a method from a base class. Let's start out with each AUXReader implementing their own and the base class raising NotImplementedError.

Priyanshuken18 commented 1 year ago

i believe that The AuxReaders module in MDAnalysis has a method called calc_representative() which returns a representative value of the auxiliary data for any given time step. [It supports two modes, either returning the auxiliary data of the AuxStep closest to the trajectory time step, or an average of AuxSteps within a certain cutoff around the trajectory time step]