MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.26k stars 641 forks source link

inconsistent storage of `times` and `frames` attributes in `AnalysisBase` classes #4617

Open edisj opened 1 week ago

edisj commented 1 week ago

I'm pretty new to using AnalysisBase to write my own custom analysis classes, and I found it confusing that frames and times are handled differently in different classes.

Using the NewAnalysis example in the docs (https://docs.mdanalysis.org/stable/documentation_pages/analysis/base.html#MDAnalysis.analysis.base.AnalysisBase),

if you run NewAnalysis.run(), frames and times will be in NewAnalysis.frames and NewAnalysis.times.

If you use AnalysisFromFunction, then frames and times will be in results.frames and results.times, even though they both use AnalysisBase

Is this intended?

Expected behavior

I expected to find frames and times in results.frames and results.times in both cases

Actual behavior

results.frames throws AttributeError: 'Results' object has no attribute 'frames' when inheriting from AnalysisBase, but not when using AnalysisFromFunction

Current version of MDAnalysis

RMeli commented 1 week ago

Hi @edisj, times and frames are not strictly speaking results, so they are not part of the results attribute in AnalysisBase, but they are their own attribute.

In the AnalysisFromFunction they are also their own attribute, that you can access as usual:

rog = AnalysisFromFunction(radgyr, u.trajectory,
                           protein, protein.masses,
                           total_mass=np.sum(protein.masses))
rog.run()

print(rog.times, rog.frames)

So the .times and .frames attributes are consistent between the two classes.

However, in the AnalysisFromFunction, these attributes appear to be added to the .results attribute as well:

https://github.com/MDAnalysis/mdanalysis/blob/347a0c0d31ee850d5a41bad5a49803bc9795f580/package/MDAnalysis/analysis/base.py#L539-L540

I agree this inconsistency might be confusing.

I expected to find frames and times in results.frames and results.times in both cases

Since this is your expectation, would you find it better to have them in results also for AnalysisBase?

edisj commented 1 week ago

Hi @RMeli, thanks for the clarification! I ended up just adding them in my conclude method like this and it works exactly how I wanted

def _conclude(self):
    self.results.frames = self.frames 
    self.results.times = self.times 

Since this is your expectation, would you find it better to have them in results also for AnalysisBase?

Yes, personally I think .times and .frames should be part of the results

what I naturally think "times" and "frames" mean in the context of an analysis are the time values and frame numbers that correspond to the results of the analysis, so in some sense they are kind of like implicit results

The documentation from AnalysisFromFrunction:

Attributes
----------
results.frames : numpy.ndarray
    simulation frames used in analysis
results.times : numpy.ndarray
    simulation times used in analysis

is more in line with how I think it should behave

versus AnalysisBase:

Attributes
----------
times: numpy.ndarray
    array of Timestep times. Only exists after calling
    :meth:`AnalysisBase.run`
frames: numpy.ndarray
    array of Timestep frame indices. Only exists after calling
    :meth:`AnalysisBase.run`

What do you think?