MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.28k stars 642 forks source link

Add energy readers to the Auxiliary module #3714

Open BFedder opened 2 years ago

BFedder commented 2 years ago

Is your feature request related to a problem?

In molecular dynamics simulations, users frequently have to inspect energy-like terms such as potential or kinetic energy, temperature, or pressure. This is so common a task that even small inefficiencies add up. Currently, users have to create intermediate files from their MD simulation's output files to obtain plottable data, and this quickly becomes cumbersome when multiple terms are to be inspected. Being able to read in the energy output files directly would make this more convenient.

Describe the solution you'd like

I suggest that new energy file readers should be written as part of the auxiliary module. MDAnalysis' AuxReaders allow the association of non-trajectory data with trajectory timesteps and seem like a good fit for this purpose. In addition to reading the energy files and making their contents available to users without the need to write intermediate files, the framework would allow association of energy data to trajectories for further analyses.

I'm picturing an implementation of these new energy readers similar to the following (based on a hypothetical EDR file reader for GROMACS output)

# Data is read from files by Energy Readers
aux = MDAnalysis.auxiliary.EDR.EDRReader("ener.edr")

# The EDRReader populates an attribute to communicate which energy terms are present in the file
aux.terms

# Energy data can be added to trajectories one at a time
u = mda.Universe(foo, bar)
u.trajectory.add_auxiliary(auxname="epot", auxterm="Potential", auxdata=aux)

# Alternatively, adding multiple terms or all terms at the same time is possible
u.trajectory.add_auxiliary(auxname=["epot", "temp"], auxterm=["Potential", "Temperature"], auxdata=aux)
u.trajectory.add_auxiliary(auxterm=aux.terms, auxdata=aux)

# The loaded energy data can then be used, for example, to select frames based on criteria as such:
selected_frames = np.array([ts.frame for ts in u.trajectory if ts.aux.epot < some_threshold])

# This can then be used for trajectory slicing to yield trajectory subsets for further work
subset = u.trajectory[selected_frames]

# The energy readers can also be used to merely "unpack" energy data for plotting without assignment to trajectories
epot = aux.unpack("Potential")
bond_terms, angle_terms = aux.unpack(["Bond", "Angle"])

Overall, new AuxReaders for energy files produced by GROMACS, Amber, NAMD, or OpenMM might be interesting. OpenMM's StateDataReporters produce CSV files as output. Generalising this to a CSV AuxReader would be useful.

Moving away from energy files, a general NumPy array AuxReader would also be very interesting. It would allow great flexibility of the kind of data users could associate with trajectories, allowing for example the direct association of Analysis results like RMSD calculations to trajectories.

Additional context

EDR Reader Issue: #3629. NumPy Reader Issue: #3750

The user guide has a great explanation of how the aux system currently works.

orbeckst commented 2 years ago

The content of this issue already almost makes your first blog post. Perhaps add a section on how you plan to tackle the project, what you want to prioritize and why, and a timeline, and you have your first post!

Issue #3629 can be the "EDR" issue, which collects anything specific to the EDRReader. I would use this issue for more abstract discussions about the general user interface and as a way to link your project together.