MolSSI / mmelemental

Python implementation of the MMSchema specification. Provides fundamental models for molecular mechanics.
https://molssi.github.io/mmelemental
BSD 3-Clause "New" or "Revised" License
10 stars 3 forks source link

Support for Trajectories #2

Open anabiman opened 4 years ago

anabiman commented 4 years ago

ATM the Molecule model stores atomic information for a single frame. Generalizing this to a Trajectory class that stores an array of Molecule would be inefficient especially for classical MD simulation in which the topology remains static. Therefore, I'm leaning towards refactoring the Molecule model into 2 by separating the topology from the atomic attributes (coordinates, velocities, etc.) which would then support time series. This would require either dropping qcelemental's Molecule model as a base, or asking @bennybp to change QCSchema.

The 3 core models would then become:

Pinging @IAlibay @lilyminium @fiona-naughton

anabiman commented 4 years ago

Should velocities and forces even be part of the MM Molecule definition? Maybe another way of thinking about this is to retain mmelemental.Molecule as a subclass of qcelemental.Molecule and use the geometry Field to store the coordinates, while velocities/forces are shifted to a Trajectory class. In other words, mmelemental.Molecule would never store any time-dependent data. The Trajectory class would look something like this:

class Trajectory:
     topology: Union[Molecule, Array[Molecule]]
     coordinates: Union[Array[float], Array[Array[float]]]
     velocities: Union[Array[float], Array[Array[float]]]
     forces: Union[Array[float], Array[Array[float]]]

This would be an efficient implementation for systems with static topologies. However, if someone wants to instantiate mmelemental.Molecule from a .gro file which can store velocities and/or forces, the conversion would be either (1) lossy or (2) require the use of Trajectory class instead.

In the context of writing MMIC translators for MMSchema-MDAnalysis, this problem would then become converting MDAnalysis.Universe to/from mmelemental.Trajectory. Is this awkward? I think not, because MDAnalysis.Universe is implicitly a trajectory class. This means we need 2 types of translators, one for topology & another for trajectory.

IAlibay commented 4 years ago

So if I'm understanding this correctly, you would be proposing two separate classes, one that handles purely single frame inputs, and one that handles trajectories?

I would suggest instead having Trajectory be a generator method that Molecule can be hooked up to, which feeds time-dependent (or in the case of RDKIT just different conformations) from the "input object". This way you can update coordinates over time and avoid the issue of storing everything in memory (otherwise I think you'll very quickly get into OOM errors).

I realise this is very much exactly how MDAnalysis handles things (so my viewpoint is probably biased), where you have a "static" Universe object that contains all the system information and then a timstep object that updates the system coordinates as you traverse through the trajectory (slight simplification of things, but close enough).

Edit; this way, you can have everything be a class that derives from Molecule and on implementation of a converter you can define if this is going to be a singleframe object (which doesn't need the trajectory generator) or a multiframe object.

anabiman commented 4 years ago

You've raised many good points so let me tackle them one by one:

class Trajectory:
     topology: Union[Molecule, Array[Molecule]]
     coordinates: Union[Array[float], Array[Array[float]]]
     velocities: Union[Array[float], Array[Array[float]]]
     forces: Union[Array[float], Array[Array[float]]]

     def from_file(self, file: FileInput, load_all: bool):
            ...