MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.3k stars 647 forks source link

Add `.write` to trajectory reader #1037

Closed kain88-de closed 6 years ago

kain88-de commented 7 years ago

Like the ag.write it would be convenient to have a trajectory.write method. This method would always write the complete trajectory with all atoms. To write only a subset the more general Writer still has to be used. This makes conversion of trajectory formats or concatenating trajectories very easy.

# convert
mda.Universe(PDB).trajectory.write('traj_in_dcd.dcd')
# concatenate
mda.Universe(PSF, (DCD1, DCD2, DCD3)).write('concate.dcd')

It is also a good fit to write a MemoryReader back to disk.

Proposed API

All trajectory readers get a new method Reader.write() that allows one to write the trajectory to disk (possibly in a different format):

u.trajectory.write("new.dcd")

The semantics should evoke the idea that a trajectory is considered the object to be written.

Optional slicing kwargs

kwargs to determine which frames to write

To be used as

u.trajectory.write("new.dcd", start=5000, step=1000)

Optional atoms kwarg

What to do when a writer needs more information than available to the Reader https://github.com/MDAnalysis/mdanalysis/issues/1037#issuecomment-254771913 ? Add an optional kwarg atoms to Reader.write(filename, atoms=<AtomGroup-like>) that can be used to extract all the necessary additional information.

The behavior should be:

  1. If the writer has all information, write the trajectory; if atoms is provided, only write those atoms as a subselection. This will allow for a very concise way to strip trajectories of water: u.trajectory.write("protein_only.xtc", atoms=u.select_atoms("protein")).
  2. If the writer needs additional information and...
    • no atoms provided: raise a ValueError and tell the user to supply the atomgroup or Universe in atoms (e.g., u.trajectory.write("new.xtc", atoms=u))
    • atoms is provided: write the trajectory for the given atoms.

Using the atoms argument to write subselections is in line with @kain88-de 's idea https://github.com/MDAnalysis/mdanalysis/issues/1037#issuecomment-254782831 .

Progress meter

And it would be nice if we could also have an optional progress meter #928 (quiet=False, default quiet=True... or verbose=False once we solve #903).

Basic implementation

Implementation could look similar to the following (without all the detailed error checking and checking for atoms and no ProgressMeter):

class BaseReader(...):
    ...
    def write(self, filename, atoms=None, start=None, stop=None, step=None):
        assert self == atoms.universe.trajectory
        with mda.Writer(filename, n_atoms=atoms.n_atoms) as W:
            for ts in self[start:stop:step]:
                 W.write(atoms)

History

richardjgowers commented 7 years ago

The headache I can see with this is for trajectory writers that require more than the Reader contains, (eg PDBWriter will want resnames, DCDReader doesn't have these). To make this work (with identical API) for all cases, Universe should be passed around as the 'thing' to be written. I'm fairly certain that Readers don't have a link to the Universe currently, so this will need to be added (in a way that doesn't cause circular reference problems with deleting).

kain88-de commented 7 years ago

Well we would also write a more generic write/copy function in the MDAnalysis namespace that wraps MDAnalysis.Writer

def write(ag, fname):
    with mda.Writer(fname, n_atoms=ag.n_atoms) as w:
        for ts in ag.universe.trajectory:
            w.write(ag)

# call it with
mda.write(mda.Universe(PSF, DCD), 'test.pdb))

That is easier to implement and would allow to write sub selections.

kain88-de commented 7 years ago

To be more specific and avoid confusions we could also name the generic function write_trajectory.

orbeckst commented 7 years ago

mda.write() seems like a cheap and rather useful function.

I still like the concise semantics of u.trajectory.write("new.xtc"). I haven't really thought it through but my feeling is we should avoid adding the Universe to the Reader. We could start by just raising an error if we have insufficient information and then tell the user to use MDAnalysis.write(universe, "new.xtc") instead. That would still be pretty clean.

kain88-de commented 7 years ago

We could start by just raising an error if we have insufficient information

The writers will already complain them self if a TimeStep object isn't enough with an exception. So that would just work.

richardjgowers commented 7 years ago

I think write_trajectory might be a better name, Writer.write currently just does a single frame, it would be bad if trajectory.write got put in a for ts in u.traj loop

orbeckst commented 7 years ago

The way that people will commonly access this method as is u.trajectory.write(). It really feels redundant to me to repeat "trajectory" in the method name again as u.trajectory.write_trajectory(). I think here we can take advantage of the OO design in that it also provides semantic clarity.

(And in any case, we are already overloading AtomGroup.write() to either write coordinates or selections.)

Oliver Beckstein email: orbeckst@gmail.com

Am Oct 20, 2016 um 3:15 schrieb Richard Gowers notifications@github.com:

I think write_trajectory might be a better name, Writer.write currently just does a single frame, it would be bad if trajectory.write got put in a for ts in u.traj loop

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

sseyler commented 7 years ago

I like u.trajectory.write("new.xtc") and agree with @orbeckst in that it's semantically clear and succinct. For a trajectory writing function in the MDA namespace, MDAnalysis.write_trajectory(universe, "new.xtc") makes sense to me.

orbeckst commented 7 years ago

Optional atoms kwarg

@richardjgowers asked what to do when a writer needs more information than available to the Reader https://github.com/MDAnalysis/mdanalysis/issues/1037#issuecomment-254771913. I suggest to add an optional kwarg atoms to Reader.write(filename, atoms=<AtomGroup-like>) that can be used to extract all the necessary additional information.

The behavior should be:

  1. If the writer has all information, write the trajectory and ignore atoms; if atoms is provided, only write those atoms as a subselection. This will allow for a very concise way to strip trajectories of water: u.trajectory.write("protein_only.xtc", atoms=u.select_atoms("protein")).
  2. If the writer needs additional information and...
    • no atoms provided: raise a ValueError and tell the user to supply the atomgroup or Universe in atoms (e.g., u.trajectory.write("new.xtc", atoms=u))
    • atoms is provided (and atoms.positions == Reader.ts.positions): write the trajectory for the given atoms.

Using the atoms argument to write subselections is in line with @kain88-de 's idea https://github.com/MDAnalysis/mdanalysis/issues/1037#issuecomment-254782831 .

Progress meter

And it would be nice if we could also have an optional progress meter #928 (quiet=False, default quiet=True... or verbose=False once we solve #903).

EDIT: Changed behavior to use atoms for subselections if provided.

orbeckst commented 7 years ago

I updated the original comment with the specs. Hopefully that makes it easier to implement.

Differences to what @kain88-de originally proposed

orbeckst commented 7 years ago

I am not sure if the following is a good idea but I'll throw it out nevertheless:

u.trajectory[start:stop:step].write("new.dcd")

It would mean overloading the Generator object. Probably not a very pythonic thing to do.

mnmelo commented 7 years ago

Conceptually, the simplicity and readability of that slice construct seem quite pythonic to me. I really like it! Then again, I'm not very shy when it comes to hacking the python object model...