MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.33k stars 653 forks source link

trajectory writer interface is inconsistent #206

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

The interface to our trajectory writers is inconsistent and should be fixed.

The Trajectory API is actually vague on what Writers have to accept. This should be tightened and synced with the docs to writer().

At a minimum

Opinions?

Original issue reported on code.google.com by orbeckst on 20 Jan 2015 at 7:02

GoogleCodeExporter commented 9 years ago

Original comment by orbeckst on 20 Jan 2015 at 7:03

GoogleCodeExporter commented 9 years ago

The testing for Writers is pretty haphazard right now, which is arguably how they've ended up diverging.

After the next release I can write some tests like I did for Timesteps where there's a mixin test for Writer and then each format.

Timestep tests: https://code.google.com/p/mdanalysis/source/browse/testsuite/MDAnalysisTests/test_coordinates.py?name=develop#2778

Original comment by richardjgowers on 22 Jan 2015 at 4:00

richardjgowers commented 9 years ago

So I've bumped into this a little when playing with the ncdf Writer (Issue #257). This was previously not deciding on whether to write velocities/forces on creation, but instead deciding when passed its first Timestep to write. So previously:

W = mda.Writer('out.ncdf', numatoms=100)
# at this point, W hasn't created the header of the file, and isn't commited to writing velocities

W.write(ts)
# once passed a Timestep, it looked if _velocities were present, then wrote the header with velocities and forces fields

And now:

W = mda.Writer('out.ncdf', numatoms=100, velocities=True, forces=True)

# W is now commited to writing velocities and forces

W.write(ts)  # this will now expect Velocities and Forces

Which I think gives much better control, eg you can choose to write a new trajectory without vels and forces if you want. Treatment of velocities and forces is something that needs defining in the API however.

orbeckst commented 9 years ago

Perhaps we can combine this by allowing

with the following meanings:

Writers that do not allow the per-Timestep decision should raise an exception (ValueError) on init if they get fed None instead of either True or False. The defaults would be set to something sensible...

Would this make sense?

mnmelo commented 9 years ago

Yes, you're right @orbeckst, TRR writers can assume nothing from the start (see #162). I didn't realize earlier I this discussion this could also be a problem.

The following is a common setup for GROMACS run outputs: Positions every x frames; Velocities every v frames; Forces every f frames; Where typically v and f are the same but larger than x (and a multiple of it).

In this case there'll be frames only with positions, and others with all three properties. The TRR format only writes what's there, i.e., if there's no velocities/forces only positions get written to file; no space is reserved for the missing properties.

A somewhat more uncommon, but equally possible and valid, situation is to have all three x v f steps different and not multiple of each other. In this case there might be frames with any combination of positions velocities and forces.

If the user wants to replicate such a behavior either: The writer is initialized with the step number for each property; Or it must be flexible enough to let arbitrary properties be set on the fly.

The second option is also the most faithful to the format, which doesn't even require any sort of regular spacing, and even allows for frames without any positions/velocities/forces.

richardjgowers commented 9 years ago

So this has cropped up with #350 and #308 with delta and dt being used interchangeably in different modules.

The official API (which is based on DCD) defines delta as the MD integrator step (~1fs) and dt as the time step between saved frames (~1ps).
DCD & TRZ Writer used "dt" like this

TRR/XTC/NCDF used "delta" for time between frames.

I'm changing the "delta" writers to use "dt" instead.

orbeckst commented 9 years ago

I think somewhere in these long discussions we decided to use "dt" for the time between frames.

delta can then be one the data variables because it's not provided by all readers.

On 19 Jul, 2015, at 06:46, Richard Gowers wrote:

So this has cropped up with #350 and #308 with delta and dt being used interchangeably in different modules.

The official API (which is based on DCD) defines delta as the MD integrator step (~1fs) and dt as the time step between saved frames (~1ps).

DCD & TRZ Writer used "dt" like this

TRR/XTC/NCDF used "delta" for time between frames.

I'm changing the "delta" writers to use "dt" instead.

Oliver Beckstein * orbeckst@gmx.net skype: orbeckst * orbeckst@gmail.com

richardjgowers commented 9 years ago

Yeah the problem is the Writers wanted differently named kwargs for this

richardjgowers commented 6 years ago

Ok, I'm raising this issue from the dead, we're (@micaela-matta) trying to write a TXYZ Writer... So currently Writer.write() expects either a Timestep, AtomGroup or Universe. For ascii formats especially, sticking to a Timestep is pretty lame for data, and also there's some weird hacks for Writers which need this.

Is it OK for Writers to just throw a failure if they get a Timestep and really can't make a sensible file from one? (ie a pdb file from just a timestep will be ugly)

kain88-de commented 6 years ago

Can't we just deprecate the Timestep? I find it cleaner to give a atomgroup or universe anyway.

richardjgowers commented 6 years ago

@kain88-de I'd prefer that yeah. Timestep is in AtomGroup, but not vice versa

orbeckst commented 6 years ago

ok