Open GoogleCodeExporter opened 9 years ago
Original comment by orbeckst
on 20 Jan 2015 at 7:03
The testing for Writers is pretty haphazard right now, which is arguably how they've ended up diverging.
After the next release I can write some tests like I did for Timesteps where there's a mixin test for Writer and then each format.
Timestep tests: https://code.google.com/p/mdanalysis/source/browse/testsuite/MDAnalysisTests/test_coordinates.py?name=develop#2778
Original comment by richardjgowers
on 22 Jan 2015 at 4:00
So I've bumped into this a little when playing with the ncdf Writer (Issue #257). This was previously not deciding on whether to write velocities/forces on creation, but instead deciding when passed its first Timestep to write. So previously:
W = mda.Writer('out.ncdf', numatoms=100)
# at this point, W hasn't created the header of the file, and isn't commited to writing velocities
W.write(ts)
# once passed a Timestep, it looked if _velocities were present, then wrote the header with velocities and forces fields
And now:
W = mda.Writer('out.ncdf', numatoms=100, velocities=True, forces=True)
# W is now commited to writing velocities and forces
W.write(ts) # this will now expect Velocities and Forces
Which I think gives much better control, eg you can choose to write a new trajectory without vels and forces if you want. Treatment of velocities and forces is something that needs defining in the API however.
Writer
instance.Timestep
can decide on its own what it wants to contain, we cannot do that. It has to depend on what is passed to the Timestep
– or @mnmelo might correct me here?Perhaps we can combine this by allowing
positions=True|False|None
velocities=True|False|None
forces=True|False|None
with the following meanings:
True
: "you must provide this for each Timestep
or I raise a ValueError
"False
: "you must not provide this for each Timestep
or I raise a ValueError
"None
: "you may provide this for each Timestep
and I will save it"Writers that do not allow the per-Timestep decision should raise an exception (ValueError
) on init if they get fed None
instead of either True
or False
. The defaults would be set to something sensible...
Would this make sense?
Yes, you're right @orbeckst, TRR writers can assume nothing from the start (see #162). I didn't realize earlier I this discussion this could also be a problem.
The following is a common setup for GROMACS run outputs: Positions every x frames; Velocities every v frames; Forces every f frames; Where typically v and f are the same but larger than x (and a multiple of it).
In this case there'll be frames only with positions, and others with all three properties. The TRR format only writes what's there, i.e., if there's no velocities/forces only positions get written to file; no space is reserved for the missing properties.
A somewhat more uncommon, but equally possible and valid, situation is to have all three x v f steps different and not multiple of each other. In this case there might be frames with any combination of positions velocities and forces.
If the user wants to replicate such a behavior either: The writer is initialized with the step number for each property; Or it must be flexible enough to let arbitrary properties be set on the fly.
The second option is also the most faithful to the format, which doesn't even require any sort of regular spacing, and even allows for frames without any positions/velocities/forces.
So this has cropped up with #350 and #308 with delta and dt being used interchangeably in different modules.
The official API (which is based on DCD) defines delta as the MD integrator step (~1fs) and dt as the time step between saved frames (~1ps).
DCD & TRZ Writer used "dt" like this
TRR/XTC/NCDF used "delta" for time between frames.
I'm changing the "delta" writers to use "dt" instead.
I think somewhere in these long discussions we decided to use "dt" for the time between frames.
delta can then be one the data variables because it's not provided by all readers.
On 19 Jul, 2015, at 06:46, Richard Gowers wrote:
So this has cropped up with #350 and #308 with delta and dt being used interchangeably in different modules.
The official API (which is based on DCD) defines delta as the MD integrator step (~1fs) and dt as the time step between saved frames (~1ps).
DCD & TRZ Writer used "dt" like this
TRR/XTC/NCDF used "delta" for time between frames.
I'm changing the "delta" writers to use "dt" instead.
Oliver Beckstein * orbeckst@gmx.net skype: orbeckst * orbeckst@gmail.com
Yeah the problem is the Writers wanted differently named kwargs for this
Ok, I'm raising this issue from the dead, we're (@micaela-matta) trying to write a TXYZ Writer... So currently Writer.write()
expects either a Timestep
, AtomGroup
or Universe
. For ascii formats especially, sticking to a Timestep
is pretty lame for data, and also there's some weird hacks for Writers which need this.
Is it OK for Writers to just throw a failure if they get a Timestep
and really can't make a sensible file from one? (ie a pdb file from just a timestep will be ugly)
Can't we just deprecate the Timestep
? I find it cleaner to give a atomgroup or universe anyway.
@kain88-de I'd prefer that yeah. Timestep is in AtomGroup, but not vice versa
ok
The interface to our trajectory writers is inconsistent and should be fixed.
The Trajectory API is actually vague on what Writers have to accept. This should be tightened and synced with the docs to writer().
At a minimum
start
,stop
,step
,delta
in the same manner**kwargs
Opinions?
Original issue reported on code.google.com by
orbeckst
on 20 Jan 2015 at 7:02