Open jbarnoud opened 8 years ago
You can do bool slicing, so maybe mask = [ts.time > 100 for ts in u.trajectory]
, for ts in u.trajectory[mask]
. But that isn't very smart as you're only interested in the first time you see the inequality operator change (assuming time only increases).
Yeah could definitely add it to the Base, probably as a separate set of 3 keywords?
I should try how slow/fast is the mask method. start_time
, stop_time
, and skip_time
would be fine by me.
@dotsdl was thinking about some of these things a while back – just pinging him in case he wants to chime in.
Time-aware slicing would be great. Perhaps also related to general "unit awareness" #596 ?
I think having a time slicer (something like Universe.trajectory.bytime[2.5:7.6]
) would be awesome, and we could take our cues from how pandas
implements floating point indexes (not easy, but possible). However, for efficient implementation with our Readers this would require e.g. the XTC reader to also get the time for each frame in addition to the offsets when it builds them.
This is more than a convenience, too. Speaking for myself I typically store intermediate datasets as pandas
DataFrames with the index being time, not frame. Being able to use this directly on the trajectory to go to frames of interest would be a big deal.
@dotsdl, I'm not so sure that storing XTC frame times is the best idea: some trajectories might have all frames with the same time, others might be the result of concatenation and have the same time period repeated.
Can't we just fallback on the same strategy that assigns a dt
if none is provided to the Reader
?
On 23-03-16 18:28, mnmelo wrote:
@dotsdl https://github.com/dotsdl, I'm not so sure that storing XTC frame times is the best idea: some trajectories might have all frames with the same time, others might be the result of concatenation and have the same time period repeated.
How does Gromacs analysis tolls deal with such trajectories?
Can't we just fallback on the same strategy that assigns a |dt| if none is provided to the |Reader|?
How do we currently deal with non homogeneous
dt
?— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/800#issuecomment-200453767
some trajectories might have all frames with the same time, others might be the result of concatenation and have the same time period repeated.
@mnmelo yeah these are definitely sticky spots that I don't have good solutions for.
Can't we just fallback on the same strategy that assigns a dt if none is provided to the Reader?
Is that even necessary? I'm generally not a fan of guessing schemes, but I'm not even aware at the moment what we currently do. I guess it depends on how the Reader interprets the data?
@jbarnoud, @dotsdl, I had a brief contact with this aspect when writing the frame time for the ChainReader
. Guessing based on initial frame time separation, and falling back to a default if the frames have the same time, isn't a terrible idea, as long as the behavior is well-documented.
Handling non-homogeneous dt
is a lot more work. If we want to also go that way I vote for a Reader
kwarg irregular_dt
or similar, that'll prevent these routines from guessing dt
and instead go through the frames' times. In that case I agree that storing times along with offsets becomes useful.
Jumping from one frame to the other when the dt
is homogeneous is just
a matter of a division to know what the new frame should be. When the
dt
is not homogeneous, then things becomes more complicated.
It is rare that the dt
changes, it is even rarer that the dt
changes
more than a few time throughout a trajectory. When going though the
trajectory to build the offset, we could save when the dt
changes.
Then, when we jump to an other frame, if the target frame is still in
the region with the same dt
, we can do as if the dt
was homogeneous,
if the dt
changes between the current frame and the target frame, we
have to do as many division as we have different dt
on the way
(usually 2, maybe 3?).
It should not be to much of an overhead, and it would avoid assuming
that the dt
is homogeneous.
On 23-03-16 19:14, mnmelo wrote:
@jbarnoud https://github.com/jbarnoud, @dotsdl https://github.com/dotsdl, I had a brief contact with this aspect when writing the frame time for the |ChainReader|. Guessing based on initial frame time separation, and falling back to a default if the frames have the same time, isn't a terrible idea, as long as the behavior is well-documented.
Handling non-homogeneous |dt| is a lot more work. If we want to also go that way I vote for a |Reader| kwarg |irregular_dt| or similar, that'll prevent these routines from guessing |dt| and instead go through the frames' times. In that case I agree that storing times along with offsets becomes useful.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/800#issuecomment-200473347
That approach sounds good. What happens if a trajectory has multiple segments with overlapping times?
Note that this discussion is also valid for what should be returned when asking for trajectory.ts.time
.
On 23-03-16 19:29, mnmelo wrote:
That approach sounds good. What happens if a trajectory has multiple segments with overlapping times?
We should be able to identify those (
dt
becomes negative for one frame), but I do not know what to do in that case.Note that this discussion is also valid for what should be returned when asking for |trajectory.ts.time|.
I would not try to outsmart the user. If the trajectory gives a time, we return that time. If the trajectory is odd, it is likely in purpose.
Also, having only one dt
per reader bugs me. But it may be an other issue.
On 23 Mar, 2016, at 11:24, Jonathan Barnoud wrote:
When going though the trajectory to build the offset, we could save when the
dt
changes.
Why not just build all the times when building the offsets and keep the times with the offsets?
@mnmelo I get that we could do fancy things with putting many trajectories together, but some basic attributes like that just need to report what was found in the data you provided.
As a way of iterating via time like the original question...
u = mda.Universe(top, [trj1, trj2])
actual_time = mda.guesstimate_time(u.trajectory) # new method to do fancy guessing? or method of CHAIN
# OR
actual_time = u.trajectory.guess_continuous_time()
mask = actual_time > 10000 & actual_time < 250000
for ts in u.trajectory[mask]:
# do stuff
@orbeckst Keeping all the times would make the search for a given frame slower as it requires to go through all the time saved until we find the one we want (or to bisect until there). This is in O(nframes) or O(log(nframes)).
If you save the changes in dt
in something like dt_changes = [(frame, time, dt), (frame, time, dt), ...]
a given frame can be found with:
for frame, time, dt in dt_changes:
if time <= target_time:
last_dt = dt
last_time = time
last_frame = frame
else:
break
target_frame = (target_time - time) / last_dt + last_frame
The complexity of this is O(nchanges + 1). As the number of dt
is likely to be very small, it is almost constant time.
EDIT: forgot to add the last frame in the snippet
When I run an analysis, I often discard the beginning of the trajectory, or I run it on a portion of the trajectory only. I think of these portions of trajectories in term of simulation time rather than number of frames. I discard the first 100 ns or I run the analysis on the last 5 µs; I do not discard the X first frame or run on the last Y frames. Also, I run my analyses with one frame every ns, not one frame every N ones.
It would be good if
AnalysisBase
could take times asstart
,stop
, andstep
. It would be even better if it could be done at the reader level.We should be careful that
dt
may not be constant throughout a trajectory.