MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.26k stars 639 forks source link

Change frame numbering from 1-based to 0-based #310

Closed dotsdl closed 8 years ago

dotsdl commented 9 years ago

Subtask for #252:

We want to make frame numbering start from 0 so that indexing with, say:

u = mda.Universe(GRO, XTC)
u.trajectory[0]

gives frame 0 instead of frame 1. This change will likely break lots of code that includes ts.frame + 1 and ts.frame - 1 and the like; will need to go through the code with a fine-toothed comb for these instances, using the tests as markers for trouble.

dotsdl commented 9 years ago

I've started this process on issue-310. The general strategy is:

dotsdl commented 9 years ago

I need help from someone familiar with the DCD pyrex code. I'm having trouble parsing it, so if someone wants to push a commit to this branch with the needed fixes to indexing there I'd be grateful.

richardjgowers commented 9 years ago

some changes to Reader's iters just got pushed so make sure you're working with them, it should make your life easier too.

Something that might trip you up, I've seen a lot of Readers using the fact that frame is 1 based to know the index of the next frame ( in 0 based)... I think PDB was the culprit, but then this was used as a template for many others.

dotsdl commented 9 years ago

Something that might trip you up, I've seen a lot of Readers using the fact that frame is 1 based to know the index of the next frame ( in 0 based)... I think PDB was the culprit, but then this was used as a template for many others.

Thanks for the tip. I've also noticed that many of them initialize ts.frame to 0 with the anticipation that when the first frame gets read it gets incremented. For now I'm going to try and solve this with initializing ts.frame to -1 and see how far this gets things. A cleaner solution may be possible.

orbeckst commented 9 years ago

@dotsdl , can you link to the cython/C DCD code that you need some insights on? The DCD reader itself uses the C code. The specialized DCD correl/timeseries uses a re-implementation of the DCD C code in cython... at least if memory serves (@nmichaud did all the hard work a long time ago...).

richardjgowers commented 9 years ago

So if everything is passing, all that needs to be done is making sure that the tests for each Reader are actually checking that:

The testing for coordinate Readers is pretty haphazard currently, there should really be some sort of overarching _TestReader which all Readers have to go through that includes this.. but that's probably another Issue to this.

dotsdl commented 9 years ago

@orbeckst I believe the relevant code is in MDAnalysis/package/src/dcd/dcd.c. Once I've finished checking that all the other Readers are behaving as expected (coordinate tests all pass, but need to see if they actually measure what's expected), I'll start poking at it myself. But if anyone else beats me to it, I'll be happy (ping @nmichaud ). It's probably one or two well-placed edits.

richardjgowers commented 9 years ago

Hey @dotsdl

I pushed what is probably the the commit I'm least proud of ever, but it should make DCD work! The tests work before and after my hack (I checked it manually, gives 0 based now)

I'm getting a fail in test_analysis... FAIL: Check for sustained resolution of Issue 188. Not sure if you get that too?

If you want I can write the tests I proposed above

dotsdl commented 9 years ago

Running the tests now. I just pushed the changes needed to make the current set of tests meaningful for 0-based DCD frame reading. It looks like this hack works; I've got no better solutions, even after poking the pyrex code for a little while now.

dotsdl commented 9 years ago

I'm about to start checking each reader test to ensure they do those tests you propose. If you want to check, too, I appreciate the help. You're welcome to start with the DCD reader. :D

orbeckst commented 9 years ago

FYI: DCD.c is pure C, no cython/pyrex. The Python objects are hand-coded. Do we need to change the underlying C code? (If we can keep it then it would make it somewhat easier to merge in changes from the upstream VMD molfile plugin sources... but admittedly, that's not high priority.)

dotsdl commented 9 years ago

We don't need to change it all, at least not with the fix @richardjgowers introduced, where we just decrement the frame number each time a frame is read.

orbeckst commented 9 years ago

Ok, good. Sorry, didn't have time to dig into the commit.

On 18 Jun, 2015, at 14:43, David Dotson wrote:

We don't need to change it all, at least not with the fix @richardjgowers introduced, where we just decrement the frame number each time a frame is read.

dotsdl commented 9 years ago

Okay, we now have 0-based frames for all readers except for the TRZ reader. Do TRZs store a frame number internally? I can't figure out where the tests are getting these frame numbers for that reader.

richardjgowers commented 9 years ago

@dotsdl Yeah the frame is stored in the trajectory, fixed now.

dotsdl commented 9 years ago

@richardjgowers awesome. We're almost there! I just realized I mispoke: the mol2 reader doesn't appear to update frame information at all on iteration. I noticed this in 610e300bbb81319dffc027d3ffd33a08edfde988 but failed to mention it in my last post. I'm pulling my hair out trying to figure out why this is the case. Ideas?

richardjgowers commented 9 years ago

Hehe, this one was pretty funny. MOL2 was creating a new Timestep object in each read, so was returning a Timestep with all the correct info. The problem was the tests were checking against the original Timestep (which wasn't touched in the read). So you were holding a pointer to a Timestep that MOL2 wasn't using any more..

I've fixed this so that MOL2 reuses the same Timestep throughout (in line with other Readers) and tests now pass

richardjgowers commented 9 years ago

It might be something worth writing a test for... something like

ts = u.trajectory.ts
ts2 = u.trajectory.next()
assert ts is ts2
dotsdl commented 9 years ago

Okay...there are just a few tests that are failing yet. One has to do with streamio on the XYZReader, which I don't know how to fix. There are also failures from test_reader_api.py that shouldn't be failing, but are. Strangely, for the _TestReader.test_context test, changing:

assert_equal(l, 0)

back to

assert_equal(l, 1)

results in the value of l changing! How???

dotsdl commented 9 years ago

In other words, for

assert_equal(l, 0)

we get

======================================================================
FAIL: test_context (MDAnalysisTests.test_reader_api.TestSingleFrameReader)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/alter/Library/mdanalysis/MDAnalysis/testsuite/MDAnalysisTests/test_reader_api.py", line 110, in test_context
    assert_equal(l, 0)
  File "/usr/lib/python2.7/site-packages/numpy/testing/utils.py", line 334, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 1
 DESIRED: 0

----------------------------------------------------------------------

But with:

assert_equal(l, 1)

we get

======================================================================
FAIL: test_context (MDAnalysisTests.test_reader_api.TestMultiFrameReader)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/alter/Library/mdanalysis/MDAnalysis/testsuite/MDAnalysisTests/test_reader_api.py", line 110, in test_context
    assert_equal(l, 1)
  File "/usr/lib/python2.7/site-packages/numpy/testing/utils.py", line 334, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 0
 DESIRED: 1

----------------------------------------------------------------------

What is going on here?!

richardjgowers commented 9 years ago

For those tests, I wrote 2 "reader" classes at the top of the test, these will have to be changed to give 0 based frame numbers.

richardjgowers commented 9 years ago

Ok I've updated the Reader API tests to now expect and use 0 based frames throughout

dotsdl commented 9 years ago

Awesome. The only test we need to fix now is the test_XYZReader in test_streamio.py. @orbeckst, I noticed a note in the XYZReader(line 270 in coordinates/XYZ.py that indicates self._read_next_timestep() upon init is rather fragile. I'll defer to you on what the best fix here looks like.

richardjgowers commented 9 years ago

I think I've squashed it now. Reading numframes was messing with the file handle

dotsdl commented 9 years ago

Great! Running all tests now. If they all pass, mind if I squash, apply any missing :versionchanged: notes to the docs, and then submit the PR?

richardjgowers commented 9 years ago

Yeah if you could that would be awesome. I'm keen to get this through so I can finish #239

richardjgowers commented 9 years ago

Hmmm ok, so the fix I wrote works in 2.7, but not 2.6, because the bz2 doesn't support context management in 2.6? I'll rewrite the fix, but it shouldn't affect all the fun you're having writing docs

dotsdl commented 9 years ago

I'll rewrite the fix, but it shouldn't affect all the fun you're having writing docs

Heh. :D

dotsdl commented 9 years ago

So as not to disrupt your work, I've squashed the commits for this issue and pushed them to issue-310-fin. If you push your final bits to the original issue-310 branch, I'll happily merge and squash them into the new one.

orbeckst commented 9 years ago

Please check all the docs and update accordingly, e.g. http://www.mdanalysis.org/mdanalysis/package/doc/html/documentation_pages/coordinates/init.html#id13

dotsdl commented 9 years ago

All right. I'm back in business. I'll take care of this.

orbeckst commented 8 years ago

@dotsdl , is there anything else to be done here than the docs? If not then this should be an easy "close"...

dotsdl commented 8 years ago

Just the docs. Need to go through with a comb. Doing this today in addition to #309.

dotsdl commented 8 years ago

Closed in 5bdbb47b809d2b593225bbb1866577203fa84673