bowman-lab / enspara

Modeling molecular ensembles with scalable data structures and parallel computing
https://enspara.readthedocs.io
GNU General Public License v3.0
33 stars 16 forks source link

Fix for CARDS angle=360 issue. #236

Closed lgsmith closed 4 months ago

lgsmith commented 4 months ago

One line patch to guarantee that no frame can return 360.0 degrees for the purposes of CARDS rotamer calculation. When the wrapping frame gets changed in the transformation from -180 to 180 to 0-360 degrees, the 'open' end of the range needs to be redefined to be 360, whereas before it was likely either -180 or 180, and zero, which is the value that becomes 360, is totally permissible. Because of downstream happenings in digitization of values, this creates indexing issues in the bins when numbers incredibly close to zero have 360 added to them, because they get rounded up due to finite floating point precision.

The fix is to set 360 degrees to zero after doing the conversion but before bins are determined. On a test trajectory that bugs out with the previous code this modification works.

lgsmith commented 4 months ago

I thought if it hit 359.99roundup it should be classified as a zero. Ill revert it if you’re sure that’s incorrect.

On Thu, Jul 18, 2024 at 1:11 PM Sukrit Singh @.***> wrote:

@.**** commented on this pull request.

Huzzah at long last a fix to capture this edge case! I left one small comment based on the discussion we had about how to capture cases where angles may be something like 359.9999998 but that should be a relatively quick addition as well.

thanks for this!

In enspara/geometry/rotamer.py https://github.com/bowman-lab/enspara/pull/236#discussion_r1683216326:

@@ -14,6 +14,7 @@ def dihedral_angles(traj, dihedral_type):

transform so angles range from 0 to 360 instead of radians or -180 to 180

angles = np.rad2deg(angles) angles[np.where(angles < 0)] += 360

  • angles[np.where(angles == 360)] = 0

Just from a previous discussion, wanted to note that there may be cases where MDtraj won't return 360.0 exactly but rather something like like 359.9999998, which could either

  1. Round up to 360, sneak past this == check, and cause an error,
  2. Be caught by the == and be set to 0, which is actually going to cause that angle to be placed in the wrong basin.

I think the workaround idea we had was to add something like angles[np.where(angles > 359.5)] = 359.5 prior to this line, which would set all the angles not at 360 to 359.5 to ensure they get assigned to the correct rotameric basin.

— Reply to this email directly, view it on GitHub https://github.com/bowman-lab/enspara/pull/236#pullrequestreview-2186439769, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAZVYKAW3VBT7XNEZ4MOULZM7ZLRAVCNFSM6AAAAABK5DQSDOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCOBWGQZTSNZWHE . You are receiving this because you authored the thread.Message ID: @.***>

sukritsingh commented 4 months ago

I thought if it hit 359.99roundup it should be classified as a zero.

Ultimately it's a "design" decision, so I'm open to discussion on it: In my mind 359.99 is distinct from 360.0 and would be classified as a different rotameric state, so while 360.0 should be set to 0, I don't know if we want to keep 359.99 to be "rounded up to zero" (and assign it to rotameric state 0) or we use some kind of ceiling on it (ie set it to 359.5, and assign it to rotameric state 2).

Open to discussions on this! I think this may be a rare enough case that hopefully doesn't perturb too much either way.

lgsmith commented 4 months ago

What if we got rid of the line I added above, and instead did:

angles[np.where(angles < 0)] += 360 - np.finfo(angles[0])

I believe this is better but would really appreciate input to make sure I'm thinking about the code's use of 'angles' correctly.

lgsmith commented 4 months ago

Actually, I tested it and that won't work. Sorry. I think the thresholding thing is probably the right way to go for now, since the rotamer bins defined in the code have a devider between zero and 360.

sukritsingh commented 4 months ago

What if we got rid of the line I added above, and instead did: angles[np.where(angles < 0)] += 360 - np.finfo(angles[0])

Yeah to be honest I'd have to think more deeply about what np.finfo does to know if it's a good idea (and I'm hesitant to muck with critical code like that anyways without also sitting down and reminding myself about angles). IIRC angles is expected to be a 2D array representing one set of dihedrals from a single trajectory at a time (ie all the φ angles from a single trajectory).

the rotamer bins defined in the code have a devider between zero and 360.

Yeah this was a old decision by me about how to transform these rotamers so that bins are continuous despite periodicity. Soon after I realized I should probably have used a sine/cosine transformation but that's a bigger overhaul, lol.

The changes LGTM! I'm assuming you tested the additional line on your "problem" case and it still worked fine?

lgsmith commented 4 months ago

The first and second approach both worked for my pathological traj. I reverted to thresholding for now.

FWIW, I think that converting the bins to $-\pi$ to $\pi$ is an easier and more sensible fix, but because they're defined a lot of places I was less confident I could affect a quick fix that would flow through all the rotamer code well. Converting to sine and cosine could also work, for sure. But I think this is 'fine for now' so I'd suggest merging these changes and doing a more full revamp at some future point.

sukritsingh commented 4 months ago

FWIW, I think that converting the bins to $-\pi$ to $\pi$ is an easier and more sensible fix, but because they're defined a lot of places I was less confident I could affect a quick fix that would flow through all the rotamer code well.

IIRC the issue with setting the bins in the periodic $-\pi$ to $\pi$ range is that then one of the rotameric states then straddles the $\pi$ and $-\pi$ boundary (I believe trans rotamers can be both positive and negative IIRC). We'd probably have to update the whole binning scheme at that point to a different approach, which isn't a bad idea but a bigger overhaul. The last time a new binning scheme was discussed, Rohit Pappu brought up the idea of using established rotameric libraries to define the rotamer states. That's been the "best" idea so far for an overhaul of the bins system.

and sounds good!