MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.33k stars 653 forks source link

Potential bug in Bond-Angle-Torsion coordinates (BAT) analysis class #4236

Open taoliu032 opened 1 year ago

taoliu032 commented 1 year ago

Hello! This issue may seem a bit long with all the code and outputs, but it's actually just one (simple-looking) issue. Please continue to scroll down, thanks!

Expected behavior

I was testing the BAT class for converting Cartesian coordinates to internal coordinates. The latter are rotation and translation invariant. To test the invariance, I did a simple test: select an AtomGroup, run BAT to get its internal coordinates; then rotate the AtomGroup, run BAT again to get the new internal coordinates. These two sets of internal coordinates should be the same.

Actual behavior

After rotating the AtomGroup and running BAT, I found the atomic coordinates were changed back to those before rotation! I re-ran the code several times and the issue remained. So I suspect there might be a bug to the BAT class. Please check the code and outputs below.

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysis.analysis.bat import BAT # Bond-Angle-Torsion analysis

u_example = mda.Universe(top_file)
dna_atoms = u_example.select_atoms("chainID A")
print(dna_atoms.positions)

[[35.011 19.77 35.689] [35.678 20.01 34.452] [34.994 20.943 36.628] ... [46.193 25.35 41.895] [45.843 24.519 44.239] [44.311 25.289 48.263]]

R = BAT(dna_atoms)

# Calculate BAT coordinates
R.run()

<MDAnalysis.analysis.bat.BAT at 0x14185a710>

print(dna_atoms.positions)

[[35.011 19.77 35.689] [35.678 20.01 34.452] [34.994 20.943 36.628] ... [46.193 25.35 41.895] [45.843 24.519 44.239] [44.311 25.289 48.263]]

As expected, the atomic coordinates stay unchanged (as they should) after the first BAT operation. Then I rotate the molecule by an arbitrary angle, eg, 90 degrees:

# rotate the molecule by an arbitrary degree
angle = 90
p = [1,2,3]
d = [0,1,2]
dna_atoms = dna_atoms.rotateby(angle, axis=d, point=p) # directly update the AtomGroup

print(dna_atoms.positions)

[[-0.2750067 49.049965 21.049017 ] [-1.042872 49.19975 19.857126 ] [-0.90423685 49.64496 22.27702 ] ... [-2.490502 62.64985 23.245075 ] [-0.6989644 63.108196 24.944399 ] [ 0.41191477 63.501537 29.15673 ]]

As expected, the atomic coordinates are different after the rotation, from the original coordinates. Then I want to convert the new coordinates into internal coordinates:

# convert the rotated Cartesian coordinates to internal coordinates
R_rotated = BAT(dna_atoms)
R_rotated.run()

<MDAnalysis.analysis.bat.BAT at 0x1415d1790>

# Check the atomic coordinates in the same AtomGroup:
print(dna_atoms.positions) # why does BAT change the input AtomGroup back???....

[[35.011 19.77 35.689] [35.678 20.01 34.452] [34.994 20.943 36.628] ... [46.193 25.35 41.895] [45.843 24.519 44.239] [44.311 25.289 48.263]]

For some reason, after doing the second BAT operation, the atomic coordinates are changed back to those before applying rotation??? I don't understand what was going on... The API page for BAT class does not mention that this function alters the input AtomGroup. So please help if you have an idea of what the culprit(s) might be! Thanks a lot!!

Current version of MDAnalysis

lilyminium commented 1 year ago

Hi @taoliu032, welcome to MDAnalysis and thanks for opening such a detailed issue. I suspect that the reason you're seeing this is because MDAnalysis loads data for each frame from the file as you access it. That means that any changes you make to the coordinates for a particular frame (e.g. the first frame) are then re-set after you iterate through the trajectory. The BAT.run() (and any analysis class .run()) does just such an iteration, so the coordinates are re-set.

There are a couple ways you can get the rotation to persist. The first is to load the trajectory completely into memory when you first create the universe:

u_example = mda.Universe(top_file, in_memory=True)

The second is to use on-the-fly transformations. These mean you don't need to load all your data into memory at once, but rather applies a transform on the coordinates of the atoms when you load each frame. These are slightly more complex -- here's an example of how I would think it would look, but the code is untested.

from MDAnalysis.transformations.rotate import rotateby
angle = 90
p = [1,2,3]
d = [0,1,2]
transform = rotateby(angle, direction=d, point=p, ag=dna_atoms)
u.trajectory.add_transformations(transform)
taoliu032 commented 1 year ago

Hi @lilyminium, thanks a lot for the help! I tried the on-the-fly-transformation approach and it perfectly worked! Although I didn't test the first way because it is not practical for me to load the entire trajectory into memory. But I think it should work just as well :)

I didn't know that for a universe created by just a pdb (serving as both topology and coordinate files), I can still use .trajectory attribute. I guess such a universe is considered as a trajectory with just one frame, am I correct?

taoliu032 commented 1 year ago

A quick follow-up question about the BAT class before we close this issue. I noticed something else that might worth attention:

Once I got the internal coordinates, I tried to convert them back to Cartesian coordinates following the example like below:

R = BAT(dna_atoms)

# Calculate BAT coordinates
R.run()

# Convert them back to Cartesian coordinates
XYZ = R.Cartesian(R.results.bat[0,:])

I found if I run the R.Cartesian() consecutively by more than 1 time like this:

XYZ_1 = R.Cartesian(R.results.bat[0,:])
print(XYZ_1)
XYZ_2 = R.Cartesian(R.results.bat[0,:])
print(XYZ_2)

The returned XYZ_1 and XYZ_2 are different. Is there a reason for this?

I notice that in the API for Cartesian, it says "One application of this function is to determine the new Cartesian coordinates after modifying a specific torsion angle." I was wondering if "one application" implied that it should be used only once.

Thanks again!

richardjgowers commented 1 year ago

@taoliu032 you're right about single frame "trajectories" behaving like multiframe trajectories of length 1.

Your follow up looks like it might be that R.results.bat[0, :] is getting modified inside R.Cartesian. Can you check that R.results.bat[0, :] is identical before and after calling R.Cartesian?

taoliu032 commented 1 year ago

@richardjgowers Thanks for the clarification!

It doesn't look like R.results.bat[0, :] gets changed upon calling R.Cartesian on it:

bat = R.results.bat[0,:]
print(R.results.bat[0,:])

# Reconstruct Cartesian coordinates from BAT coordinates
XYZ = R.Cartesian(R.results.bat[0,:])
# print('XYZ:', XYZ)

print(R.results.bat[0,:])
print(np.allclose(bat, R.results.bat[0,:], atol=1e-6))

[41.76300049 23.33600044 43.16699982 ... 18.54352474 2.91510914 20.59402519] [41.76300049 23.33600044 43.16699982 ... 21.25134529 2.91510914 23.50913433] True

taoliu032 commented 1 year ago

I did some more tests on the issue that R.Cartesian(bat_frame) does not return consistent result when run twice consecutively. I find that torsions variable calculated in the source code is different in each run, probably because of the conversion step from any improper torsions to proper torsions, shown below:

        torsions = bat_frame[2 * n_torsions + 9:]
        # When appropriate, convert improper to proper torsions
        shift = torsions[self._primary_torsion_indices]
        shift[self._unique_primary_torsion_indices] = 0.
        torsions += shift
        # Wrap torsions to between -np.pi and np.pi
        torsions = ((torsions + np.pi) % (2 * np.pi)) - np.pi

Is it possible that self._primary_torsion_indices and self._unique_primary_torsion_indices can be altered, especially if the BAT object R contains results for a trajectory of N frames? @richardjgowers @anyone_else_who_has_thoughts :)

lilyminium commented 1 year ago

Ah, yes, this looks very likely a bug -- looks like line 506 is modifying the array in place. Even in your earlier comment @taoliu032, when the array gets printed, it looks different.

taoliu032 commented 1 year ago

Ah you're right @lilyminium -- I must've been temporarily blind for not seeing the difference! haha

[41.76300049 23.33600044 43.16699982 ... 18.54352474 2.91510914 20.59402519] [41.76300049 23.33600044 43.16699982 ... 21.25134529 2.91510914 23.50913433] True

Note: I was checking the source code of release-2.0.0

orbeckst commented 1 year ago

@daveminh here's a potential issue with the BAT module and the question if BAT.Cartesian() should really modify the input array in place. This leads to confusing behavior and it would be safer to work on a copy. However, is there a reason why you chose to do in-place operations? (@lilyminium identified line 506 as the culprit).