MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.31k stars 648 forks source link

WaterBridgeAnalysis can't seem to do (Acceptor <-> WAT <-> WAT <-> Acceptor) water bridges? #4040

Closed pwnusmaximus closed 1 year ago

pwnusmaximus commented 1 year ago

Expected behavior

I want to quantify a second order water bridge (two waters between a protein residue (resid 43 and name OD2) and a DNA nucleotide (resid 230 and name P). I would like a time series of the presence of these bonds.

resid 43 = resname ASP resid 230 = resname DA

Troubleshooting thus far

Notably, selection two is a phosphorus atom. (not a usual hydrogen bond acceptor atom) I know this 'P' isn't in any standard library so I added it with the "acceptors" keyword in addition to forcing selection2_type to be an acceptor. I tried troubleshooting by enlarging the acceptable ranges for angle and distance, and even manually specifying every donor and acceptor used in the interaction in-case any were not in the default atom mask libraries. I also played with using resid, resname or simply atom masks to see if the selection was the issue. I tried on a full trajectory and truncated trajectories where I've manually extracted the frames with water bridges present.

I've verified visually in pymol that the water bridge is indeed present and within spec for the identities, angles and distances I used.

Actual behavior

MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: selection = 'resname DA and name P' (update: True)
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: water selection = 'resname WAT' (update: True)
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: criterion: donor heavy atom and acceptor atom distance <= 4.000 A
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: criterion: angle D-H-A >= 120.000 degrees
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: force field CHARMM27 to guess donor and         acceptor names
MDAnalysis.analysis.base: INFO     Choosing frames to analyze
MDAnalysis.analysis.base: INFO     Starting preparation
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: initial checks passed.
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: starting
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     Starting analysis (frame index start=0 stop=10, step=1)
MDAnalysis.analysis.base: INFO     Starting analysis loop over 10 trajectory frames
100%|██████████| 10/10 [00:00<00:00, 21.44it/s]
MDAnalysis.analysis.base: INFO     Finishing up
[[], [], [], [], [], [], [], [], [], []]

you can see on the last line no hydrogen bonds are identified. I've opened up the 'w' dataframe and confirmed that the universe was created successfully and that the correct acceptors and donors are listed.

Code to reproduce the behavior

import MDAnalysis as mda
import MDAnalysis.analysis.hydrogenbonds
import numpy.linalg
import numpy as np
np.set_printoptions(linewidth=100)

#%%
#Set working directory
import os
os.chdir('c:\\Users\\makay.murray\\Documents\\GitHub\\md_water_bridge\\analysis_dir')
[MDAnalysis.log](https://github.com/MDAnalysis/mdanalysis/files/10827603/MDAnalysis.log)

#Create Universe
parm = "../trajectory_data/2w35-DL-HID.prmtop"
traj = "../trajectory_data/2w35-DL-HID-rep2-500ns-10frames.mdcrd"

u = mda.Universe(parm,traj)  # always start with a Universe

w = MDAnalysis.analysis.hydrogenbonds.WaterBridgeAnalysis(u, 
                                                          'resname ASP and name OD2', 
                                                          'resname DA and name P', 
                                                          update_selection=True,
                                                          selection1_type='acceptor',
                                                          selection2_type='acceptor',
                                                          water_selection='resname WAT', 
                                                          order=2, 
                                                          update_water_selection=True,
                                                          distance=4.0, angle=120.0,
                                                          distance_type = 'heavy',
                                                          debug=True,
                                                          verbose=True,
                                                          acceptors=('P', 'OD1', 'OD2', 'O'),
                                                          donors=('H1', 'H2'),
                                                          filter_first=True
                                                          )
w.run()

print(w.timeseries)
....

Current version of MDAnalysis

Attached files: 2w35-DL-HID-rep2-500ns-10frames.zip 2w35-DL-HID.zip Water Bridge Image black MDAnalysis.log

pwnusmaximus commented 1 year ago

While looking through the MDAnalysis.log file it appears that selection2_type is ignored when selection1_type is set. ie. It appears that selection 2 and selection 1 are inverse of each other. For most cases this makes sense. However, for second order water bridges an Acceptor <-> WAT <-> WAT <-> Acceptor is perfectly reasonable.

Perhaps there is a method to overrule this behaviour and I didn't do it properly.

orbeckst commented 1 year ago

@xiki-tempula as you're the orginal author of the wbridge code, can you have a look at this question/issue please? Thanks!

xiki-tempula commented 1 year ago

@Pwnus Thanks for the issue. I will have a look in the weekends. I wonder why do you do distance_type = 'heavy'?

pwnusmaximus commented 1 year ago

Hi @xiki-tempula, for me distance_type = 'heavy' is just a convention. Our usual analysis software of choice ,cpptraj from Ambertools2021, defaults to heavy atom distances and I've grown accustomed to it as a measure.

xiki-tempula commented 1 year ago

I can reproduce the problem. Let me have a look.

xiki-tempula commented 1 year ago

Ok, I found out why this failed. selection1_type is a keyword argument, but selection2_type is not a keyword argument.

        selection1_type : {"donor", "acceptor", "both"} (optional)
            Selection 1 can be 'donor', 'acceptor' or 'both'. Note that the
            value for `selection1_type` automatically determines how
            `selection2` handles donors and acceptors: If `selection1` contains
            'both' then `selection2` will also contain 'both'. If `selection1`
            is set to 'donor' then `selection2` is 'acceptor' (and vice versa).
            ['both'].

So the code only checks for three cases, When selection1_type='donor', it assumes selection1_type='donor', selection2_type='acceptor'. When selection1_type='acceptor', it assumes selection1_type='acceptor', selection2_type='donor'. When selection1_type='both', it assumes selection1_type='both', selection2_type='both'.

So when it sees selection1_type='acceptor', selection2_type='acceptor', it just reverts to selection1_type='acceptor', selection2_type='donor'.

xiki-tempula commented 1 year ago

Ok, there is another thing. Sorry if the doc is not very clear but donors is actually the heavy atom associated with the donor. Sorry the distance_type = 'heavy' is not very well maintained mainly because I originally don't understand how this works and kind of inherit the code from other part of the MDA.

However, the distance_type='hydrogen' does work. In this case, it would be

import MDAnalysis as mda
#Create Universe
parm = "2w35-DL-HID.prmtop"
traj = "2w35-DL-HID-rep2-500ns-10frames.mdcrd"

u = mda.Universe(parm,traj)  # always start with a Universe

w = MDAnalysis.analysis.hydrogenbonds.WaterBridgeAnalysis(u,
                                                          'resname ASP and name OD2',
                                                          'resname DA and name P',
                                                          update_selection=True,
                                                          water_selection='resname WAT',
                                                          order=2,
                                                          update_water_selection=True,
                                                          distance=4.0, angle=120.0,
                                                          debug=True,
                                                          verbose=True,
                                                          acceptors=('P', 'OD1', 'OD2', 'O'),
                                                          donors=('O'),
                                                          filter_first=True
                                                          )
w.run()
print(w.timeseries)

Gives [[(1817, 4251, ('ASP', 110, 'OD2'), ('WAT', 351, 'H2'), 3.393591355721304, 133.6406052508985), (4251, 4252, ('WAT', 351, 'H2'), ('WAT', 352, 'O'), 3.1510935551442802, 129.3098825714121), (4253, 3774, ('WAT', 352, 'H1'), ('DA', 230, 'P'), 2.9032325126961362, 142.4088206311063), (720, 4254, ('ASP', 43, 'OD2'), ('WAT', 352, 'H2'), 1.8726619038368872, 132.56006363521206), (720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 1.850974019502122, 150.58244069848672)], [(1817, 4253, ('ASP', 110, 'OD2'), ('WAT', 352, 'H1'), 1.849568207751335, 172.96529196011156), (4254, 9151, ('WAT', 352, 'H2'), ('WAT', 1985, 'O'), 1.8087280419588374, 148.28580215335526), (9153, 3774, ('WAT', 1985, 'H2'), ('DA', 230, 'P'), 2.246521446223992, 167.5724465796556), (720, 4254, ('ASP', 43, 'OD2'), ('WAT', 352, 'H2'), 3.429838909172761, 139.9579791603104), (720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 1.8959299736466242, 171.89039279707592), (720, 9152, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H1'), 1.8671204264943673, 153.31218009273726), (720, 14307, ('ASP', 43, 'OD2'), ('WAT', 3703, 'H2'), 3.8169042950036887, 144.42441139786533)], [(720, 4257, ('ASP', 43, 'OD2'), ('WAT', 353, 'H2'), 3.64963647552059, 139.45071906498148), (4256, 3774, ('WAT', 353, 'H1'), ('DA', 230, 'P'), 2.834134138868142, 129.92633270270065), (720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 1.818111829419461, 161.60502328555168), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 2.015430888351284, 136.59739586186618)], [(720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 1.8203121752801656, 158.58513345751518), (9151, 4293, ('WAT', 1985, 'O'), ('WAT', 365, 'H2'), 3.479766986115513, 133.83175881878554), (4293, 3774, ('WAT', 365, 'H2'), ('DA', 230, 'P'), 3.060998864481397, 140.26838790231153), (9153, 4291, ('WAT', 1985, 'H2'), ('WAT', 365, 'O'), 3.608196269337274, 122.48698663619565)], [(720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 1.843827286086128, 144.45964232357517), (4292, 3774, ('WAT', 365, 'H1'), ('DA', 230, 'P'), 2.8939357380099944, 134.56966096070747), (4292, 5590, ('WAT', 365, 'H1'), ('WAT', 798, 'O'), 3.1605338845771804, 138.05900583208216), (720, 5591, ('ASP', 43, 'OD2'), ('WAT', 798, 'H1'), 3.9242542859284426, 149.26346158504964), (720, 13614, ('ASP', 43, 'OD2'), ('WAT', 3472, 'H2'), 1.7618811985131957, 154.11506898283204)], [(720, 9152, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H1'), 1.892213094315617, 158.488408366625), (9151, 16995, ('WAT', 1985, 'O'), ('WAT', 4599, 'H2'), 1.957615569970479, 165.2898111917652), (16994, 3774, ('WAT', 4599, 'H1'), ('DA', 230, 'P'), 2.7888123616948706, 134.82884905540615)], [(1817, 4106, ('ASP', 110, 'OD2'), ('WAT', 303, 'H1'), 1.833734411648239, 148.7257072909245), (4107, 9151, ('WAT', 303, 'H2'), ('WAT', 1985, 'O'), 2.4095311772259493, 122.19942481564095), (9152, 3774, ('WAT', 1985, 'H1'), ('DA', 230, 'P'), 3.5913188516541963, 146.17430132993326), (720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 1.8416829164441975, 152.57002058611997), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 1.8229850078234018, 177.70100450495477)], [(1817, 4106, ('ASP', 110, 'OD2'), ('WAT', 303, 'H1'), 1.740858427970088, 151.83055523471987), (4107, 6952, ('WAT', 303, 'H2'), ('WAT', 1252, 'O'), 3.445888301940925, 140.5532946438981), (6953, 3774, ('WAT', 1252, 'H1'), ('DA', 230, 'P'), 3.2492178833565544, 141.50511711223422), (720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 1.951102535539977, 158.61324639824332), (720, 6954, ('ASP', 43, 'OD2'), ('WAT', 1252, 'H2'), 1.9869096728458162, 155.66059556502225), (720, 13515, ('ASP', 43, 'OD2'), ('WAT', 3439, 'H2'), 3.9463436426809833, 132.2588781538232)], [(720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 2.2405464382157216, 122.84506810889205), (4292, 3774, ('WAT', 365, 'H1'), ('DA', 230, 'P'), 3.1773631323951115, 126.97784464396653), (4291, 6954, ('WAT', 365, 'O'), ('WAT', 1252, 'H2'), 3.748528653581033, 123.84650122269805), (720, 6954, ('ASP', 43, 'OD2'), ('WAT', 1252, 'H2'), 3.5016609783952184, 167.98769036196092), (1817, 9152, ('ASP', 110, 'OD2'), ('WAT', 1985, 'H1'), 1.9499123859961573, 145.10239635368376), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 1.5926231286622126, 175.76655312759172)], []]

xiki-tempula commented 1 year ago

This is also fixed in https://github.com/MDAnalysis/mdanalysis/pull/4066

import MDAnalysis as mda

#Create Universe
parm = "2w35-DL-HID.prmtop"
traj = "2w35-DL-HID-rep2-500ns-10frames.mdcrd"

u = mda.Universe(parm,traj)  # always start with a Universe

w = MDAnalysis.analysis.hydrogenbonds.WaterBridgeAnalysis(u,
                                                          'resname ASP and name OD2',
                                                          'resname DA and name P',
                                                          update_selection=True,
                                                          water_selection='resname WAT',
                                                          order=2,
                                                          update_water_selection=True,
                                                          distance=4.0, angle=120.0,
                                                          distance_type = 'heavy',
                                                          debug=True,
                                                          verbose=True,
                                                          acceptors=('P', 'OD1', 'OD2', 'O'),
                                                          donors=('O'),
                                                          filter_first=True
                                                          )
w.run()

print(w.timeseries)

[[(720, 4254, ('ASP', 43, 'OD2'), ('WAT', 352, 'H2'), 2.6173148105009867, 132.56006363521206), (4253, 3774, ('WAT', 352, 'H1'), ('DA', 230, 'P'), 3.7078930697486436, 142.4088206311063), (720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 2.7255888461800195, 150.58244069848672)], [(1817, 4253, ('ASP', 110, 'OD2'), ('WAT', 352, 'H1'), 2.8022313402893086, 172.96529196011156), (4254, 9151, ('WAT', 352, 'H2'), ('WAT', 1985, 'O'), 2.670479345137013, 148.28580215335526), (9153, 3774, ('WAT', 1985, 'H2'), ('DA', 230, 'P'), 3.1885889101394085, 167.5724465796556), (720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 2.846776898715446, 171.89039279707592), (720, 9152, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H1'), 2.756608767156218, 153.31218009273726)], [(720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 2.742949075816077, 161.60502328555168), (4291, 4257, ('WAT', 365, 'O'), ('WAT', 353, 'H2'), 2.9116107495969175, 160.3826600196096), (4256, 3774, ('WAT', 353, 'H1'), ('DA', 230, 'P'), 3.525871616409098, 129.92633270270065), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 2.789022173597509, 136.59739586186618)], [], [(720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 2.680570453923368, 144.45964232357517), (4292, 3774, ('WAT', 365, 'H1'), ('DA', 230, 'P'), 3.6304643721412186, 134.56966096070747), (720, 13614, ('ASP', 43, 'OD2'), ('WAT', 3472, 'H2'), 2.6558193872943874, 154.11506898283204)], [(720, 9152, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H1'), 2.8048798451527235, 158.488408366625), (9151, 16995, ('WAT', 1985, 'O'), ('WAT', 4599, 'H2'), 2.8941625694269955, 165.2898111917652), (16994, 3774, ('WAT', 4599, 'H1'), ('DA', 230, 'P'), 3.5297488120647786, 134.82884905540615)], [], [], [(720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 2.873803650917737, 122.84506810889205), (4292, 3774, ('WAT', 365, 'H1'), ('DA', 230, 'P'), 3.829556436424445, 126.97784464396653), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 2.549059271453923, 175.76655312759172), (1817, 9152, ('ASP', 110, 'OD2'), ('WAT', 1985, 'H1'), 2.7889963607587984, 145.10239635368376)], []]

pwnusmaximus commented 1 year ago

Thank you xiki-tempula

I've made notes to myself for that little "gotcha" to look out for when running this type of analysis and the correct options to ask for. Thank you for the solution :)

It'll be highly valuable to me and my lab-mates.

xiki-tempula commented 1 year ago

@pwnusmaximus Please don't close the issue yet as we still need to merge the fix for the distance_type = 'heavy'

pwnusmaximus commented 1 year ago

@pwnusmaximus Please don't close the issue yet as we still need to merge the fix for the distance_type = 'heavy'

My mistake, I've re-opened the issue,