ccsb-scripps / AutoDock-Vina

AutoDock Vina
http://vina.scripps.edu
Apache License 2.0
617 stars 210 forks source link

Questions about calculating intramolecular energy for a ligand #349

Closed tmcgrath325 closed 1 month ago

tmcgrath325 commented 1 month ago

Hi Autodock dev team and users,

Thanks for building and maintaining this fantastic tool, and thanks for taking the time to read this.

To make sure that I understand the Vina scoring function, I'm putting together a quick and dirty reimplementation of it based on the original paper to see if I can recalculate scores for particular poses. It seems like I'm able to match the intermolecular component of the score, but I'm getting a little tripped up by the intramolecular part.

Assuming that I'm correctly assigning atom types and calculating potentials appropriately for atom pairs, my guess is that my issues might be coming from choosing incorrect sets of atom pairs to be evaluated for the intramolecular score. Based on the paper and reading through some of the source code, my understanding of the choice of atom pairs is as follows:

For these atom pairs, potentials are calculated in the same was as ligand-protein ones.

Is that approach correct? When using atom pairs chosen in this way, I'm consistently getting getting higher scores for individual components of the scoring function, which might indicate that I'm keeping some pairs that should be excluded.

Here's a particular ligand I'm using for testing...

REMARK SMILES Cc1cc(C)cc(NC(=O)Cc2ccc(OC(C)(C)C(=O)O)cc2)c1
REMARK SMILES IDX 15 1 14 2 13 3 12 4 23 5 24 6 16 7 17 8 18 9 19 10 20 11
REMARK SMILES IDX 21 12 22 13 11 15 9 16 10 17 8 18 7 20 25 21 6 22 2 23 4 24
REMARK SMILES IDX 3 25 1 26 5 27
REMARK H PARENT 22 14 8 19
REMARK Flexibility Score: inf
ROOT
ATOM      1  C   UNL     1      19.351  16.715  41.509  1.00  0.00     0.121 A 
ATOM      2  C   UNL     1      19.317  17.904  40.827  1.00  0.00     0.047 A 
ATOM      3  C   UNL     1      18.398  18.886  41.203  1.00  0.00     0.008 A 
ATOM      4  C   UNL     1      17.506  18.721  42.240  1.00  0.00    -0.038 A 
ATOM      5  C   UNL     1      18.447  16.520  42.574  1.00  0.00     0.047 A 
ATOM      6  C   UNL     1      17.562  17.513  42.908  1.00  0.00     0.008 A 
ENDROOT
BRANCH   1   7
ATOM      7  O   UNL     1      20.240  15.735  41.171  1.00  0.00    -0.476 OA
BRANCH   7   8
ATOM      8  C   UNL     1      19.996  14.661  40.348  1.00  0.00     0.199 C 
ATOM      9  C   UNL     1      18.621  14.881  39.689  1.00  0.00     0.069 C 
ATOM     10  C   UNL     1      21.020  14.657  39.201  1.00  0.00     0.069 C 
BRANCH   8  11
ATOM     11  C   UNL     1      20.049  13.330  40.949  1.00  0.00     0.347 C 
ATOM     12  O   UNL     1      19.365  13.003  41.923  1.00  0.00    -0.248 OA
BRANCH  11  13
ATOM     13  O   UNL     1      20.914  12.341  40.419  1.00  0.00    -0.478 OA
ATOM     14  H   UNL     1      21.360  11.750  41.094  1.00  0.00     0.296 HD
ENDBRANCH  11  13
ENDBRANCH   8  11
ENDBRANCH   7   8
ENDBRANCH   1   7
BRANCH   4  15
ATOM     15  C   UNL     1      16.547  19.775  42.615  1.00  0.00     0.135 C 
BRANCH  15  16
ATOM     16  C   UNL     1      17.288  20.981  43.137  1.00  0.00     0.228 C 
ATOM     17  O   UNL     1      17.890  20.865  44.201  1.00  0.00    -0.274 OA
ATOM     18  N   UNL     1      17.260  22.164  42.400  1.00  0.00    -0.326 N 
ATOM     19  H   UNL     1      16.709  22.174  41.481  1.00  0.00     0.169 HD
BRANCH  18  20
ATOM     20  C   UNL     1      17.915  23.362  42.781  1.00  0.00     0.041 A 
ATOM     21  C   UNL     1      18.874  24.019  42.066  1.00  0.00     0.029 A 
ATOM     22  C   UNL     1      17.575  23.963  44.003  1.00  0.00     0.029 A 
ATOM     23  C   UNL     1      19.487  25.197  42.488  1.00  0.00    -0.049 A 
ATOM     24  C   UNL     1      18.134  25.113  44.477  1.00  0.00    -0.049 A 
ATOM     25  C   UNL     1      19.103  25.730  43.698  1.00  0.00     0.007 A 
ATOM     26  C   UNL     1      20.525  25.863  41.660  1.00  0.00     0.044 C 
ATOM     27  C   UNL     1      17.740  25.719  45.787  1.00  0.00     0.044 C 
ENDBRANCH  18  20
ENDBRANCH  15  16
ENDBRANCH   4  15
TORSDOF 7

... for which I generated the following table of atom pairs, where each row represents all of the other atoms that interact with a particular atom:

 # Atoms interacting with...                                Atom Number
 [12, 13, 15, 16, 17, 18, 20, 21, 22]                       # 1
 [9, 10, 11, 12, 13, 16, 17, 18, 20, 21, 22, 23]            # 2
 [8, 9, 10, 11, 12, 13, 17, 18, 20, 21, 22, 23, 24, 25, 26] # 3
 [7, 8, 9, 10, 11, 12, 13, 20, 21, 22, 23, 24, 25, 26, 27]  # 4
 [9, 10, 11, 12, 13, 16, 17, 18, 20, 21, 22]                # 5
 [8, 9, 10, 11, 12, 13, 17, 18, 20, 21, 22, 23, 24]         # 6
 [15, 16, 17, 18]                                           # 7
 [15, 16, 17]                                               # 8
 [15, 16, 17, 18]                                           # 9
 [15]                                                       # 10
 [15]                                                       # 11
 [15]                                                       # 12
 []                                                         # 13
 [21, 22, 23, 24, 25, 26, 27]                               # 15
 [23, 24, 25, 26, 27]                                       # 16
 [21, 22, 23, 24, 25, 26, 27]                               # 17
 [25, 26, 27]                                               # 18
 []                                                         # 20
 [27]                                                       # 21
 [26]                                                       # 22
 []                                                         # 23
 []                                                         # 24
 []                                                         # 25
 [27]                                                       # 26
 []                                                         # 27

Here's a plot of the ligand with bonds, for reference: testligand

Do I have the right idea?

While it's not central to my question about atom pairs, I've included below what I've been using to check my understanding of the scoring against Vina itself:

Code snippet used to generate c_intra for each part of the scoring function

scorer = Vina(sf_name="vina")
scorer.set_receptor(protein_path)

results = []
for i in range(5):
    weights = tuple([j == i for j in range(7)])
    scorer.set_weights(weights)

    scorer.compute_vina_maps(
        center=[16.966, 16.623, 41.744], 
        box_size=[25, 25, 25]
    )

    scorer.set_ligand_from_file(mol_path)
    results.append(scorer.score())

with open(astex_dir + 'result_components.txt', 'w') as f:
    for res in results:
        f.write(str(res[5]) + '\n')

Output of the above, where each line is an unweighted component of the ligand's intra score (in the standard order):

10.586
67.753
0.703
7.491
0.0

Python script for recalculating components of c_intra

import math

# constants

VDW_RADII = {
    'CH':  1.9,
    'CP':  1.9,
    'O':   1.7,
    'OD':  1.7,
    'OA':  1.7,
    'ODA': 1.7,
    'N':   1.8,
    'ND':  1.8,
    'NA':  1.8,
    'NDA': 1.8,
}

# functions

def clamp(n, lo, hi):
    return min(hi,max(n,lo))

def dist(coords1, coords2):
    return math.sqrt(sum([(x - y)**2 for x, y in zip(coords1, coords2)]))

def optimaldist(atomtype1, atomtype2):
    return VDW_RADII.get(atomtype1) + VDW_RADII.get(atomtype2)

def ishydrophobe(atomtype):
    return atomtype == "CH" # ignoring other atomtypes that aren't present in the test ligand

def isacceptor(atomtype):
    return atomtype == "OA" or atomtype == "ODA" or atomtype == "NA" or atomtype == "NDA"

def isdonor(atomtype):
    return atomtype == "OD" or atomtype == "ODA" or atomtype == "ND" or atomtype == "NDA"

def gauss(coords1, type1, coords2, type2, offset, width):
    return math.exp(-(((dist(coords1, coords2) - optimaldist(type1, type2) - offset)/width))**2)

def gauss1(coords1, type1, coords2, type2):
    return gauss(coords1, type1, coords2, type2, 0, 0.5)

def gauss2(coords1, type1, coords2, type2):
    return gauss(coords1, type1, coords2, type2, 3, 2)

def repulsion(coords1, type1, coords2, type2):
    return min(dist(coords1,coords2) - optimaldist(type1, type2), 0)**2

def ramp(val, bad, good):
    lo = min(bad,good)
    hi = max(bad,good)
    return (clamp(val,lo,hi) - bad) / (good - bad)

def hydrophobic(coords1, type1, coords2, type2):
    if ishydrophobe(type1) and ishydrophobe(type2):
        return ramp(dist(coords1,coords2) - optimaldist(type1,type2), 1.5, 0.5)
    else:
        return 0

def hbonding(coords1, type1, coords2, type2):
    if ((isacceptor(type1) and isdonor(type2)) or (isacceptor(type2) and isdonor(type1))):
        return ramp(dist(coords1,coords2) - optimaldist(type1,type2), 0, -0.7)
    else:
        return 0

# data from particular ligand

coordinates = [
    [19.351, 16.715, 41.509], # 1
    [19.317, 17.904, 40.827], # 2
    [18.398, 18.886, 41.203], # 3
    [17.506, 18.721, 42.24 ], # 4
    [18.447, 16.52,  42.574], # 5
    [17.562, 17.513, 42.908], # 6
    [20.24,  15.735, 41.171], # 7
    [19.996, 14.661, 40.348], # 8
    [18.621, 14.881, 39.689], # 9
    [21.02,  14.657, 39.201], # 10
    [20.049, 13.33,  40.949], # 11
    [19.365, 13.003, 41.923], # 12
    [20.914, 12.341, 40.419], # 13
    [16.547, 19.775, 42.615], # 15
    [17.288, 20.981, 43.137], # 16
    [17.89,  20.865, 44.201], # 17
    [17.26,  22.164, 42.4  ], # 18
    [17.915, 23.362, 42.781], # 20
    [18.874, 24.019, 42.066], # 21
    [17.575, 23.963, 44.003], # 22
    [19.487, 25.197, 42.488], # 23 
    [18.134, 25.113, 44.477], # 24
    [19.103, 25.73,  43.698], # 25
    [20.525, 25.863, 41.66 ], # 26
    [17.74,  25.719, 45.787], # 27
]

atomtypes = [
    "CP",  "CH",  "CH",  "CH",  "CH", # 1,  2,  3,  4,  5
    "CH",  "OA",  "CP",  "CH",  "CH", # 6,  7,  8,  9,  10
    "CP",  "OA",  "ODA", "CH",  "CP", # 11, 12, 13, 15, 16
    "OA",  "ND",  "CP",  "CH",  "CH", # 17, 18, 20, 21, 22
    "CH",  "CH",  "CH",  "CH",  "CH", # 23, 24, 25, 26, 27
]

# note that these are indices for the above lists, NOT the serials of the atoms
intra_pairs = [ # excluding atoms within 3 covalent bonds of one another or greater than 8 angstroms away
    [12, 13, 14, 15, 16, 17, 18, 19, 20],
    [9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21],
    [8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22, 23, 24],
    [7, 8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23, 24, 25],
    [9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20],
    [8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22],
    [14, 15, 16, 17],
    [14, 15, 16],
    [14, 15, 16, 17],
    [14],
    [14],
    [14],
    [],
    [19, 20, 21, 22, 23, 24, 25],
    [21, 22, 23, 24, 25],
    [19, 20, 21, 22, 23, 24, 25],
    [23, 24, 25],
    [],
    [25],
    [24],
    [],
    [],
    [],
    [25],
    [],
]

def evaluate_potential_function(coordinates, atomtypes, allpairs, func):
    potential = 0
    for coords, atype, pairs in zip(coordinates, atomtypes, allpairs):
        for pairidx in pairs:
            potential = potential + func(coords, atype, coordinates[pairidx-1], atomtypes[pairidx-1])
    return potential

with open('recalculations.txt', 'w') as f:
    for func in [gauss1, gauss2, repulsion, hydrophobic, hbonding]:
        f.write(str(evaluate_potential_function(coordinates, atomtypes, intra_pairs, func)) + '\n')

Output of the above script, where the entries are the recalculated scores:

12.516457453275775
71.02000105089027
0.7046835040141604
9.814623380276732
0.0

Aside from the hydrogen bonding score of zero, the only score that seems to be roughly correct is the repulsion component.

diogomart commented 1 month ago

There are further exclusions to the non-bonded list: vina excludes any pair of atoms for which the distance doesn't change. This includes any pair within a rigid group of atoms. for example, the following pairs in your example above would be excluded:

26-27
18-16
18-27
21-27

(these are just a few examples of pairs that vina excludes, there are more.

tmcgrath325 commented 1 month ago

That did the trick, thanks @diogomart! 🎉 It makes a lot of sense that anything invariant to bond rotations is excluded from the score.

diogomart commented 1 month ago

You are welcome!