ccsb-scripps / AutoDock-Vina

AutoDock Vina
http://vina.scripps.edu
Apache License 2.0
596 stars 209 forks source link

Torsional entropies for the Vina scoring function #240

Closed rwxayheee closed 1 year ago

rwxayheee commented 1 year ago

Hi Autodock dev team and users,

First of all massive thanks for taking the time to read this..

I am using Vina 1.2.5 for protein-ligand docking calculations including ligands with macrocycles, and I wonder if I could get your advice on some questions I have about the entropy penalties for active torsions –

Using the AD4 force field, the torsional entropy is 0.298*number of torsions and it is invariant with ligand conformations. In the Vina force field, there is a similar term returned as the 7th component from v.score(). This “torsional entropy” is included in the “total” score, and the latter is printed as “affinity” in the standard output and also as “VINA RESULT” in the pdbqt output.

My questions are generally about the definition of the torsional entropy: How was the torsional entropy derived for the Vina scoring function, and how is it currently calculated in vina 1.2.5? If anyone could explain or point me to the right references, I would really really appreciate it.

I am trying to run some calculations with Vina 1.2.5 to address the bias in some work I did last year with 1.2.3. I cannot post my ligand and receptor files at the moment.. but I will try to explain the specific questions using input files from the macrocycle docking example. The docking poses were generated from vina 1.2.5 in python. Then, I use v.score() to rescore and see the breakdown.

1. Is it expected for the torsional entropy to change with ligand conformation?

Here are the returned arrays of energies for a list of 10 poses. In both rigid- and flexible- macrocycle docking, the torsional entropy seems to be different from pose to pose.

Table 1. The example ligand, docked with rigid macrocycles

[-11.619 -18.751   0.      0.      0.     -0.682   7.132  -0.682]
[-10.988 -17.392   0.      0.      0.     -1.024   6.745  -0.682]
[-10.857 -17.517   0.      0.      0.     -0.686   6.664  -0.682]
[-10.53  -17.531   0.      0.      0.     -0.144   6.463  -0.682]
[ -8.193 -12.111   0.      0.      0.     -1.792   5.029  -0.682]
[ -7.855 -12.086   0.      0.      0.     -1.272   4.821  -0.682]
[ -7.827 -11.992   0.      0.      0.     -1.322   4.805  -0.682]
[ -7.601 -11.884   0.      0.      0.     -1.066   4.666  -0.682]
[ -7.583 -11.246   0.      0.      0.     -1.673   4.655  -0.682]

Table 2. The example ligand, docked with the glue atoms

[ -7.85  -17.717   0.      0.      0.     -2.188   9.867  -2.188]
[ -7.226 -15.888   0.      0.      0.     -2.608   9.082  -2.188]
[ -6.795 -15.705   0.      0.      0.     -1.819   8.541  -2.188]
[ -6.284 -13.971   0.      0.      0.     -2.399   7.898  -2.188]
[ -6.23  -13.577   0.      0.      0.     -2.671   7.83   -2.188]
[ -6.077 -15.833   0.      0.      0.     -0.07    7.638  -2.188]
[ -6.022 -13.668   0.      0.      0.     -2.112   7.57   -2.188]
[ -5.985 -14.324   0.      0.      0.     -1.371   7.522  -2.188]
[ -5.921 -13.478   0.      0.      0.     -2.073   7.442  -2.188]

2. Is it normal for glue atoms to contribute to the torsional entropy?

Table 3. The example ligand, docked with but scored without G atoms

[-11.276 -18.198   0.      0.      0.     -1.04    6.922  -1.04 ]
[-10.342 -16.273   0.      0.      0.     -1.457   6.348  -1.04 ]
[ -9.762 -16.02    0.      0.      0.     -0.774   5.992  -1.04 ]
[ -9.124 -14.139   0.      0.      0.     -1.625   5.601  -1.04 ]
[ -8.859 -13.717   0.      0.      0.     -1.619   5.438  -1.04 ]
[ -9.938 -16.342   0.      0.      0.     -0.735   6.1    -1.04 ]
[ -8.752 -13.846   0.      0.      0.     -1.318   5.372  -1.04 ]
[ -9.107 -14.36    0.      0.      0.     -1.377   5.59   -1.04 ]

The scores in Table 3 were computed on exactly the same poses in Table 2. The difference in torsional energies made me realize that the glue atoms also contributed. If so, it does makes sense because TORSDOF is generally underestimated when macrocycles are treated rigidly. Without their contribution, the total affinities become more consistent with Table 1 from docking with rigid macrocycles.

(I don't know enough about the example ligand. Which affinity aligns better with experimental values, -7.8 or -11.3 kcal/mol? )

3. This is not a problem for v.dock(), but when v.score() is used on existing poses, the torsional entropy also changes with the assigned unbound energy.

Table 4. The example ligand, docked with rigid macrocycles, but unbound energy assigned to 0 from the 2nd pose

[-11.619 -18.751   0.      0.      0.     -0.682   7.132  -0.682]
[-11.411 -17.392   0.      0.      0.     -1.024   7.004   0.   ]
[-11.279 -17.517   0.      0.      0.     -0.686   6.924   0.   ]
[-10.952 -17.531   0.      0.      0.     -0.144   6.723   0.   ]
[ -8.615 -12.111   0.      0.      0.     -1.792   5.288   0.   ]
[ -8.277 -12.086   0.      0.      0.     -1.272   5.081   0.   ]
[ -8.25  -11.992   0.      0.      0.     -1.322   5.064   0.   ]
[ -8.024 -11.884   0.      0.      0.     -1.066   4.925   0.   ]
[ -8.005 -11.246   0.      0.      0.     -1.673   4.914   0.   ]

The difference from Table 1 made me think that maybe the torsional entropy and an (assigned) unbound value are not mutually exclusive. But that is not a problem for v.dock().

I really hope I can get a better understanding of this. Any comments/suggestions are greatly appreciated.

Python script to print the list of scores:

from vina import Vina

def split_to_strings(filename):
    with open(filename,"r") as f:
        list_of_strings = []
        new_string = []

        line = f.readline()
        while line:
            header = line[:6]
            if header=="MODEL ":
                if len(new_string)>0:
                    list_of_strings.append(new_string)
                new_string = []
            else:
                if header!="ENDMDL":
                    new_string.append(line)
            line = f.readline()

    return list_of_strings

def print_scores(list_of_strings):
    unbound = None
    for pose in list_of_strings:
        v.set_ligand_from_string("".join(pose))
        if unbound is not None:
            energy = v.score(unbound_energy=unbound)
        else:
            energy = v.score()
            unbound = energy[-1]
        print(energy)

v = Vina()
v.set_receptor('BACE_1_receptor.pdbqt')
v.compute_vina_maps(center=[30.103, 6.152, 15.584], box_size=[20, 20, 20])

print("Table 1. rigid...")  
list_of_poses = split_to_strings("python_dock_rigid.pdbqt")
print_scores(list_of_poses)

print("Table 2. flex...")
list_of_poses = split_to_strings("python_dock_flex.pdbqt")
print_scores(list_of_poses)

print("Table 3. flex... but no glue atoms")
list_of_poses = split_to_strings("flex_transformed.pdbqt")
print_scores(list_of_poses)

Associated files: BACE_1_receptor.pdbqt BACE_1_receptor.pdbqt.txt

python_dock_rigid.pdbqt python_dock_rigid.pdbqt.txt

python_dock_flex.pdbqt python_dock_flex.pdbqt.txt

flex_transformed.pdbqt flex_transformed.pdbqt.txt

The last pdbqt file has coordinates from the flexible macrocycle docking & the layout from the rigid docking.

diogomart commented 1 year ago

Hi Amy,

First of all thank you for helping a lot of users here on GitHub and on the autodock mailing list!

How was the torsional entropy derived for the Vina scoring function, and how is it currently calculated in vina 1.2.5? It's defined in equation 9 in the original vina paper, and it's the same in v1.1.2 and v1.2.. The equation states that the final score is `c_inter / (1 + w n_rot), wherec_interare the intermolecular contributions,wis the weight, andn_rotis the number of rotatable bonds. Note that then_rotin vina is the number ofBRANCH` keywords in the ligand input file, not the TORSDOF at the end.

In reality, equation 9 is valid only for the first pose. The complete equation valid for any pose is

final_score = (c_inter_k + c_intra_k - c_intra_1) / (1 + w * n_rot)

where the _k and _1 suffixes denote the kth and 1st poses, respectively.

1. Is it expected for the torsional entropy to change with ligand conformation? Yes. Since the final score depends non-linearly on the number of rotatable bonds, we calculate the torsional entropy as:

torsional_entropy = final_score - (c_inter_k + c_intra_k - c_intra_1)

Here's the line of code where that happens.

2. Is it normal for glue atoms to contribute to the torsional entropy? The glue atoms don't contribute directly, but the BRANCH keywords (rotatable bonds) do, because n_rot in the equations above is the number of branch keywords.

Which affinity aligns better with experimental values I don't know. We would need to test different ways of counting the number of rotatable bonds within a macrocycle to be able to answer that. Rotatable bonds within a macrocycle often rotate less than "regular" rotatable bonds. The better torsional entropy estimate is probably somewhere between the rigid and the flexible models. See the discussion in #218

3. This is not a problem for v.dock(), but when v.score() is used on existing poses, the torsional entropy also changes with the assigned unbound energy The unbound energy is set to c_intra_1 in v.dock()

final_score = (c_inter_k + c_intra_k - c_intra_1) / (1 + w * n_rot)

when rescoring a pose other than the first with v.score(), c_intra_1 is unknown, so unbound is set to c_intra_k instead. Thus, the final_score will differ. This was already the case in v1.1.2. In your example, it seems you "manually" set unbound to zero for some of the poses, so the final_score will differ, and consequently the torsional_entropy differs too.

Happy to discuss this further!

rwxayheee commented 1 year ago

Hi @diogomart, thanks so much for your kind reply, you're a lifesaver!! For a very long time, I thought the values of torsional entropies were conformation-independent in both force fields.. I see that the g function that scales the final score by equation 9 is conformation-indepedent. (I used to think the g function scales the inter energies only, and the torsional entropy is added linearly & separately like in AD4.. )

Note that the n_rot in vina is the number of BRANCH keywords in the ligand input file, not the TORSDOF at the end. That explains the difference I see in from the two different "layouts" for rigid- and flexible macrocycle docking. So it's not that there's any contribution from glue atoms, but more like the final scores are scaled differently according to difference in the input files. I will be more careful with manipulating the inputs.

Thanks a lot for all the informative response.. That was really helpful.

rwxayheee commented 1 year ago

Hi @diogomart,

I just have one more question regarding how n_rot is counted from the input ligand pdbqt –

Given that equation 9 is true for the first pose: final_score = (c_inter_k + c_intra_1 - c_intra_1) / (1 + w * n_rot) = c_inter_1 / (1 + w * n_rot)

By rearranging the above equation, the actual n_rot can be calculated from the printed final_score and c_inter_1 that are returned by v.score(): n_rot = (c_inter_1 / final_score - 1) / w

I found that in some cases, the actual n_rot are nonintergers, and I wonder if by any chance some torsions (hydroxyl groups?) are considered 0.5. See following for one example:

MODEL 1
REMARK VINA RESULT:    -6.972      0.000      0.000
REMARK INTER + INTRA:          -6.560
REMARK INTER:                  -8.399
REMARK INTRA:                   1.839
REMARK UNBOUND:                 1.839
REMARK SMILES C[C@H]1C[C@H]2[C@@H]3C[C@H](F)C4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO
REMARK SMILES IDX 26 1 27 2 24 3 1 4 2 5 3 6 22 7 4 8 21 9 23 10 5 11 19 12
REMARK SMILES IDX 17 13 6 14 15 15 18 16 7 17 9 18 14 19 16 20 8 21 10 22
REMARK SMILES IDX 13 23 11 24 12 25 25 26 20 28 28 30 29 31
REMARK H PARENT 25 27 20 29 29 32
REMARK Flexibility Score: inf
ROOT
ATOM      1  C   UNL     1      18.373  14.245  27.072  1.00  0.00     0.190 C 
ATOM      2  O   UNL     1      17.913  13.704  28.049  1.00  0.00    -0.294 OA
ENDROOT
BRANCH   1   3
ATOM      3  C   UNL     1      19.863  14.445  26.954  1.00  0.00     0.133 C 
ATOM      4  C   UNL     1      19.314  14.782  29.393  1.00  0.00     0.011 C 
ATOM      5  C   UNL     1      20.431  14.805  28.348  1.00  0.00     0.026 C 
ATOM      6  C   UNL     1      21.495  13.724  28.681  1.00  0.00     0.011 C 
ATOM      7  C   UNL     1      20.543  13.134  26.569  1.00  0.00     0.010 C 
ATOM      8  C   UNL     1      21.911  13.214  27.297  1.00  0.00     0.005 C 
ATOM      9  C   UNL     1      20.812  12.882  25.084  1.00  0.00     0.045 C 
ATOM     10  C   UNL     1      19.784  11.947  27.166  1.00  0.00     0.015 C 
ATOM     11  C   UNL     1      22.538  11.832  27.264  1.00  0.00     0.037 C 
ATOM     12  C   UNL     1      21.535  11.533  24.944  1.00  0.00     0.154 C 
ATOM     13  C   UNL     1      22.803  11.482  25.791  1.00  0.00     0.152 C 
ATOM     14  C   UNL     1      23.848  11.837  28.053  1.00  0.00     0.049 C 
ATOM     15  C   UNL     1      23.407  10.078  25.715  1.00  0.00     0.047 C 
ATOM     16  F   UNL     1      23.717  12.405  25.273  1.00  0.00    -0.239 F 
ATOM     17  C   UNL     1      24.458  10.432  28.051  1.00  0.00     0.194 C 
ATOM     18  C   UNL     1      24.608   9.985  26.609  1.00  0.00    -0.020 C 
ATOM     19  C   UNL     1      23.678   9.694  24.300  1.00  0.00    -0.009 C 
ATOM     20  C   UNL     1      22.343   9.124  26.262  1.00  0.00     0.024 C 
ATOM     21  F   UNL     1      25.711  10.460  28.671  1.00  0.00    -0.243 F 
ATOM     22  C   UNL     1      25.784   9.533  26.196  1.00  0.00     0.055 C 
ATOM     23  C   UNL     1      24.845   9.247  23.864  1.00  0.00     0.048 C 
ATOM     24  C   UNL     1      25.975   9.110  24.799  1.00  0.00     0.178 C 
ATOM     25  O   UNL     1      27.041   8.657  24.425  1.00  0.00    -0.290 OA
BRANCH   3  26
ATOM     26  O   UNL     1      20.151  15.455  25.985  1.00  0.00    -0.381 OA
ATOM     27  H   UNL     1      19.366  15.871  25.604  1.00  0.00     0.212 HD
ENDBRANCH   3  26
BRANCH  12  28
ATOM     28  O   UNL     1      20.653  10.488  25.361  1.00  0.00    -0.390 OA
ATOM     29  H   UNL     1      19.882  10.798  25.857  1.00  0.00     0.211 HD
ENDBRANCH  12  28
ENDBRANCH   1   3
BRANCH   1  30
ATOM     30  C   UNL     1      17.459  14.721  25.973  1.00  0.00     0.234 C 
BRANCH  30  31
ATOM     31  O   UNL     1      17.084  13.613  25.152  1.00  0.00    -0.388 OA
ATOM     32  H   UNL     1      17.139  12.757  25.597  1.00  0.00     0.211 HD
ENDBRANCH  30  31
ENDBRANCH   1  30
TORSDOF 5

ENDMDL

The ligand is a known drug (Psorcon): https://zinc20.docking.org/substances/ZINC000005752191/ The input pdbqt has 5 branches, but the actual n_rot is 3.5. There happens to be 3 OH groups.

Is this true for all R-OH, R-NH2 and R-NH3+? Also, are there also exceptions for phosphorus or sulfur containing groups like phosphonates or sulfone?

We are pursuing some macrocycle-containing drug candidates, but in the past I very naively compared them directly with those that have only rigid cycles.. The scaling by n_rot could actually make quite a big difference. So I'm trying to make the optimistic estimate (scored as if rigid) and the pessimistic estimate (scored as if flexible) for the macrocycle-containing ligands. And I want to get a better understanding of how n_rot is determined.

Thanks so much for your time and kind advice in advance!!

diogomart commented 1 year ago

I never realized that Vina halves the contribution of some rotatable bonds. Thank you for bringing this up!

Looking at the code, I think you are right, any rotatable bond that rotates only hydrogens, such as R-OH, R-NH2, and R-NH3, gets its contribution halved. I'm not seeing any other exception. I'll take a closer look and confirm.

rwxayheee commented 1 year ago

Hi @diogomart, thanks again for taking the time to look into this. Sorry for making the confusing statements. I browsed through ligands that contain typical tetrahedral phosphorus or sulfur groups, and I don't think they are handled differently from other rotatable single bonds.

The only exception I found was with hypervalent phosphorus. But they are not common in pharmaceuticals, please don't feel worried about it.. I can take a closer look at tree counting code when I get a chance.

Thanks a lot for the useful discussion!