Closed rwxayheee closed 1 year ago
Hi Amy,
First of all thank you for helping a lot of users here on GitHub and on the autodock mailing list!
How was the torsional entropy derived for the Vina scoring function, and how is it currently calculated in vina 1.2.5?
It's defined in equation 9 in the original vina paper, and it's the same in v1.1.2 and v1.2.. The equation states that the final score is `c_inter / (1 + w n_rot), where
c_interare the intermolecular contributions,
wis the weight, and
n_rotis the number of rotatable bonds. Note that the
n_rotin vina is the number of
BRANCH` keywords in the ligand input file, not the TORSDOF at the end.
In reality, equation 9 is valid only for the first pose. The complete equation valid for any pose is
final_score = (c_inter_k + c_intra_k - c_intra_1) / (1 + w * n_rot)
where the _k
and _1
suffixes denote the k
th and 1st poses, respectively.
1. Is it expected for the torsional entropy to change with ligand conformation? Yes. Since the final score depends non-linearly on the number of rotatable bonds, we calculate the torsional entropy as:
torsional_entropy = final_score - (c_inter_k + c_intra_k - c_intra_1)
Here's the line of code where that happens.
2. Is it normal for glue atoms to contribute to the torsional entropy?
The glue atoms don't contribute directly, but the BRANCH keywords (rotatable bonds) do, because n_rot
in the equations above is the number of branch keywords.
Which affinity aligns better with experimental values I don't know. We would need to test different ways of counting the number of rotatable bonds within a macrocycle to be able to answer that. Rotatable bonds within a macrocycle often rotate less than "regular" rotatable bonds. The better torsional entropy estimate is probably somewhere between the rigid and the flexible models. See the discussion in #218
3. This is not a problem for v.dock(), but when v.score() is used on existing poses, the torsional entropy also changes with the assigned unbound energy
The unbound energy is set to c_intra_1
in v.dock()
final_score = (c_inter_k + c_intra_k - c_intra_1) / (1 + w * n_rot)
when rescoring a pose other than the first with v.score()
, c_intra_1
is unknown, so unbound is set to c_intra_k
instead. Thus, the final_score
will differ. This was already the case in v1.1.2. In your example, it seems you "manually" set unbound to zero for some of the poses, so the final_score
will differ, and consequently the torsional_entropy
differs too.
Happy to discuss this further!
Hi @diogomart, thanks so much for your kind reply, you're a lifesaver!! For a very long time, I thought the values of torsional entropies were conformation-independent in both force fields.. I see that the g function that scales the final score by equation 9 is conformation-indepedent. (I used to think the g function scales the inter energies only, and the torsional entropy is added linearly & separately like in AD4.. )
Note that the n_rot in vina is the number of BRANCH keywords in the ligand input file, not the TORSDOF at the end.
That explains the difference I see in from the two different "layouts" for rigid- and flexible macrocycle docking. So it's not that there's any contribution from glue atoms, but more like the final scores are scaled differently according to difference in the input files. I will be more careful with manipulating the inputs.
Thanks a lot for all the informative response.. That was really helpful.
Hi @diogomart,
I just have one more question regarding how n_rot
is counted from the input ligand pdbqt –
Given that equation 9 is true for the first pose:
final_score = (c_inter_k + c_intra_1 - c_intra_1) / (1 + w * n_rot) = c_inter_1 / (1 + w * n_rot)
By rearranging the above equation, the actual n_rot
can be calculated from the printed final_score
and c_inter_1
that are returned by v.score()
:
n_rot = (c_inter_1 / final_score - 1) / w
I found that in some cases, the actual n_rot
are nonintergers, and I wonder if by any chance some torsions (hydroxyl groups?) are considered 0.5. See following for one example:
MODEL 1
REMARK VINA RESULT: -6.972 0.000 0.000
REMARK INTER + INTRA: -6.560
REMARK INTER: -8.399
REMARK INTRA: 1.839
REMARK UNBOUND: 1.839
REMARK SMILES C[C@H]1C[C@H]2[C@@H]3C[C@H](F)C4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO
REMARK SMILES IDX 26 1 27 2 24 3 1 4 2 5 3 6 22 7 4 8 21 9 23 10 5 11 19 12
REMARK SMILES IDX 17 13 6 14 15 15 18 16 7 17 9 18 14 19 16 20 8 21 10 22
REMARK SMILES IDX 13 23 11 24 12 25 25 26 20 28 28 30 29 31
REMARK H PARENT 25 27 20 29 29 32
REMARK Flexibility Score: inf
ROOT
ATOM 1 C UNL 1 18.373 14.245 27.072 1.00 0.00 0.190 C
ATOM 2 O UNL 1 17.913 13.704 28.049 1.00 0.00 -0.294 OA
ENDROOT
BRANCH 1 3
ATOM 3 C UNL 1 19.863 14.445 26.954 1.00 0.00 0.133 C
ATOM 4 C UNL 1 19.314 14.782 29.393 1.00 0.00 0.011 C
ATOM 5 C UNL 1 20.431 14.805 28.348 1.00 0.00 0.026 C
ATOM 6 C UNL 1 21.495 13.724 28.681 1.00 0.00 0.011 C
ATOM 7 C UNL 1 20.543 13.134 26.569 1.00 0.00 0.010 C
ATOM 8 C UNL 1 21.911 13.214 27.297 1.00 0.00 0.005 C
ATOM 9 C UNL 1 20.812 12.882 25.084 1.00 0.00 0.045 C
ATOM 10 C UNL 1 19.784 11.947 27.166 1.00 0.00 0.015 C
ATOM 11 C UNL 1 22.538 11.832 27.264 1.00 0.00 0.037 C
ATOM 12 C UNL 1 21.535 11.533 24.944 1.00 0.00 0.154 C
ATOM 13 C UNL 1 22.803 11.482 25.791 1.00 0.00 0.152 C
ATOM 14 C UNL 1 23.848 11.837 28.053 1.00 0.00 0.049 C
ATOM 15 C UNL 1 23.407 10.078 25.715 1.00 0.00 0.047 C
ATOM 16 F UNL 1 23.717 12.405 25.273 1.00 0.00 -0.239 F
ATOM 17 C UNL 1 24.458 10.432 28.051 1.00 0.00 0.194 C
ATOM 18 C UNL 1 24.608 9.985 26.609 1.00 0.00 -0.020 C
ATOM 19 C UNL 1 23.678 9.694 24.300 1.00 0.00 -0.009 C
ATOM 20 C UNL 1 22.343 9.124 26.262 1.00 0.00 0.024 C
ATOM 21 F UNL 1 25.711 10.460 28.671 1.00 0.00 -0.243 F
ATOM 22 C UNL 1 25.784 9.533 26.196 1.00 0.00 0.055 C
ATOM 23 C UNL 1 24.845 9.247 23.864 1.00 0.00 0.048 C
ATOM 24 C UNL 1 25.975 9.110 24.799 1.00 0.00 0.178 C
ATOM 25 O UNL 1 27.041 8.657 24.425 1.00 0.00 -0.290 OA
BRANCH 3 26
ATOM 26 O UNL 1 20.151 15.455 25.985 1.00 0.00 -0.381 OA
ATOM 27 H UNL 1 19.366 15.871 25.604 1.00 0.00 0.212 HD
ENDBRANCH 3 26
BRANCH 12 28
ATOM 28 O UNL 1 20.653 10.488 25.361 1.00 0.00 -0.390 OA
ATOM 29 H UNL 1 19.882 10.798 25.857 1.00 0.00 0.211 HD
ENDBRANCH 12 28
ENDBRANCH 1 3
BRANCH 1 30
ATOM 30 C UNL 1 17.459 14.721 25.973 1.00 0.00 0.234 C
BRANCH 30 31
ATOM 31 O UNL 1 17.084 13.613 25.152 1.00 0.00 -0.388 OA
ATOM 32 H UNL 1 17.139 12.757 25.597 1.00 0.00 0.211 HD
ENDBRANCH 30 31
ENDBRANCH 1 30
TORSDOF 5
ENDMDL
The ligand is a known drug (Psorcon): https://zinc20.docking.org/substances/ZINC000005752191/
The input pdbqt has 5 branches, but the actual n_rot
is 3.5. There happens to be 3 OH groups.
Is this true for all R-OH, R-NH2 and R-NH3+? Also, are there also exceptions for phosphorus or sulfur containing groups like phosphonates or sulfone?
We are pursuing some macrocycle-containing drug candidates, but in the past I very naively compared them directly with those that have only rigid cycles.. The scaling by n_rot
could actually make quite a big difference. So I'm trying to make the optimistic estimate (scored as if rigid) and the pessimistic estimate (scored as if flexible) for the macrocycle-containing ligands. And I want to get a better understanding of how n_rot
is determined.
Thanks so much for your time and kind advice in advance!!
I never realized that Vina halves the contribution of some rotatable bonds. Thank you for bringing this up!
Looking at the code, I think you are right, any rotatable bond that rotates only hydrogens, such as R-OH, R-NH2, and R-NH3, gets its contribution halved. I'm not seeing any other exception. I'll take a closer look and confirm.
Hi @diogomart, thanks again for taking the time to look into this. Sorry for making the confusing statements. I browsed through ligands that contain typical tetrahedral phosphorus or sulfur groups, and I don't think they are handled differently from other rotatable single bonds.
The only exception I found was with hypervalent phosphorus. But they are not common in pharmaceuticals, please don't feel worried about it.. I can take a closer look at tree counting code when I get a chance.
Thanks a lot for the useful discussion!
Hi Autodock dev team and users,
First of all massive thanks for taking the time to read this..
I am using Vina 1.2.5 for protein-ligand docking calculations including ligands with macrocycles, and I wonder if I could get your advice on some questions I have about the entropy penalties for active torsions –
Using the AD4 force field, the torsional entropy is 0.298*number of torsions and it is invariant with ligand conformations. In the Vina force field, there is a similar term returned as the 7th component from
v.score()
. This “torsional entropy” is included in the “total” score, and the latter is printed as “affinity” in the standard output and also as “VINA RESULT” in the pdbqt output.My questions are generally about the definition of the torsional entropy: How was the torsional entropy derived for the Vina scoring function, and how is it currently calculated in vina 1.2.5? If anyone could explain or point me to the right references, I would really really appreciate it.
I am trying to run some calculations with Vina 1.2.5 to address the bias in some work I did last year with 1.2.3. I cannot post my ligand and receptor files at the moment.. but I will try to explain the specific questions using input files from the macrocycle docking example. The docking poses were generated from vina 1.2.5 in python. Then, I use
v.score()
to rescore and see the breakdown.1. Is it expected for the torsional entropy to change with ligand conformation?
Here are the returned arrays of energies for a list of 10 poses. In both rigid- and flexible- macrocycle docking, the torsional entropy seems to be different from pose to pose.
Table 1. The example ligand, docked with rigid macrocycles
Table 2. The example ligand, docked with the glue atoms
2. Is it normal for glue atoms to contribute to the torsional entropy?
Table 3. The example ligand, docked with but scored without G atoms
The scores in Table 3 were computed on exactly the same poses in Table 2. The difference in torsional energies made me realize that the glue atoms also contributed. If so, it does makes sense because TORSDOF is generally underestimated when macrocycles are treated rigidly. Without their contribution, the total affinities become more consistent with Table 1 from docking with rigid macrocycles.
(I don't know enough about the example ligand. Which affinity aligns better with experimental values, -7.8 or -11.3 kcal/mol? )
3. This is not a problem for v.dock(), but when v.score() is used on existing poses, the torsional entropy also changes with the assigned unbound energy.
Table 4. The example ligand, docked with rigid macrocycles, but unbound energy assigned to 0 from the 2nd pose
The difference from Table 1 made me think that maybe the torsional entropy and an (assigned) unbound value are not mutually exclusive. But that is not a problem for v.dock().
I really hope I can get a better understanding of this. Any comments/suggestions are greatly appreciated.
Python script to print the list of scores:
Associated files: BACE_1_receptor.pdbqt BACE_1_receptor.pdbqt.txt
python_dock_rigid.pdbqt python_dock_rigid.pdbqt.txt
python_dock_flex.pdbqt python_dock_flex.pdbqt.txt
flex_transformed.pdbqt flex_transformed.pdbqt.txt
The last pdbqt file has coordinates from the flexible macrocycle docking & the layout from the rigid docking.