Valdes-Tresanco-MS / gmx_MMPBSA

gmx_MMPBSA is a new tool based on AMBER's MMPBSA.py aiming to perform end-state free energy calculations with GROMACS files.
https://valdes-tresanco-ms.github.io/gmx_MMPBSA/
GNU General Public License v3.0
226 stars 65 forks source link

Error about ligand parameterization using ACPYPE (I guess...) #42

Closed EdwardMendez95 closed 3 years ago

EdwardMendez95 commented 3 years ago

Greetings,

I'm working with my own Protein-Ligand system. Ligand was parameterized using ACPYPE (ACPYPE - AnteChamber PYthon Parser interfacE) [Da Silva, A. W. S., & Vranken, W. F. (2012). ACPYPE-Antechamber python parser interface. BMC research notes, 5(1), 1-8.].

Running a gmx_MMPBSA calculation using:

_gmx_MMPBSA -O -i mmpbsa.in -cs md_0_10.tpr -ci index.ndx -cg 1 13 -ct md_0_10center.xtc -lm oxml.mol2 -eo output

I got this error:

[INFO ] Loading and checking parameter files for compatibility...

_Preparing trajectories for simulation... Error: # atoms in XTC file (14027) does not match # atoms in parm COM.prmtop (14020) Error: Could not set up 'COM_traj_0.xtc' for reading. Error: Could not set up input trajectory 'COM_traj_0.xtc'. Error: Error(s) occurred during execution. file "/home/XXX/anaconda3/envs/AmberTools20/bin/gmxMMPBSA", line 8, in

Inspectioning of gmx_MMPBSA.log I found this error:

Loading Mol2 file: ./oxml.mol2 Reading MOLECULE named MOL Checking 'LIG1'.... Checking parameters for unit 'LIG1'. Checking for bond parameters. Checking for angle parameters.

/home/XXXX/anaconda3/envs/AmberTools20/bin/teLeap: Error! Could not find angle parameter: n2 - ce - oh

/home/XXXX/anaconda3/envs/AmberTools20/bin/teLeap: Warning! There are missing parameters. Unit is OK.

Checking Leap.log I found that only ff99SB (Hornak & Simmerling) and gaff force fields are invoked (without taking account ions, solvent and nucleid acid). Ie:

Loading parameters: /home/XXXX/anaconda3/envs/AmberTools20/dat/leap/parm/gaff.dat

Inspecting gaff.dat, I was able to find that, certainly, that angle parameter does not exist (nevertheless I was able to generate topology of ligand and that topology is being used as oxml.mol2 in gmx_MMPBSA command). But checking ACPYPE source code I found:

 parser.add_argument(
    "-a",
    "--atom_type",
    choices=["gaff", "amber", "gaff2", "amber2"],
    action="store",
    default="gaff",
    dest="atom_type",
    help="atom type, can be gaff, gaff2, amber (AMBER14SB) or amber2 (AMBER14SB + GAFF2), default is gaff",
)

So apparently, more ff are being used to parameterized molecules and that's why I was able to create the angle parameter. Indeed according to Leap.log file generated by ACPYPE:

Building improper torsion parameters. --Impropers: 1 C14 - C16 - C15 - C20 1 C4 - C9 - C8 - C13 1 H11 - C8 - C13 - C12 1 H12 - C11 - C12 - C13 1 H13 - C18 - C19 - C20 1 H19 - C16 - C17 - C18 1 H1 - C8 - C9 - C10 1 H21 - C10 - C11 - C12 1 H22 - C17 - C18 - C19 1 H24 - C15 - C20 - C19 1 H3 - C9 - C10 - C11 1 H6 - C15 - C16 - C17 1 N1 - C15 - C14 - O4 total 13 improper torsions applied Building H-Bond parameters.

The parameters were built using the ff before mentioned.

Since ACPYPE is commonly used with GROMACS to generate topologies in a reliable way. There is a posibility to include these topologies as default in tleap command of gmx_MMPSA, is not? Or maybe the option is to modify manually gaff.dat to include that angle (but with the issue that gaff2 and other ff are not being invoked by gmx_MMPBSA and there is a posibility of new conflicts, atleast, according to my gmx_MMPBSA.log).

Thank you

Psdta: I attatched my gmx_MMPBSA.log and I want to clarify I was able to used gmx_MMPBSA in other Protein-Ligans ST system without problem, ando comparing both .log the only difference is the ligand topology that is given by mol2 file (certainly, the ligands are different and that's why I tought the problem is what I wrote above, since the ligand without problems does not have this n2 - ce - oh feature). Sorry for the long post. Have a nice day.

gmx_MMPBSA.log

Valdes-Tresanco-MS commented 3 years ago

Thanks for this well-explained report @EdwardMendez95

Let's go through the steps for a better explanation.

The first error you get is related to the complex topology and the trajectory. Since I do not have the structure of the complex I cannot 100% guarantee that it is explicitly related to the topology of the ligand. If you wish, you can send us the files (top, tpr, xtc [10 frames], ndx, ligand mol2, and groups) to review them.

/home/XXXX/anaconda3/envs/AmberTools20/bin/teLeap: Error! Could not find angle parameter: n2 - ce - oh

/home/XXXX/anaconda3/envs/AmberTools20/bin/teLeap: Warning! There are missing parameters. Unit is OK.

This error must be solved when the parameter file frcmod is passed in the next line, so tleap is able to finally generate the ligand topology without problems

Loading parameters: ./_GMXMMPBSA_oxml.frcmod

The "error" is shown since the check command of tleap is placed before reading the frcmod file and in turn, the frcmod command line should be placed before reading the ligand (definitely, an inconsistency that we will solve asap). This file is generated by parmchck2 who detects if there are unassigned bonds, angles, dihedral, or improper. However, I cannot fully assure you that this is the cause of the problem until I review the files I mentioned since the current way works too.

Since ACPYPE is commonly used with GROMACS to generate topologies in a reliable way. There is a posibility to include these topologies as default in tleap command of gmx_MMPSA, is not? Or maybe the option is to modify manually gaff.dat to include that angle (but with the issue that gaff2 and other ff are not being invoked by gmx_MMPBSA and there is a posibility of new conflicts, atleast, according to my gmx_MMPBSA.log).

In fact, gmx_MMPBSA does support the force fields you mention. We have recently changed the forcefields variable to accept several different force fields, in order to process multicomponent systems. However, from your observation, I noticed that gmx_MMPBSA incorrectly invokes the parmchk2 application when it's about a different ff from gaff. This is definitely a bug we will fix as soon as possible (it doesn't seem to be related to the current problem)

After a careful analysis of the log file, we noticed several inconsistencies. The error appears to be related to the receptor rather than the ligand. 6975 atoms are read (corresponds to the Protein-H group as expected), however, tleap assigns only 6988 hydrogen atoms which exactly leads to the 7 atom mismatch between topology and trajectory. We noted that your tpr contains PBC, that is, the structure that is generated from the tpr is not consistent so it could have repercussions of this type. We have a description of how to deal with this in the Q&A/Calculation/Inconsistent energy section

Please check this and try again and tell us. We will be happy to follow up on the issue.

PS: Don't worry about the length of the publication, in fact, we love this, as we can have a good amount of starting information.

EdwardMendez95 commented 3 years ago

Greetings,

Thank you for your response. I'm alleviated about the parameterization. I will check about the trajectory and PBC correction, but I think this is not the issue since I verified in VMD that the trajectories don't jump/overlap. I will try Q&A/Calculation/Inconsistency energy. Still according to the difference of Protein-H maybe there's a issue recognizing some histidine protonations and there's a lack of hydrogens (indeed, the net charge according to antechamber programs .log file from gmx_MMPBSA appears to be of -22 but from the initial simulations carried out in gromacs, the net charge of the receptor is -15, so there's a lack of 7 hydrogens during topology conversion).

I will check about this and if I can't solve the problem, I will send you my files in order to reproduce the error.

Thank you.

EdwardMendez95 commented 3 years ago

Greetings,

I want to update the issue, I think I was able to find the error:

My receptor has some acidic residues in protonated state (this info was obtained by experimental tasks and external simulations). Specifically seven residues: ASP209, ASP657, ASP815, GLU260, GLU294, GLU708 & GLU742 . Gromacs topology was generated by pdb2gmx and with the -inter flag that permits to protonate each residue according to the user by amber99sb-ildn ff parameterization.

So .tpr, .gro and so on has these hydrogens. But when tleap is invoked in gmx_MMPBSA to generate the non-hydrogen PDB, these residues are not correctly recognized:

ATOM 6251 N ASP A 814 112.520 107.780 34.370 1.00 0.00 N
ATOM 6252 CA ASP A 814 112.160 108.970 33.520 1.00 0.00 C
ATOM 6253 CB ASP A 814 111.230 109.990 34.210 1.00 0.00 C
ATOM 6254 CG ASP A 814 111.710 110.600 35.450 1.00 0.00 C
ATOM 6255 OD1 ASP A 814 111.040 111.440 36.030 1.00 0.00 O
ATOM 6256 OD2 ASP A 814 112.860 110.370 35.890 1.00 0.00 O
ATOM 6257 C ASP A 814 113.400 109.600 32.870 1.00 0.00 C
ATOM 6258 O ASP A 814 113.330 110.090 31.730 1.00 0.00 O
HETATM 6259 N ASP A 815 114.530 109.450 33.530 1.00 0.00 N
HETATM 6260 CA ASP A 815 115.790 109.790 32.840 1.00 0.00 C
HETATM 6261 CB ASP A 815 116.940 109.740 33.820 1.00 0.00 C
HETATM 6262 CG ASP A 815 117.060 111.120 34.490 1.00 0.00 C
HETATM 6263 OD1 ASP A 815 116.910 112.240 34.060 1.00 0.00 O
HETATM 6264 OD2 ASP A 815 117.200 110.920 35.790 1.00 0.00 O
HETATM 6265 C ASP A 815 116.150 108.680 31.790 1.00 0.00 C
HETATM 6266 O ASP A 815 116.240 108.960 30.580 1.00 0.00 O
ATOM 6267 N SER A 816 116.160 107.380 32.150 1.00 0.00 N

ATOM 6268 CA SER A 816 116.420 106.150 31.370 1.00 0.00 C
ATOM 6269 CB SER A 816 116.020 104.890 32.300 1.00 0.00 C
ATOM 6270 OG SER A 816 116.960 104.820 33.410 1.00 0.00 O
ATOM 6271 C SER A 816 115.570 105.970 30.080 1.00 0.00 C
ATOM 6272 O SER A 816 116.100 105.480 29.070 1.00 0.00 O
ATOM 6273 N ILE A 817 114.360 106.440 29.980 1.00 0.00 N
ATOM 6274 CA ILE A 817 113.540 106.350 28.770 1.00 0.00 C
ATOM 6275 CB ILE A 817 111.960 106.320 28.990 1.00 0.00 C
ATOM 6276 CG2 ILE A 817 111.510 107.570 29.640 1.00 0.00 C
ATOM 6277 CG1 ILE A 817 111.190 106.010 27.690 1.00 0.00 C
ATOM 6278 CD1 ILE A 817 109.690 105.910 27.820 1.00 0.00 C
ATOM 6279 C ILE A 817 113.940 107.450 27.770 1.00 0.00 C
ATOM 6280 O ILE A 817 113.810 107.350 26.540 1.00 0.00 O
ATOM 6281 N ILE A 818 114.720 108.450 28.200 1.00 0.00 N
ATOM 6282 CA ILE A 818 115.290 109.520 27.360 1.00 0.00 C
ATOM 6283 CB ILE A 818 115.440 110.890 28.040 1.00 0.00 C
ATOM 6284 CG2 ILE A 818 116.010 111.950 27.060 1.00 0.00 C
ATOM 6285 CG1 ILE A 818 114.120 111.290 28.780 1.00 0.00 C
ATOM 6286 CD1 ILE A 818 114.280 112.440 29.770 1.00 0.00 C
ATOM 6287 C ILE A 818 116.620 108.920 26.840 1.00 0.00 C
ATOM 6288 O ILE A 818 116.830 108.730 25.680 1.00 0.00 O

Interestingly, also histidines are marked as hetero-atoms:

HETATM 6649 N HIS A 868 125.290 93.370 20.570 1.00 0.00 N
HETATM 6650 CA HIS A 868 124.250 94.280 20.120 1.00 0.00 C
HETATM 6651 CB HIS A 868 124.840 95.360 19.330 1.00 0.00 C
HETATM 6652 CG HIS A 868 123.980 96.600 19.170 1.00 0.00 C
HETATM 6653 ND1 HIS A 868 122.890 96.640 18.310 1.00 0.00 N
HETATM 6654 CE1 HIS A 868 122.380 97.820 18.570 1.00 0.00 C
HETATM 6655 NE2 HIS A 868 123.230 98.590 19.310 1.00 0.00 N
HETATM 6656 CD2 HIS A 868 124.230 97.800 19.700 1.00 0.00 C
HETATM 6657 C HIS A 868 123.140 93.470 19.370 1.00 0.00 C
HETATM 6658 O HIS A 868 123.490 92.640 18.580 1.00 0.00 O

ATOM 6659 N PRO A 869 121.870 93.760 19.570 1.00 0.00 N
ATOM 6660 CD PRO A 869 121.400 94.610 20.680 1.00 0.00 C
ATOM 6661 CG PRO A 869 119.900 94.460 20.740 1.00 0.00 C
ATOM 6662 CB PRO A 869 119.710 93.090 20.190 1.00 0.00 C
ATOM 6663 CA PRO A 869 120.690 93.090 19.010 1.00 0.00 C
ATOM 6664 C PRO A 869 120.020 93.680 17.750 1.00 0.00 C
ATOM 6665 O PRO A 869 118.940 93.170 17.320 1.00 0.00 O
HETATM 6666 N HIS A 870 120.540 94.810 17.220 1.00 0.00 N
HETATM 6667 CA HIS A 870 120.070 95.210 15.920 1.00 0.00 C
HETATM 6668 CB HIS A 870 118.840 96.110 16.060 1.00 0.00 C
HETATM 6669 CG HIS A 870 119.170 97.540 16.470 1.00 0.00 C
HETATM 6670 ND1 HIS A 870 118.920 98.220 17.630 1.00 0.00 N
HETATM 6671 CE1 HIS A 870 119.000 99.540 17.340 1.00 0.00 C
HETATM 6672 NE2 HIS A 870 119.210 99.790 16.100 1.00 0.00 N
HETATM 6673 CD2 HIS A 870 119.260 98.510 15.520 1.00 0.00 C
HETATM 6674 C HIS A 870 121.130 95.810 15.010 1.00 0.00 C
HETATM 6675 O HIS A 870 120.760 96.000 13.850 1.00 0.00 O

So when tleap build the amber topology, does not recognize correctly these residues and that's why there's a lack of 7 hydrogens that gives a net charge of -22 (instead of -15, from the initial gromacs simulation). Since the topology is different from the trajectories, the error is invoked.

Apparently, there's a problem with the recognition of acidic or basidic protonated residues inside the workflow (I do not know if that's for the difference of the amber forcefield used initially to build the MD simulation and the ff used for these post-processing task in gmx_MMPBSA).

Thank you.

marioernestovaldes commented 3 years ago

Hi @EdwardMendez95! There is something weird going on here... gmx_MMPBSA should recognize residues with different protonation states... anyway, we'll appreciate if you could send us the files (at least the tpr, the index file with the groups you are considering as receptor and ligand, and the topology file) so we can go over it and figure out what's going on

cheers!

EdwardMendez95 commented 3 years ago

Greetings,

Of course, I will send you the files. Do I have to upload here or by email?

marioernestovaldes commented 3 years ago

send me an email at marioe911116@gmail.com

best!

EdwardMendez95 commented 3 years ago

send me an email at marioe911116@gmail.com

best!

Thank you! I will send the files ASAP

marioernestovaldes commented 3 years ago

Hi @EdwardMendez95! you were very right sir. The error was related to the way gmx_MMPBSA parse the protonated residues. Although gmx_MMPBSA was able to parse protonated ASP and GLU residues, it was incomplete tough. We use residues names to parse this kind of information and it turns out there are several names for the same kind of residue (for example the protonated ASP can appear as ASH, ASPH [your case], or even as ASP with the HD2 atom). Anyway, we updated the code to account for these various names. Please, upgrade to the development version with:

amber.python -m pip install git+https://github.com/Valdes-Tresanco-MS/gmx_MMPBSA --upgrade

this way, it should run smoothly in your system... if so, please feel free to close the issue... if you have any additional questions don't hesitate to contact us

best!

EdwardMendez95 commented 3 years ago

Hi @EdwardMendez95! you were very right sir. The error was related to the way gmx_MMPBSA parse the protonated residues. Although gmx_MMPBSA was able to parse protonated ASP and GLU residues, it was incomplete tough. We use residues names to parse this kind of information and it turns out there are several names for the same kind of residue (for example the protonated ASP can appear as ASH, ASPH [your case], or even as ASP with the HD2 atom). Anyway, we updated the code to account for these various names. Please, upgrade to the development version with:

amber.python -m pip install git+https://github.com/Valdes-Tresanco-MS/gmx_MMPBSA --upgrade

this way, it should run smoothly in your system... if so, please feel free to close the issue... if you have any additional questions don't hesitate to contact us

best!

Greetings,

That's awesome, thank you for helping me. I'm going to try with my systems.

Have a nice day, sorry for the long issue.