Valdes-Tresanco-MS / gmx_MMPBSA

gmx_MMPBSA is a new tool based on AMBER's MMPBSA.py aiming to perform end-state free energy calculations with GROMACS files.
https://valdes-tresanco-ms.github.io/gmx_MMPBSA/
GNU General Public License v3.0
211 stars 64 forks source link

[ENH]: qm atom mask selection #454

Open camattelaer opened 7 months ago

camattelaer commented 7 months ago

In which tool?

gmx_MMPBSA

New Feature

Currently qmmask selection is based on full residues, either directly via residue number or within specific distance of the ligand.

Additional option to set qmmask directly for selected atoms would be beneficial to prevent SCF issues if the MD was run using only an MM forcefield.

Description

It would be nice if there was an option to set the QM atom mask other than only based on residue numbers.

Currently creating QM-MM boundary through residues numbers 'cuts' through amide bonds, which is a poor choice of QM-MM boundary.

With MMPBSA.py this is currently only possible using -make-mdins and -use-mdins flags: 1) generate inputs by running MMPBSA.py with -mdins-flag 2) the user modifies the qmmask in the respective input files to not include amide backbone atoms, eg 'cutting' through Calpha-Cbeta bond 3) run MMGBSA.py using modified input files using -use-mdins flag

Currently running some tests and a simulation where gmx_MMPBSA was having troubles with SCF convergence had much better results using the MMPBSA.py approach removing the backbone atoms

I do not expect this to be quickly implemented, but I do believe it would be an added value to gmx_MMPBSA

(relevance: in my opinion low since only applies to people interested in QM-MMGBSA approach, unfortunately i am one of those people) (difficulty: in my opinion medium, perhaps even high, because it does require some additional intermediary file manipulation to run MMPBSA)

Relevance

low

Difficulty to implement

medium

marioernestovaldes commented 7 months ago

I went and took a look at the code to see how difficult was to implement the feature you are requesting. Although implementing the atom mask selection method is possible, it should take a little bit of time and testing. For the time being, there is a relatively easy way to exclude some atoms from the masks.

To update gmx_MMMPBSA from the repo, please type:

python -m pip install git+https://github.com/Valdes-Tresanco-MS/gmx_MMPBSA -U

I added a new variable that you can use in the input file named exclude_backbone. This variable controls whether to exclude backbone atoms from residues selected to treat with QM. It accepts two values:

0: Consider backbone atoms in residues selected to treat with QM
1: Do not consider backbone atoms in residues selected to treat with QM

A typical QMMMGBSA input file would look like this:

Sample input file for QM/MMGBSA calculation
#This input file is meant to show only that gmx_MMPBSA works. Although, we tried to use the input files as recommended in the
#Amber manual, some parameters have been changed to perform more expensive calculations in a reasonable amount of time. Feel free to change the parameters
#according to what is better for your system.

&general
sys_name="QM/MMGBSA",
startframe=5,
endframe=14,
PBRadii=2,
forcefields="oldff/leaprc.ff99SB,leaprc.gaff"
/
#make sure to include at least one residue from both the receptor
#and ligand in the qm_residues mask when using 'ifqnt'.
#this requirement is automatically fulfilled when using the within keyword
#https://groups.google.com/g/gmx_mmpbsa/c/GNb4q4YGCH8
&gb
igb=1, saltcon=0.150,
ifqnt=1, qm_theory=PM3,
# qm_residues selection with "within" keyword (new in v1.5.0)
qm_residues="within 5"
# Exclude backbone atoms
exclude_backbone=1,
/

For the time being, if you want to exclude other atoms different than those of the backbone, you could modify the mask in the createinput.py file

https://github.com/Valdes-Tresanco-MS/gmx_MMPBSA/blob/607bbe26ea582282eee567c1bf7e9a3f85994da9/GMXMMPBSA/createinput.py#L203-L206

Please, let us know if you encounter any issues...

camattelaer commented 7 months ago

Hi Thanks for this very fast reply. Just to be clear I do not expect you to take actions so fast.

At first glance, the atom mask should be modified a bit to exclude hydrogens from backbone aswell: com_input['gb']['qmmask'] = f"({com_input['gb']['qmmask']} & !@N,H,CA,HA,C,O)" rec_input['gb']['qmmask'] = f"({rec_input['gb']['qmmask']} & !@N,H,CA,HA,C,O)" lig_input['gb']['qmmask'] = f"({lig_input['gb']['qmmask']} & !@N,H,CA,HA,C,O)"

Kind regards

CA

marioernestovaldes commented 7 months ago

Done https://github.com/Valdes-Tresanco-MS/gmx_MMPBSA/commit/99143f7568f765175e96a1671df7471e566e9ba0

let me know if you encounter any other issue

camattelaer commented 7 months ago

I think you overlooked you also need H from NH in the backbone, i put it after N in my initial reply for logical ordering:

com_input['gb']['qmmask'] = f"({com_input['gb']['qmmask']} & !@N,H,CA,HA,C,O)"

Kind regards

CA

marioernestovaldes commented 7 months ago

Oh, sorry about that... done https://github.com/Valdes-Tresanco-MS/gmx_MMPBSA/commit/70c18e335938af05a6ee28eb22f5809814b570aa

camattelaer commented 7 months ago

thank you very much for all your efforts.

I will update, perform some tests and report back in a few days :)

Kind regards

CA

camattelaer commented 7 months ago

Ok, so I already encountered an issue: apparently in the mol2 file of the ligand, atom names such as N, H, C, O can also exist.

if the input qmmask would be "A/19,28,43,52" where 52 is the ligand the complex mdin file should then better have a qmmask such as '(:19,28,43&!@N,H,CA,HA,C,O) | :52' to include all atoms from the ligand

I included some test runs with gmx_MMPBSA which fails and MMPBSA.py which does run: test.zip (the included trajectory is smaller than the original because i can't upload 400MB)

Again, if this is too much trouble to fix I have no problems putting this feature on the 'to do' list

marioernestovaldes commented 7 months ago

Got it. Well, considering the masks definition could be complex by itself, instead of hardcoding the masks, I decided to let the user define the masks. With this in mind, I eliminated the exclude_backbone variable and added three new variables that should solve the problem. The three new variables com_qmmask , rec_qmmask , and lig_qmmask are the Amber masks specifying the quantum atoms in the complex, receptor and ligand, respectively. With this new three varibales, the users are free to define intricate masks as per their needs. Just couple of drawbacks of this new way to difine masks:

-When using user defined masks, com_qmmask, rec_qmmask, and lig_qmmask must be defined. -When using user defined masks, automatic assigment of qmcharge_com, qmcharge_rec, and qmcharge_lig is overrided and default or user defined qmcharge_com, qmcharge_rec, and qmcharge_lig are used.

To update gmx_MMMPBSA from the repo, please type:

python -m pip install git+https://github.com/Valdes-Tresanco-MS/gmx_MMPBSA -U

Please, check the docs for more info (https://valdes-tresanco-ms.github.io/gmx_MMPBSA/dev/input_file/#qm-options) and this example (https://valdes-tresanco-ms.github.io/gmx_MMPBSA/dev/examples/QM_MMGBSA/)

Let us know if you encounter any issues...

camattelaer commented 7 months ago

ok perfect. Thanks again

I will do some tests and report back in due time :)

kind regards

CA

camattelaer commented 7 months ago

Hey guys, just wanted to mention that I'm currently not seeing an issue with the manual mask selection! Thank you for all your help on this issue.

I do have another question regarding this topic: a lot of papers restrict the QM region only to the ligand, but gmx_mmpbsa does require a QM selection in the protein aswell. Is there a specific reason for this design? (I noticed some weird behavior for MMPBSA.py where it gets a different value for ESCF for the receptor when using this approach and you have to be carefull interpreting the result in final.dat so was wondering if this would have something to do with it)

I have a different note however: the QM hamiltionian is checked in main.py in the following lines: if INPUT['gb']['ifqnt'] == 1: if INPUT['gb']['qm_theory'] not in ['PM3', 'AM1', 'MNDO', 'PDDG-PM3', 'PM3PDDG', 'PDDG-MNDO', 'PDDGMNDO', 'PM3-CARB1', 'PM3CARB1', 'DFTB', 'SCC-DFTB', 'RM1', 'PM6', 'PM3-ZnB', 'PM3-MAIS', 'PM6-D', 'PM6-DH+', 'AM1-DH+', 'AM1-D*', 'PM3ZNB', 'MNDO/D', 'MNDOD']:

And I noticed only DFTB and SCC-DFTB are defined for the tight binding functionals from DFTB+. These are as far as i am aware synonyms. This hamiltonian lacks parameters for fluorine, for example. the DFTB3 implementation has more pair-wise atom parameters. I added DFTB3 as one of the selectable methods in my own main.py and was able to use it in QM/MM-GBSA calculations. The original MMPBSA.py has the same shortcoming and i notified AMBER mailing list of this issue (http://archive.ambermd.org/202401/0146.html). Do you want me to open a new issue for this?

Kind regards

CA

marioernestovaldes commented 7 months ago

I do have another question regarding this topic: a lot of papers restrict the QM region only to the ligand, but gmx_mmpbsa does require a QM selection in the protein aswell. Is there a specific reason for this design? (I noticed some weird behavior for MMPBSA.py where it gets a different value for ESCF for the receptor when using this approach and you have to be carefull interpreting the result in final.dat so was wondering if this would have something to do with it)

This is an internal requirement from sander. If you want more flexibility in this regard, you can use QUICK through sander (page 183, https://ambermd.org/doc12/Amber23.pdf) and I believe it does support only selecting ligand residues. This is not supported in gmx_MMPBSA though (although I plan to incorporate this option in the future).

And I noticed only DFTB and SCC-DFTB are defined for the tight binding functionals from DFTB+. These are as far as i am aware synonyms. This hamiltonian lacks parameters for fluorine, for example. the DFTB3 implementation has more pair-wise atom parameters. I added DFTB3 as one of the selectable methods in my own main.py and was able to use it in QM/MM-GBSA calculations. The original MMPBSA.py has the same shortcoming and i notified AMBER mailing list of this issue (http://archive.ambermd.org/202401/0146.html). Do you want me to open a new issue for this?

fixed 2fdf1bcb66d9480e7fb1111a7c8e97c194ecbc81

Let me know if you encounter any other issues...