RagnarB83 / ash

ASH is a Python-based computational chemistry and QM/MM environment, primarily for molecular calculations in the gas phase, explicit solution, crystal or protein environment.
GNU General Public License v2.0

QMMM theory combined with NEB-TS #423

Closed 0xBytefuzz closed 3 months ago

0xBytefuzz commented 3 months ago

When I use QMMM theory combined with NEB-TS, I encounter a bug.

Haha, I’ve run into a problem again and have to bother you once more. Please forgive me, as I am a novice. During a NEB-TS run, the job terminated due to an abnormal termination of xtb. The image folders were created normally and the first round seems to have executed correctly (during the creation of the image folders), but in the second round the xtb_.xyz file in image_1 contains coordinates that are all "nan", so I am unsure where the problem lies. For the NEB-TS run I used the option actatoms=qmatomlist (about 100 atoms), and both the reactant and product were optimized. Additionally, my input file is large, coming from a solvated tetrameric protein, but I believe ASH can handle it normally. I am sharing my log with you in the hope that it helps you identify the issue; if you need any other files, I am more than happy to provide them.

Additionally, I have a few small questions I’d like to ask:

During the execution of NEB-TS, the optimization process for the reactant allows the active atom range to be within 10 angstroms from a certain substrate atom (approximately 1000 atoms), with qmatomlist being several key amino acids. The optimization process for the product structure involves reading the optimized structure of the reactant, setting actatoms=qmatomlist, allowing only qmatomlist atoms to move, constructing the post-reaction product state, and optimizing it. The optimized structure is then input into NEB-TS. Is this process reasonable? If not, I would very much like to hear your suggestions on this matter, no matter the aspect.

Before using ASH, I typically use amber and orca/xTB to perform QM/MM MD to calculate the reaction barrier of the enzyme, or use Gaussian to perform ONIOM combined with addgic to find the transition state. If I use ASH to calculate the reaction barrier and transition state of the enzyme, are there methods to define reaction coordinates (CV: d1-d2)? Is NEB-TS a recommended approach if I use ASH to calculate and find my transition state? I see that you are scripting umbrella sampling functionality, and I am very much looking forward to testing it.

Thank you for your efforts. My future work will frequently use ASH, and I am very willing to help improve the tutorial section of the manual, as this is very important for beginners. Although my tutorial writing might have errors, I believe you can correct them. Thank you very much for your patience in reading this far. I wish you happiness and health.

test.log

RagnarB83 commented 3 months ago

The problem seems to be that when NEB calls QM/MM, which calls xTB, something happens during the xtb step that results in a "nan" energy (nan = Not a Number). This happens already in the first NEB iteration (-1), where single-point calculations are performed on the reactant and product geometries (otherwise frozen). So the problem is already present there; the nan coordinates for the other images should be a consequence of what happened in the reactant and product image calculations. Most likely xTB failed to converge. Check this by inspecting the xTB output file, xtb.out, for image 0 (there should be an image_0 directory, created on scratch). Also check that the coordinates in the image_0 directory are normal; if they are not, then there is an ASH bug present somewhere. The reason for xtb not converging is usually that something is wrong with the geometry provided to it.
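A quick standalone way to spot which image first picks up nan coordinates is to parse the XYZ files directly (a sketch; the file layout and names such as image_1/xtb_.xyz follow the description in this thread and are not part of any ASH API):

```python
import math

def xyz_has_nan(xyz_text):
    """Return True if any coordinate in an XYZ-format string is nan."""
    lines = xyz_text.strip().splitlines()
    natoms = int(lines[0].split()[0])
    # skip the atom-count line and the comment line, then check x, y, z fields
    for line in lines[2:2 + natoms]:
        if any(math.isnan(float(tok)) for tok in line.split()[1:4]):
            return True
    return False
```

Running this over the XYZ files in each image directory (e.g. image_0, image_1, ...) shows where the nan coordinates first appear.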

Are you sure that the reactant and product geometries are already optimized? They need to be optimized at the same QM/MM level of theory as the NEB, so the xTB/MM optimization needs to have been done for both reactant and product with the exact same QM-region and active-region. Since you are reading in Amber coordinate files for reactant and product, I suspect this is not the case.

If you can't figure it out, then make the files available to me so that I can reproduce things.

> During the execution of NEB-TS, the optimization process for the reactant allows the active atom range to be within 10 angstroms from a certain substrate atom (approximately 1000 atoms), with qmatomlist being several key amino acids. The optimization process for the product structure involves reading the optimized structure of the reactant, setting actatoms=qmatomlist, allowing only qmatomlist atoms to move, constructing the post-reaction product state, and optimizing it. The optimized structure is then input into NEB-TS. Is this process reasonable? If not, I would very much like to hear your suggestions on this matter, no matter the aspect.

This sounds alright, although in principle the product should be fully minimized as well, using the full 1000-atom active region. I'm curious how you ended up with opt0.rst7 and opt1.rst7 files as input for the NEB if you say that you optimized the reactant and product?

Regarding your questions: use of either NEB or NEB-TS is absolutely what I would recommend for finding transition states, and we have used it for QM/MM studies on proteins. This is an almost black-box method that in principle only requires you to supply geometries for 2 minima. The Knarr-ASH implementation is almost exactly the same as the one found in ORCA, which was thoroughly tested and benchmarked in this paper (https://pubs.acs.org/doi/10.1021/acs.jctc.1c00462). The advantage of NEB is that it finds the reaction coordinate on its own by minimizing the "elastic band" that connects the images created by an interpolation. NEB is thus pretty much guaranteed to converge to the minimum energy path (barring convergence problems, which depend on how many images you choose). This is unlike a relaxed surface scan, where you pick 1 or more reaction coordinates that will always be a crude approximation to the real reaction coordinate and will often end up avoiding the saddle-point region; additionally, the path becomes biased by the choice of reaction coordinate. NEB-TS speeds up the process by only partially minimizing the NEB pathway, picking the climbing image (the approximation to the saddle point while the job is running), and once mild NEB convergence criteria have been satisfied, the job switches to an eigenvector-following TS optimization (using a Hessian), which should locate the saddle point precisely.
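The elastic-band idea can be illustrated with a toy plain-NEB loop on a 2D model potential (a self-contained sketch, not the ASH/Knarr implementation; no climbing image, tangent improvements, or convergence test):

```python
import numpy as np

def V(p):                                    # model PES: minima at (-1, 1) and (1, 1), saddle at (0, 0)
    x, y = p
    return (x**2 - 1)**2 + (y - x**2)**2

def grad(p):
    x, y = p
    return np.array([4*x*(x**2 - 1) - 4*x*(y - x**2), 2*(y - x**2)])

n = 11                                       # number of images
path = np.linspace([-1.0, 1.0], [1.0, 1.0], n)   # straight-line interpolation between the two minima
k = 1.0                                      # spring constant connecting neighbouring images

for _ in range(2000):
    new = path.copy()
    for i in range(1, n - 1):                # endpoints stay fixed at the minima
        tau = path[i+1] - path[i-1]
        tau /= np.linalg.norm(tau)           # crude tangent estimate along the band
        g = grad(path[i])
        f_perp = -(g - g.dot(tau) * tau)     # true force, component perpendicular to the path
        f_spring = k * (np.linalg.norm(path[i+1] - path[i])
                        - np.linalg.norm(path[i] - path[i-1])) * tau
        new[i] = path[i] + 0.01 * (f_perp + f_spring)
    path = new

top = max(path, key=V)                       # highest-energy image approximates the saddle point
```

The band relaxes from the straight line onto the minimum energy path, and the highest image ends up near the saddle at (0, 0) with barrier 1, without any reaction coordinate having been chosen by hand.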

Relaxed scans can sometimes be useful, however, and for those you can use the calc_surface function in ASH (https://ash.readthedocs.io/en/latest/surfacescan.html). There you can define a distance, angle or dihedral as reaction coordinate. I would typically not recommend it for finding a transition state (for the reasons above), but it is sometimes useful for getting a specifically defined slice of the potential energy surface or for exploring specific mechanistic scenarios.
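The mechanics of a relaxed scan (fix the chosen reaction coordinate on a grid, minimize all remaining degrees of freedom at each grid point) can be sketched with standalone toy code on a 2D model potential (this is an illustration only, not the calc_surface API):

```python
import numpy as np

def V(x, y):                       # model PES with minima at (-1, 1), (1, 1) and a saddle at (0, 0)
    return (x**2 - 1)**2 + (y - x**2)**2

xs = np.linspace(-1.0, 1.0, 21)    # the chosen "reaction coordinate" grid
profile = []
for x in xs:
    y = 0.0
    for _ in range(500):           # relax the remaining coordinate at fixed x (gradient descent in y)
        y -= 0.1 * 2.0 * (y - x**2)
    profile.append(V(x, y))
# profile[i] is the constrained-minimum energy at xs[i]
```

Each point of the profile is a constrained minimization; the quality of the resulting barrier estimate depends entirely on how well the chosen coordinate tracks the true reaction path.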

Since you ask about CVs (collective variables) and umbrella sampling: collective variables are typically used in the context of enhanced-sampling methods such as metadynamics and umbrella sampling. They have the same meaning as reaction coordinates but apply when doing molecular dynamics or free-energy simulations. Both NEB and surface scans explore the zero-Kelvin potential energy surface; the advantage is that we don't have to simulate but simply locate the stationary points. If you are using a dynamical approach, then you are sampling the free energy surface at some temperature. This is a more realistic description of reality but much more expensive to calculate, since you have to run molecular dynamics and sample enough to get reliable statistics.

Umbrella sampling is in principle available in ASH but is not fully tested, and I would not attempt to use it yet. What is available as a collective-variable-based free-energy method, and what I would recommend anyway, is the metadynamics approach, which is fully available in ASH and can be used for QM/MM as well. You can see what is possible by reading the tutorial: https://ash.readthedocs.io/en/latest/mtd_tutorial.html
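The core metadynamics idea (deposit repulsive Gaussian hills along a collective variable so that the system is gradually pushed out of free-energy minima) can be shown with a standalone 1D toy model (plain, not well-tempered, metadynamics; overdamped Langevin dynamics; not ASH or OpenMM code):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):                       # 1D double well V(x) = (x**2 - 1)**2, barrier 1 at x = 0
    return 4.0 * x * (x**2 - 1.0)

dt, kT = 0.005, 0.1                  # time step and temperature (barrier height >> kT)
hill_h, hill_w = 0.1, 0.2            # height and width of the deposited Gaussian hills
centers = np.empty(0)                # positions where hills have been deposited
x = -1.0                             # start in the left well
traj = []

for step in range(10000):
    # gradient of the bias: sum of Gaussian-hill derivatives at the current x
    d = x - centers
    bias_grad = np.sum(-hill_h * d / hill_w**2 * np.exp(-d**2 / (2 * hill_w**2)))
    # overdamped Langevin step on the biased potential
    x += -dt * (grad_V(x) + bias_grad) + np.sqrt(2 * kT * dt) * rng.standard_normal()
    if step % 20 == 0:
        centers = np.append(centers, x)   # deposit a new hill every 20 steps
    traj.append(x)
```

Without the bias, the barrier (10 kT here) would make barrier crossings rare; with hills accumulating in the starting well, the trajectory crosses to the other well within the short run, which is exactly why metadynamics is attractive for reaction barriers at finite temperature.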

0xBytefuzz commented 3 months ago

Thank you very much for your reply. Both of my rst files have been optimized, and I used the default Amber file reading method, so I provided the rst files.

After the optimization was completed, I converted Fragment-optimized.xyz to a pdb file (keeping the atom order) and used cpptraj to save it as a pdb and rst file:

```
cpptraj
parm ../qm_top/wtca_0.parm7
trajin opted.pdb
trajout opt1.rst7 restart
trajout opt1.pdb pdb
go
exit
```

In this way, even though the pdb file converted from the xyz file does not contain any amino-acid information, the opt1.pdb file gets the correct information corresponding to wtca_0.parm7, and opt1.rst7 can then be passed into ASH.

In fact, the method I used to construct the reactants was quite cumbersome. I extracted the key residues from the optimized structure, then adjusted them to the reactant state in GaussView, and then replaced the coordinates of the modified atoms. After that, I used cpptraj to save it as an rst file. I didn’t try whether Rfrag=Fragment(amber_prmtopfile=prmtopfile, amber_inpcrdfile=R_inpcrdfile) could read coordinates in other formats. Additionally, some simulation steps write velocities into the rst file, which I needed to remove. After optimization, I used cpptraj to save it as an rst file again for the next reading. Fortunately, using scripts made the conversion process convenient, so I didn’t try other ways to read coordinates.
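The velocity-removal step described above can also be done with a few lines of standalone Python. This is a simplified sketch: it assumes a plain ASCII Amber restart (title line, a line whose first field is the atom count, coordinate lines with up to 6 values each, an optional velocity block of the same size, and an optional final box line), and the layout is ambiguous for 1-2 atom systems:

```python
import math

def strip_velocities(rst_text):
    """Drop the velocity block from an ASCII Amber restart (.rst7) string,
    keeping the title, atom-count line, coordinates and any box line."""
    lines = rst_text.splitlines()
    natom = int(lines[1].split()[0])
    nc = math.ceil(3 * natom / 6)            # lines needed to hold 3N coordinates
    data = lines[2:]
    keep = lines[:2] + data[:nc]
    # data can be: coords | coords+box | coords+vel | coords+vel+box
    if len(data) in (nc + 1, 2 * nc + 1):    # a trailing box line is present
        keep.append(data[-1])
    return "\n".join(keep) + "\n"
```

For production work a proper parser (cpptraj or ParmEd) is safer, but for simple restarts this avoids one extra format round-trip.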

Another reason is that when the optimization is completed, the pdb file output by ASH seems to have incorrect column information for the coordinates of some water molecules, so I used the XYZ file. I’ve included a structure at the end for you to observe.

I constructed a smaller test system, and this time the run was normal and didn't stop unexpectedly; I closed it manually. I suspect the previous failure was most likely because the endpoints were not optimized at the same level of theory as the NEB. The previous optimization used "! BLYP D3BJ def2-SVP def2/J TightSCF". Although I know the NEB must be performed at the same level as the optimization of both endpoints, I wanted to quickly test whether NEB can be used with QMMMTheory, so for the NEB I used the xTB-GFN2 method.

So the error was caused by me, and I apologize for that.

Overall, thank you very much for your patience in answering my questions.

OptimizedFragment.zip

RagnarB83 commented 3 months ago

Ah I see, glad that this seems to have been resolved. It's a bit unusual for xtb to crash this badly simply because the coordinates were not optimized at the exact same theory level, though. What I suspect is that during your coordinate-file-format conversions from XYZ to PDB to RST7 you lose numerical precision in the coordinates, and you end up with slightly displaced coordinates that happen to make xtb crash in this case. I don't know about the Amber format, but it is definitely the case that the PDB format does not have a lot of numerical precision for the coordinates.
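To make the precision point concrete (a standalone Python sketch, not ASH code): the fixed-column PDB coordinate field keeps only three decimals, so a round-trip through PDB can shift every coordinate by up to about 5e-4 Angstrom.

```python
# A PDB ATOM record stores each coordinate in an 8-character fixed column
# with 3 decimals ("%8.3f"); XYZ files can carry full double precision.
x = 12.3456789                   # a coordinate with more precision than PDB can hold
pdb_field = "%8.3f" % x          # what ends up in the PDB coordinate column
roundtrip = float(pdb_field)     # what is read back after the conversion
loss = abs(roundtrip - x)        # rounding error introduced by the round-trip
```

Individually these shifts are tiny, but applied to thousands of atoms they can be enough to upset a tightly converged SCF starting guess.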

If you keep using ASH, I would recommend avoiding this kind of repeated file-format conversion for calculations that depend on each other (here the R-geometry and P-geometry coordinates depend on each other for the NEB calculation), so that nothing gets lost by conversion programs. It is best to use the XYZ file from the ASH geometry optimization (Fragment-optimized.xyz), rename it (e.g. R_opt.xyz and P_opt.xyz) and then read it into the NEB job. The XYZ files created by ASH have full numerical precision. I would avoid relying on PDB files for coordinates (they are fine as an initial structure, as a topology file, or for basic protein visualization).

RagnarB83 commented 3 months ago

Interesting PDB-file problem. I'm not quite sure how that happens, but it is possibly a bug in the write_pdbfile function that you used, which may already have been fixed in a recent ASH version. If you can send me the XYZ file that resulted in that strange PDB file (with the weird water coordinates) and the exact write_pdb function call you used to produce it, I can take a look and check whether the problem has been solved.

However, hopefully the problem is fixed by using the frag.write_pdbfile_openmm() method instead. This function uses the OpenMM library to write the PDB-file more reliably and will be the recommended way of writing PDB-files from now on. Documentation about writing PDB-files needs to be updated.