deGrootLab / pmx

Toolkit for free-energy calculation setup/analysis and biomolecular structure handling
GNU Lesser General Public License v3.0
133 stars 49 forks source link

Using hydrogen mass repartitioning and non-equilibrium FEP. #33

Closed noahharrison64 closed 1 year ago

noahharrison64 commented 1 year ago

Hi all,

I've been looking at speeding up my FEP calculations. While I've been looking at increased end state sampling using a variety of enhanced sampling methods, an easy win looks like using hydrogen mass repartitioning (HMR) and increasing the timestep from 2fs to 4fs.

Has anyone had experience doing this using the non-eq RFBE approach? I have run some test simulations, and while 3/4 worked, one simulation broke during the first eq_nvt stage (following a successful minimisation) due to LINCS warnings and the following fatal error:

Fatal error: There are 52 perturbed non-bonded pair interactions beyond the pair-list cutoff of 1.26 nm, which is not supported. This can happen because the system is unstable, or because intra-molecular interactions are excluded. The latter case is usually triggered by the use of couple-intramol=no. The issue can be either due to a too small box, which can cause a molecule and its periodic image to end up in the pairlist. In this case increase the box size. Or it can happen because interactions at distances longer than rlist are excluded, in which case you can try to increase nstlist or rlist to avoid this.

This system attempts to convert a methyl R group into a propyl R group, and hence there are quite a few hydrogens in the hybrid topology (pictured in orange) image

Any suggestions as to what may have caused this system to break? Any thoughts about using HMR with non-eq FEP? My plan is to try and stabilise the system with non-HMR EM / EQ prior to increasing timestep / repartitioning hydrogens and running a longer simulation.

Thanks, Noah

vgapsys commented 1 year ago

Hey Noah,

yes, I use HMR with non-eq TI. Which repartitioning ratio do you use? In my experience 3x hydrogen mass gives stable performance. Also, sd integrator behaves better than md (although this comes form a several observations only and I haven't done a systematic investigation on this).

For the problematic case that you show, what is the maximal force after energy minimization?

Vytas

noahharrison64 commented 1 year ago

I've been using 4x scaling factor for repartitioning - I will have a go with 3x to see what happens. Am I right in thinking you have to modify the pdb2gmx source code to change the scaling factor, which looks to be currently hard coded at 4, and then recompile this modified gromacs version?

I'm also thinking of using a more gentle equilibration scheme following minimisation - position restraints and simulated annealing. This will then be followed by unrestrained 'production' equilibrium end state simulations.

For energy minimisation I got the following info:

EM has stopped but the forces have not converged to requested precision Fmax < 100. Steepest Descents converged to machine precision in 6463 steps, but did not reach the requested Fmax < 100. Potential Energy = -7.4526362e+05 Maximum force = 5.8676514e+02 on atom 4661 Norm of force = 6.7524392e+00

Atom 4661 is the amide nitrogen closest to the R group transformation.

noahharrison64 commented 1 year ago

When using position restraints on the solvent leg of the simulation I get a warning when I try to run any equilibration, post energy minimisation. I generated a ligand position restraint file using the hybrid topology coordinate file specifying non-hydrogen atoms.

WARNING 2 [file ligand_posre.itp, line 5]: Some parameters for bonded interaction involving perturbed atoms are specified explicitly in state A, but not B - copying A to B

Do you have any suggestions as to what's going on here? If I ignore the warning the simulation crashes (lincs warnings followed by segmentation fault core dumped) In the complex leg of the simulation, also using gentle equilibration protocol, I get lincs warnings followed by the fatal error:

Fatal error: There are perturbed non-bonded pair interactions beyond the pair-list cutoff of 1.111 nm, which is not supported. This can happen because the system is unstable

noahharrison64 commented 1 year ago

Update on this: Using a scaling factor of 3 rather than 4 removes the instabilities and allows the simulation to continue.

Just to come back to use of integrator - in the mdp files you provided in this repo for the protein-ligand benchmark, you use SD integrator for the first EQ stage (eq_nvt) and then MD integrator for the longer EQ NPT stage. Is there any rationale behind this? If you think SD integrator works better would it make sense to use for both stages?

vgapsys commented 1 year ago

First, regarding the warning:

WARNING 2 [file ligand_posre.itp, line 5]: Some parameters for bonded interaction involving perturbed atoms are specified explicitly in state A, but not B - copying A to B it means that the atoms which are marked for an alchemical transition have position restraints defined only for stateA. StateB force constants are not set, therefore, gromacs will use the same force constants as for stateA. Usually, this is what you want and it is safe to ignore the warning.

Second, do I understand it correctly, that the simulation instability is now resolved?

Third, yes, SD integrator could be used for the transition.

noahharrison64 commented 1 year ago

Hi Vytas,

Yes the simulation instability is resolved. I used BioSimSpace HMR module to repartition the protein with a 3x SF since I was unable to edit the GMX source code to allow 3x SF.

noahharrison64 commented 1 year ago

Hi Vytas,

I've been attempting to reproduce the ProtLigBenchmark C-Met data using PMX with a 4fs timestep. While I have had some success with this, a large number of the transformations fail, including:

"CHEMBL3402752-30000-12_CHEMBL3402754-40-14", "CHEMBL3402755-4200-15_CHEMBL3402761-1-21", "CHEMBL3402749-500-9_CHEMBL3402754-40-14", "CHEMBL3402749-500-9_CHEMBL3402748-5300-8", "CHEMBL3402752-30000-12_CHEMBL3402755-4200-15", "CHEMBL3402754-40-14_CHEMBL3402761-1-21", "CHEMBL3402754-40-14_CHEMBL3402751-2100-11", "CHEMBL3402753-200-13_CHEMBL3402748-5300-8"

These fail in the first stage of the simulation, following energy minimisation, again due to lincs warnings and the fatal error described in my original post. The energy minimisation appears to have converged.

Steepest Descents converged to Fmax < 100 in 4196 steps Potential Energy = -2.7789797e+04 Maximum force = 9.9875107e+01 on atom 4 Norm of force = 9.4551150e+00 Finished mdrun on rank 0 Mon Feb 27 18:01:11 2023

These are the mdp settings for the NVT annealing procedure I run following energy minimisation, and when the simulation breaks.

Do you have any suggestions for settings I could look at changing to try and increase the stability of my system? I had previously tried using position restraints on the ligand to no success.

vgapsys commented 1 year ago

Set constraints=h-bonds

noahharrison64 commented 1 year ago

Hi,

I was using constraints=all-bonds after seeing the post here in which you say:

Switching to h-bond constraints is likely incorrect for the pmx based ligand hybrid topologies. In case, a hydrogen atom is mapped to a dummy in stateA, the constraint will not be generated for the bonds involving this dummy.

Is this no longer a concern? Or can I get away with it during initial annealing equilibration and switching back to all-bonds during production equilibration?

vgapsys commented 1 year ago

For the version in 'develop' branch this is no longer an issue.

vgapsys commented 1 year ago

Dummy hydrogens are now named as HV*. This allows gromacs to interpret them as hydrogens and apply constraints accordingly. For h-bonds constraints, no hydrogen to heavy atom mapping is allowed.

noahharrison64 commented 1 year ago

Okay brilliant, will give that a go! Thanks.

bisdan commented 1 year ago

Dummy hydrogens are now named as HV*. This allows gromacs to interpret them as hydrogens and apply constraints accordingly. For h-bonds constraints, no hydrogen to heavy atom mapping is allowed.

Hey Vytas,

sorry for opening this again. Could you clarify whether h-bonds constraints are now an option for hybrid topologies involving amino acid mutations prepared with pmx? The first two sentences make me think that h-bond constraints should be fine, while the last remark puts that into doubt again.

If I understand correctly, it is not valid to use h-bond constraints if the hybrid topology includes hydrogens in state A that are transformed into heavy atoms in state B (and vice versa).

Thank you in advance

Daniel

vgapsys commented 1 year ago

sorry for opening this again. Could you clarify whether h-bonds constraints are now an option for hybrid topologies involving amino acid mutations prepared with pmx? The first two sentences make me think that h-bond constraints should be fine, while the last remark puts that into doubt again.

yes, you can use h-bonds constraints only for hybrid topologies. In amino acids, hydrogens are not mapped to heavy atoms.

If I understand correctly, it is not valid to use h-bond constraints if the hybrid topology includes hydrogens in state A that are transformed into heavy atoms in state B (and vice versa).

correct, for such perturbations all-bond constraints would be needed