Need some clarification on Yank analysis results

aghanbar commented 7 years ago

I ran the same system with two different setups. In one, my 'receptor' is immersed in water using clearance = 16 A. In the other one I make my receptor periodic by setting the periodic box around it to make a crystal-like system. In the latter system, no water is added or present since the box only contains the receptor and the space for ligand which is void. In both cases the ligand is the same, therefore the solvent complex is the same, the difference between them is only the complex phase. The final results for these two systems are: System immersed in water with clearance=16 A:

2017-03-01 21:40:13,393: Free energy of binding:          -30.307 +- 2.322 kT (         -18.068 +- 1.384 kcal/mol)
2017-03-01 21:40:13,393:
2017-03-01 21:40:13,393: DeltaG complex                   :          402.409 +- 1.844 kT
2017-03-01 21:40:13,393: DeltaG restraint                 :                     0.774 kT
2017-03-01 21:40:13,393: DeltaG solvent                   :          372.876 +- 1.411 kT
2017-03-01 21:40:13,393:
2017-03-01 21:40:13,393: Enthalpy of binding:         -221.789 +- 112.206 kT (        -132.222 +- 66.893 kcal/mol)

The periodic crystal-like system with no water:

2017-03-01 21:41:13,882: Free energy of binding:          -59.047 +- 2.284 kT (         -35.202 +- 1.362 kcal/mol)
2017-03-01 21:41:13,882:
2017-03-01 21:41:13,882: DeltaG complex                   :         -262.210 +- 2.196 kT
2017-03-01 21:41:13,882: DeltaG restraint                 :                     0.774 kT
2017-03-01 21:41:13,882: DeltaG solvent                   :         -320.483 +- 0.628 kT
2017-03-01 21:41:13,882:
2017-03-01 21:41:13,882: Enthalpy of binding:           11.750 +- 17.543 kT (           7.005 +- 10.459 kcal/mol)

So my question is why DeltaG solvent, which is the same in both systems is very different between the two? One negative and the other positive? And I would appreciate some clarification of what these values actually mean.

andrrizzi commented 7 years ago

Hmm. I'm not entirely sure I understand the setup of the two simulations. Could you copy-paste/link the scripts you used?

So my question is why DeltaG solvent, which is the same in both systems is very different between the two?

If one system is immersed in explicit water and the other is in vacuum, I would expect the DeltaG solvent to be different. If you used default options (i.e. annihilate electrostatics and decouple sterics) DeltaG solvent is the free energy of "turning on" your ligand in a water box/vacuum. In vacuum, the only free energy contribution will come from turning on the ligand-ligand electrostatic interactions, while in water that includes also ligand-water electrostatics and VdW interactions.

That being said, that DeltaG solvent looks quite big. How many atoms does your ligand have?

aghanbar commented 7 years ago

The solvent phase setup is exactly the same, but the complex phase is basically the complex in a bigger box filled water while in the other one is the complex tightly fit in a box which allows no water to put in. Here are the files I used: solvent.leap.in.txt complex_nowater.leap.in.txt complex_inwater.leap.in.txt model3_nowater.yaml.txt model3_water.yaml.txt

EDIT: My ligand has 100 atoms.

andrrizzi commented 7 years ago

Ah, sorry I misunderstood. That is weird! Before looking for a bug, I'd try these:

The only difference I could notice in the setup is that you are not using the anisotropic correction in the experiment without water and using it in the one with water. You can try to deactivate it in the model with water too to see if you get better agreement.
Analyze the solvent data with the new notebook. @Lnaden is that release or would he need the dev version for this?
Run for longer and collect more samples to see if the two converge.

Note that you don't have to resample the complex phase if you use this trick:

Move the solvent.nc file to solvent_back.nc for backup.
Start the same Yank simulation in a different folder with number_of_iterations: 1. This will create fresh complex.nc and solvent.nc files.
Move the fresh solvent.nc file in the original folder.
Set resume_simulation: yes and restart Yank. Yank will see that you have already collected 500 samples from the complex.nc phase and proceed to collect 500 samples only from the solvent phase.

aghanbar commented 7 years ago

Ok so I did as instructed, set anisotropic correction to no for the water simulation the reran the solvent simulations and got the following results:

2017-03-02 16:15:06,283: Free energy of binding:         -726.794 +- 2.087 kT (        -433.287 +- 1.244 kcal/mol)
2017-03-02 16:15:06,283:
2017-03-02 16:15:06,283: DeltaG complex                   :          402.409 +- 1.844 kT
2017-03-02 16:15:06,283: DeltaG restraint                 :                     0.774 kT
2017-03-02 16:15:06,283: DeltaG solvent                   :         -323.610 +- 0.978 kT
2017-03-02 16:15:06,283:
2017-03-02 16:15:06,283: Enthalpy of binding:         -909.731 +- 26.789 kT (        -542.347 +- 15.971 kcal/mol)

This strange number is obtained because the complex simulation is run with anisotropic correction and the solvent one without it. Now I think it's clear that the difference in results is because of anisotropic correction. But why should switching it on or off make such a big difference, and how do you suggest I deal with it?

andrrizzi commented 7 years ago

To clarify, the two phases are effectively independent so the DeltaG solvent doesn't depend on whether the complex simulation is run with anisotropic correction or not, but the correction is applied independently also to the solvent phase when the option is set to yes (or let as default).

I'm surprised that the correction has such a big effect on the final free energy. I wonder if the ligand is charged and if the box neutralization with Na+ and Cl- ions could be blamed. I think your solvent.leap.in script should print also its total charge and the number of neutralizing ions added in the log file. Could you check it?

You can also try to use a bigger nonbonded_cutoff (something like 11-12 angstroms?). An higher value should decrease the differences between the simulated state and the expanded cutoff state used to correct for long-range anisotropic dispersion interactions.

aghanbar commented 7 years ago

My ligand is a peptide which doesn't have any charged side chains. Both ends are charged though and the net charge is zero so leap doesn't add any ions to the system. As for the cutoff range, the question is why increasing the cutoff can have such a large impact on the deltaG? And does this mean that increasing the cutoff is always going to yield more accurate results?

andrrizzi commented 7 years ago

does this mean that increasing the cutoff is always going to yield more accurate results?

In principle, yes. The anisotropic dispersion correction however is not entirely equivalent to running the simulation at a greater cutoff. It just reweights the samples obtained at a smaller cutoff so if there's not good overlap between the two distributions, the estimate could be poor.

I could be wrong here (tagging @Lnaden @jchodera for confirmations), but if your cutoff cuts your peptide in half, I think the charged ends won't see each other (currently alchemical systems don't model the PME reciprocal space), and they will be simulated in a locally charged environment. This means that when you increase the cutoff enough for the ends to see each other, the distribution will change greatly.

We're developing a scheme to preserve the reciprocal space with annihilated electrostatics, but for now I'd suggest you to try to rerun the solvent phase with a larger cutoff.

Lnaden commented 7 years ago

Correct. The way alchemy is setup right now, all ligand interactions are alchemical and even if you are not annihilating the electrostatics, the long range interactions are not really considered between alchemical atoms. I would suggest running with a larger cutoff if you can through.

If it is too computationally expensive to run the whole YANK calculation with the larger cutoff, you can try splitting the calculation into 2 separate simulations where you run the large cutoff for electrostatics only and a smaller cutoff for discharged Lennard-Jones decoupling. This will introduce some error, but you can try to estimate that error by setting your anisotropic_dispersion_cutoff to the larger cutoff for the Lennard-Jones leg.

aghanbar commented 7 years ago

Thank you for the insights. I think one problem with my system is that in my complex phase I am doing a peptide aggregation simulation so there are many peptides with charged ends. Although the net charge of the system is zero, the free energy gets unusually lower as you add more peptides to the system (~90 kcal/mol). So from the discussion above, my perception is that increasing the cutoff will mitigate this problem. I am still not clear if I set the anisotropic dispersion correction to yes will the program automatically expand the cutoff? So for example if I set the cutoff to 9 A, does it expand the cut-off more than that? I am asking this since I see the following in the log file although the long-range cutoff is set to 9 (default value):

2017-03-07 13:11:43,680 - DEBUG - yank.yank - Setting cutoff for fully interacting system to maximum allowed 16 A or 2017-03-07 13:15:24,748 - DEBUG - yank.sampling - Expanded cutoff Contexts creation took 108.388 s. And previously when I had a smaller system Yank complained because the system was too small and setting the long range cutoff to a smaller number had no effect and I asked about this in another thread and got my answer. But my question now is that if the anisotropic dispersion correction has its own mechanism of manipulating the box size, will setting the long range cutoff in the yaml file have any effect? Or when it's set to yes, should I be changing anisotropic_dispersion_cutoff instead of the long range cutoff? Thanks.

Lnaden commented 7 years ago

The anisotropic_dispersion_correction creates two additional states at the end of the thermodynamic cycle YANK runs on where free energy differences are estimated between. We detail that much more here.

No simulation is actually carried out at those states, and instead the energies from the alchemical systems are computed in these states so we can make an estimate of the free energies at those states. We do this for a couple reasons:

The states YANK simulates are alchemically modified so the "ligand's" charges don't really have long range interactions (past the cutoff distance). That is a current imitation of the implementation is there is no OpenMM way to do CustomPMEReciprocalSpaceForce so we lose those effects.
the way many long range treatments work is to assume the system outside the cutoff contributes to the energy in an isotropic, homogenous way. In hydration free energies of homogenous fluids, this is not a bad assumption. In proteins and membranes, this is inst.

To correct for both these cases, the expanded cutoff, non-alchemically modified systems are created and the free energy difference to them is estimated. The anisotropic_dispersion_cutoff ONLY affects the two additional states, not the alchemical states we actually simulate.

Thinking back on your problem, if your peptide has regions of large local charge, those long range PME effects will be omitted from the main simulation, which would cause large sources of error. The anisotropic_dispersion_correction should help with that, but the free energy difference between those states is still only an estimate.

Lnaden commented 7 years ago

So in short:

The solvents: nonbonded_cutoff setting is the cutoff that the simulated states use
The options: anisotropic_dispersion_correction and anisotropic_dispersion_cutoff are settings for states we estimate free energies at to try and correct for errors from the simulated states and supplement the main simulation.

Lnaden commented 7 years ago

I should note the anisotropic_dispersion_correction only works in periodic systems. We could technically enable it for CutoffNonPeriodic which uses reaction-field electrostatics, but that has both implementation problems and underlying theory problems as I understand it as well

jchodera commented 7 years ago

Closing this, presuming that it has been resolved. If not, please reopen!

aghanbar commented 7 years ago

I can confirm that the problem was caused by multiple charges located at the peptide ends in my system. Capping the charged peptide ends was a solution.

choderalab / yank

Need some clarification on Yank analysis results #649