+ve delta G from known X-ray ligand

abradle commented 5 years ago

Hi all,

I've run yank on AWS for ~8 hours on a p2.xlarge instance. This is for a crystallographically refined complex from the XChem (http://www.diamond.ac.uk/Instruments/Mx/Fragment-Screening.html) facility.

I get a mildly +ve delta G of binding. I was wondering if there was anything clearly wrong with my setup?

Data is here - including input complex and full yaml file (Yaml file was prepared by @jchodera) https://drive.google.com/file/d/15lpbkSbWQw5J7ybIN_eHTlQ0O7xGfj-c/view

yank analyze -y free-energy.yaml produces output (just the end):

######## EXPERIMENT: experiments ########
Free energy of binding  :     1.263 +- 3.853 kT (0.753 +- 2.297 kcal/mol)
DeltaG complex          :   148.440 +- 2.907 kT
DeltaG standard state correction:              1.777 kT
DeltaG solvent          :   151.480 +- 2.529 kT

Enthalpy of binding     :   200.140 +- 176.937 kT (119.316 +- 105.483 kcal/mol)
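As a sanity check on how the reported numbers fit together, the sketch below recombines the per-phase values from the output above. The relation used (solvent minus complex minus standard state correction, with the two phase uncertainties added in quadrature) and the 300 K temperature are assumptions inferred from the printed numbers, not taken from YANK's source; the function name is hypothetical.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)
TEMPERATURE_K = 300.0             # assumed; consistent with the kT -> kcal/mol ratio in the output


def combine_phases(dg_complex, err_complex, dg_solvent, err_solvent, dg_ssc):
    """Recombine per-phase free energies (in kT) into a binding free energy.

    Assumed relation: DeltaG_bind = DeltaG_solvent - DeltaG_complex - DeltaG_ssc,
    with the phase errors combined in quadrature.
    """
    dg_bind = dg_solvent - dg_complex - dg_ssc
    err_bind = math.hypot(err_complex, err_solvent)
    return dg_bind, err_bind


# Values copied from the `yank analyze` output above (all in kT).
dg_bind, err_bind = combine_phases(148.440, 2.907, 151.480, 2.529, 1.777)

kt_in_kcal = KB_KCAL_PER_MOL_K * TEMPERATURE_K
print(f"{dg_bind:.3f} +- {err_bind:.3f} kT "
      f"({dg_bind * kt_in_kcal:.3f} +- {err_bind * kt_in_kcal:.3f} kcal/mol)")
```

With these inputs the sketch gives back the reported 1.263 +- 3.853 kT, which shows that the uncertainty on the binding free energy comes from the 2.5 to 2.9 kT errors of the individual phases.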

It's worth noting - we have no experimental affinity data for this complex, and we soak at 500 mM - so we possibly do fit into the lower end of the prediction (e.g. Kd of 100mM).
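To put the 100 mM guess on the same scale as the YANK estimate, a dissociation constant can be converted to a standard binding free energy with DeltaG = kT ln(Kd / 1 M). A minimal sketch, assuming a 1 M standard state and 300 K (the temperature is an assumption, and the function name is illustrative):

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)
TEMPERATURE_K = 300.0             # assumed temperature
STANDARD_CONCENTRATION_M = 1.0    # 1 M standard state


def binding_free_energy_from_kd(kd_molar):
    """Standard binding free energy in kT for a dissociation constant in mol/L."""
    return math.log(kd_molar / STANDARD_CONCENTRATION_M)


dg_kt = binding_free_energy_from_kd(0.100)  # Kd = 100 mM
dg_kcal = dg_kt * KB_KCAL_PER_MOL_K * TEMPERATURE_K
print(f"Kd = 100 mM -> {dg_kt:.2f} kT ({dg_kcal:.2f} kcal/mol)")
```

A Kd of 100 mM corresponds to roughly -2.3 kT, which lies within one standard error of the 1.263 +- 3.853 kT estimate above.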

Lnaden commented 5 years ago

There are a few things I am seeing:

The error is still quite large, almost 4 kT.
Based on the simulation report, you have only run for 4 ns in a SAMS simulation, and there are very few cycles of the sampler going back and forth between the fully coupled and fully decoupled states.
Based on the replica mixing chart, there are a few places at about 0.7-0.8 lambda electrostatics where there are very few exchanges, but judging by the states sampled at each iteration (the previous point), that is because the sampler has not run long enough to converge.

It looks like the simulation just needs to run longer, but I find it odd that it only got about 1000 iterations per phase after 8 hours. The K80s are not the fastest, but I would expect them to be faster than that. It might be the time spent during the swap between experiments, depending on how fast the disk IO is on that node. Maybe also try increasing switch_experiment_interval to 100 to increase total iterations per wall clock time if the disk IO is slow, but that's just a theory.

abradle commented 5 years ago

Gottit! Will run for longer and see if that improves. Can try your suggestion too. Thanks for rapid response.

Best wishes,

Anthony


jchodera commented 5 years ago

Thanks for sharing the tarball! I'm running a copy of this locally on our GTX-1080 cluster, and will post an update in a day or so to see if the results look sensible.

abradle commented 5 years ago

ok - thanks.

I've run for about 4 hours more and it's looking better, but the error bars are still pretty big!

######## EXPERIMENT: experiments ########
Free energy of binding  :    -7.291 +- 3.000 kT (-4.347 +- 1.789 kcal/mol)
DeltaG complex          :   160.861 +- 1.747 kT
DeltaG standard state correction:              2.036 kT
DeltaG solvent          :   155.606 +- 2.440 kT

Enthalpy of binding     :   162.008 +- 73.950 kT (96.583 +- 44.086 kcal/mol)

abradle commented 5 years ago

@jchodera how did it work on the GTX 1080? It never seemed to converge for me and kept fluctuating between +ve and -ve free energies.

jchodera commented 5 years ago

Throughput is decent on the GTX 1080 (~38 ns/day with PME, ~115 ns/day with GBSA), but the PME calculation is taking a while to converge.

Here is the estimate after ~60 ns:

######## EXPERIMENT: protliggbsa ########
Free energy of binding  :   -25.558 +- 0.450 kT (-15.237 +- 0.268 kcal/mol)
DeltaG complex          :    41.100 +- 0.450 kT
DeltaG standard state correction:              2.060 kT
DeltaG solvent          :    17.601 +- 0.017 kT

Enthalpy of binding     :   -38.380 +- 14.547 kT (-22.881 +- 8.672 kcal/mol)
######## EXPERIMENT: protligpme ########
Free energy of binding  :    -4.903 +- 2.070 kT (-2.923 +- 1.234 kcal/mol)
DeltaG complex          :   156.275 +- 1.070 kT
DeltaG standard state correction:              3.496 kT
DeltaG solvent          :   154.867 +- 1.772 kT

Enthalpy of binding     :   214.082 +- 121.278 kT (127.628 +- 72.301 kcal/mol)

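The GBSA calculation looks reasonably converged from examining the state sampling:

[image] https://user-images.githubusercontent.com/3656088/43690618-d261b0d6-98c1-11e8-9baa-b11a5552c1ca.png
[image] https://user-images.githubusercontent.com/3656088/43690623-df462f66-98c1-11e8-81a2-657f51e2363b.png

However, the PME calculation looks like there are some issues:

The solvent calculation has weird jumps that I don't understand, but want to track down. As far as I know, this shouldn't happen.
Those jumps cause the automatic equilibration detection to analyze only the tail end of the solvent calculation, likely leading to erroneously low affinity estimates.
Visualizing the trajectory shows that we aren't sampling among the different potential binding modes nearly enough, so longer simulations (or something more clever to encourage better mixing of alchemical states and better state sampling) are likely required.

[image] https://user-images.githubusercontent.com/3656088/43690635-1c3ad340-98c2-11e8-962b-fa8147c48832.png
[image] https://user-images.githubusercontent.com/3656088/43690637-24671c36-98c2-11e8-878b-ecd863f3c18e.png

I have more to dig into here after running the simulations longer and digging into the u_n jump issue. Will report back soon!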

abradle commented 5 years ago

OK - thanks John!

> Those jumps cause the automatic equilibration detection to analyze only the tail end of the solvent calculation, likely leading to erroneously low affinity estimates

That might explain something I was seeing. I would kill a simulation after 6 hours and get say -6 +- 2.5. I would then run for another hour and get +8 +-2.8 which seemed odd.
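For a rough sense of how inconsistent those two snapshots are, the difference can be compared with the combined uncertainty. This is only a sketch: it treats the two estimates as independent, which they are not (the later run includes the earlier run's data), so the number is indicative at best. The function name is made up for illustration.

```python
import math


def sigma_separation(est_a, err_a, est_b, err_b):
    """Gap between two estimates in units of their combined standard error.

    Assumes independent errors, which overstates the combined error budget
    when the two estimates share data.
    """
    return abs(est_a - est_b) / math.hypot(err_a, err_b)


# The two snapshots quoted above: -6 +- 2.5 after 6 hours, +8 +- 2.8 an hour later.
z = sigma_separation(-6.0, 2.5, 8.0, 2.8)
print(f"{z:.1f} combined standard errors apart")
```

A gap of more than three combined standard errors after one extra hour suggests something other than statistical noise, such as a change in which portion of the trajectory was analyzed.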

Will also try GBSA.

I've run PME on three other systems - will share tarball tomorrow when I have better network.

Best wishes,

Anthony


jchodera commented 5 years ago

> That might explain something I was seeing. I would kill a simulation after 6 hours and get say -6 +- 2.5. I would then run for another hour and get +8 +-2.8 which seemed odd.

That phenomenon is almost certainly due to the automatic equilibration detection analyzing just the last part of the simulation after some sort of conformational change or ligand binding pose switch. We'll eventually improve the logic for that, but for now, I think we'll add an option to force analysis of nearly the whole trajectory, which should greatly improve stability.

Will report back soon!
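To illustrate why a jump can make the analysis discard most of a trajectory, here is a toy version of equilibration detection that picks the start time t0 maximizing the number of effectively uncorrelated samples, (N - t0) / g, where g is the statistical inefficiency of the remaining data. This is a simplified stand-in written for this example, not YANK's or pymbar's actual implementation; the function names, the truncation rule for the autocorrelation sum, and the synthetic series are all assumptions.

```python
def statistical_inefficiency(x):
    """Crude estimate of g = 1 + 2 * sum_t (1 - t/N) * C(t).

    C(t) is the normalized autocorrelation at lag t; the sum is truncated
    at the first non-positive C(t). Returns at least 1.0.
    """
    n = len(x)
    mean = sum(x) / n
    dx = [xi - mean for xi in x]
    var = sum(d * d for d in dx) / n
    if var == 0.0:
        return 1.0
    g = 1.0
    for t in range(1, n - 1):
        c = sum(dx[i] * dx[i + t] for i in range(n - t)) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def detect_equilibration(x, stride=10):
    """Return (t0, g, n_eff) for the start index that maximizes n_eff = (N - t0) / g."""
    n = len(x)
    best_t0, best_g, best_n_eff = 0, 1.0, 0.0
    for t0 in range(0, n - 1, stride):
        g = statistical_inefficiency(x[t0:])
        n_eff = (n - t0) / g
        if n_eff > best_n_eff:
            best_t0, best_g, best_n_eff = t0, g, n_eff
    return best_t0, best_g, best_n_eff


# Synthetic series (not real simulation data): a plateau at 0 for 300 samples,
# then a jump to 10 for the last 100, with a small alternating +-0.5 wiggle.
series = [(0.0 if i < 300 else 10.0) + (0.5 if i % 2 == 0 else -0.5)
          for i in range(400)]

t0, g, n_eff = detect_equilibration(series)
print(f"t0 = {t0}, g = {g:.2f}, effective samples = {n_eff:.1f}")
```

On this synthetic series the detector places t0 at the jump (index 300) and keeps only the last 100 samples, because any window that spans the jump looks strongly correlated. That mirrors the behaviour described above, where only the tail end of the trajectory ends up being analyzed.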

choderalab / yank

+ve delta G from known X-ray ligand #1054