Different results from different platforms

mmagithub commented 6 years ago

Hello, I am asking if it is expected to get different results from YANK absolute-binding energy modules from running the same complex/options/YANK version at two different clusters. I totally understand that such issues are common with atomistic simulations dues to several round-off and stochasticities, but is there are other direct reasons for this behaviour. I am getting approximately 6kcal/mol errors in the binding energies. Here is my YAML file:

[--- options: minimize: yes minimize_max_iterations: 50 number_of_equilibration_iterations: 50 equilibration_timestep: 1.0femtosecond verbose: yes output_dir: . number_of_iterations: 1000 nsteps_per_iteration: 500 replica_mixing_scheme: swap-all collision_rate: 5.0/picosecond temperature: 300kelvin pressure: 1atmosphere timestep: 2femtoseconds output_dir: explicit precision: auto constraints: HBonds resume_simulation: yes

systems: Rec-Lig: phase1_path: [com.inpcrd,com.prmtop] phase2_path: [norec.inpcrd,norec.prmtop] ligand_dsl: resname MOL solvent: PME-solvent

solvents: PME-solvent: nonbonded_method: PME switch_distance: 10angstroms nonbonded_cutoff: 11angstrom ewald_error_tolerance: 1.0e-4

protocols: absolute-binding: complex: alchemical_path: lambda_restraints: [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.50, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.000, 1.00, 1.000, 1.00, 1.00] lambda_electrostatics: [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.000, 0.00, 0.000, 0.00, 0.00] lambda_sterics: [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.075, 0.05, 0.025, 0.01, 0.00] solvent: alchemical_path: lambda_electrostatics: [1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00] lambda_sterics: [1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.00]

experiments: system: Rec-Lig protocol: absolute-binding restraint: type: Harmonic

]

For one output, this is the results: {2017-11-19 19:08:06,603: Free energy of binding: -4.798 +- 1.181 kT ( -2.860 +- 0.704 kcal/mol) 2017-11-19 19:08:06,603: 2017-11-19 19:08:06,603: DeltaG complex : 111.000 +- 1.032 kT 2017-11-19 19:08:06,603: DeltaG restraint : 0.490 kT 2017-11-19 19:08:06,603: DeltaG solvent : 106.692 +- 0.575 kT 2017-11-19 19:08:06,603: 2017-11-19 19:08:06,604: Enthalpy of binding: -188.720 +- 70.609 kT ( -112.508 +- 42.094 kcal/mol) }

From the other job, i got this output:

[2017-11-18 21:25:01,374: Free energy of binding: -14.686 +- 1.131 kT ( -8.755 +- 0.675 kcal/mol) 2017-11-18 21:25:01,374: 2017-11-18 21:25:01,374: DeltaG complex : 110.935 +- 0.961 kT 2017-11-18 21:25:01,374: DeltaG restraint : 0.490 kT 2017-11-18 21:25:01,374: DeltaG solvent : 96.739 +- 0.597 kT 2017-11-18 21:25:01,374: 2017-11-18 21:25:01,375: Enthalpy of binding: -190.569 +- 78.364 kT ( -113.610 +- 46.718 kcal/mol) ]

What is different is the DeltaG solvent which makes the Free binding energy different. Any clue , or any idea what i need to look at?

My system is a protein-ligand system, prepared by AMBER-tleap. Regards, Marawan

Lnaden commented 6 years ago

A couple questions:

What version of YANK are you running?
Could you possibly include the output from YANK's Jupyter Analysis Notebook?

In many protein ligand simulations, large free energy differences usually suggest under-sampling, but the difference seems to be caused by the solvent phase more than the complex, which seems very odd unless your ligand is highly polar or something.

Do you think you could provide your input files in a tarball or zip? Thanks!

You may also find it helpful to wrap large code blocks in triple ticks ``` for better GitHub formatting!

mmagithub commented 6 years ago

Hello Landen, Thanks for the answer. I am running YANK0.17. And yes, the ligand is polar and has a full positive charge (quaternary nitrogen). We are restricted to polar ligands as we frequently apply NMR screening. And I do not think I can provide the inputs. If you think the Jupyter notebook is helpful, I can send it to you.

Regards, Marawan

jchodera commented 6 years ago

The difference in the solvent free energy between the two replicates worries me:

2017-11-19 19:08:06,603: DeltaG solvent : 106.692 +- 0.575 kT
2017-11-18 21:25:01,374: DeltaG solvent : 96.739 +- 0.597 kT

If you could share the inputs, that would be the thing we would most easily be able to replicate and track down issues with, but it sounds like you are unable to do so and we may therefore be of limited help to you because we there isn't much we can do to help you debug without the ability to reproduce the problem locally.

The Jupyter notebook shouldn't have anything proprietary in it---just plots of the convergence rate, etc.

jchodera commented 6 years ago

Oh, the number of iterations looks really low to me:

number_of_iterations: 1000
nsteps_per_iteration: 500

I'd recommend you try at least 10x the number of iterations (10000) and see if that helps. If your molecule has any significant internal torsion barriers in it, you'll need much longer simulations to overcome those barriers.

Lnaden commented 6 years ago

We are restricted to polar ligands as we frequently apply NMR screening. And I do not think I can provide the inputs.

Interesting, That could cause under-sampling in a polar solvent with the number of iterations you have.

If you think the Jupyter notebook is helpful, I can send it to you.

If you could, that would be very helpful. If there is a problem with proprietary information, the Jupyter Notebook should not contain any such information. If you are concerned still, you can check an example notebook here on GitHub. The information you see there is the only information which would be shared. I think that the most details shared are the number of atoms in the whole system.

mmagithub commented 6 years ago

Thanks All, I have prepared the IPYNB but could not plot the files correctly on my local computer. I will plot them tomorrow and send you. And I think indeed it is a sampling problem. As the systems I am currently running are different poses for the same ligand, I took a look on the DeltaG solvent values from the different output and I could indeed see a distribution of the values that is beyond the standard deviation. Here are the values I got from one cluster: ''' DeltaG solvent : 104.615 +- 0.409 kT DeltaG solvent : 107.826 +- 0.616 kT DeltaG solvent : 106.692 +- 0.575 kT DeltaG solvent : 100.852 +- 0.539 kT DeltaG solvent : 103.611 +- 0.792 kT DeltaG solvent : 103.233 +- 0.756 kT DeltaG solvent : 101.647 +- 0.500 kT DeltaG solvent : 107.138 +- 0.834 kT DeltaG solvent : 106.044 +- 1.013 kT DeltaG solvent : 106.490 +- 0.486 kT DeltaG solvent : 106.711 +- 0.747 kT DeltaG solvent : 105.233 +- 0.698 kT DeltaG solvent : 111.822 +- 0.805 kT DeltaG solvent : 107.508 +- 0.806 kT DeltaG solvent : 108.509 +- 0.759 kT DeltaG solvent : 101.984 +- 0.726 kT DeltaG solvent : 110.050 +- 0.948 kT DeltaG solvent : 106.318 +- 0.919 kT DeltaG solvent : 111.433 +- 1.085 kT ''' ############################################ I will also try to run longer and see if better convergence is achieved for the solvent phase. Not exactly sure if an average value should help in this case or not.

Regards, Marawan

jchodera commented 6 years ago

That definitely looks to be the issue. I'd run for 10-20x longer per simulation to start with.

If this seems to eliminate the DeltaG solvent discrepancies, there are ways we can speed up torsion sampling by alchemically lowering torsion barriers during the alchemical protocol.

choderalab / yank

Different results from different platforms #829