choderalab / yank

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.
http://getyank.org
MIT License
181 stars 71 forks source link

MC states not mixing in 0.16 #720

Closed spadavec closed 7 years ago

spadavec commented 7 years ago

For nearly the same experiment (only difference is slight change in clearance and existence of switch_distance), MC states seem to not mix in 0.16, while it seems to work in 0.15 (see attached YAML files)--i doubt these differences contribute to the overall result though. yank_mc.zip

During yank analyze in 0.15, the reported state mixing matrix is (relatively) populated (see attached JQ1_15_analyze.out), while in 0.16, it is completely empty (see attached JQ1_16_analyze.out). Furthermore, the following is also reported in 0.16:

2017-07-17 07:59:50,232: Perron eigenvalue is 0.00000; state equilibration timescale is ~ 1.0 iterations

Unsurprisingly, the dG estimate is very, very wrong in 0.16 (~ +90 kcal/mol), and relatively correct in 0.15 (~ -15 kcal/mol). Interestingly, yank analyze takesmuch longer in 0.16 (currently ~5 min vs. 20 seconds).

yank_mc.zip

andrrizzi commented 7 years ago

Hmm. That's weird! From the log JQ1_16.out it looks like the acceptance rate of the swaps is around 6-8% at each iteration

2017-07-14 19:11:50,106: Accepted 390584/5120000 attempted swaps (7.6%)

will investigate what's happening.

Lnaden commented 7 years ago

I'm seeing about 6-7% exchange between all states in the complex phase, and about 10-12% exchange in the solvent phase.

That output seems to indicate that mixing is ideal to machine precision and everything mixes well, not that its not mixing. The reason you are not seeing mixing in the transition state table is that its below the threshold to show (0.05), not because nothing is mixing, but because there are so many swaps that no one state has much occupation.

This seems very odd to me since the 0.16 version seems to indicate your complex phase deltaG is very small (backed up by the very high level of mixing).

Could you possibly run yank analyze report to create an ipython notebook, run all the cells and post/send that? You can use the command line help to see the flags (has a -s for store directory and -o for output file name). If you run the cells locally, we can actually look at what the analyze code is doing.

I'm not sure why its suddenly so much slower though. 500 iterations should not take that long.

The one thing I can see thats odd, your number of restraint states is larger than the number of electrostatics/sterics states. That should have thrown an error, but didn't, so that leads me to think something odd is going on there. @andrrizzi any insights?

andrrizzi commented 7 years ago

One thing @jchodera noticed, is that with a Boresh restraint, the free energy varies widely depending on which atoms are restrained. If you don't specify the restrained atoms, YANK will pick 6 random atoms (3 from the receptor and 3 from the ligand) that are not collinear and apply the restraint to them. You should be able to obtain the same results if you pick the same atoms. That's not possible to do in 0.15 but in 0.16 you can do something like

experiments:
  ...
  restraint:
    type: Boresch
    restrained_atoms: [0, 1, 2, 124, 125, 126]  # first three atoms indices in the receptor, last three in the ligand

Keep in mind that those are the atom indices in the final complex, so you'll have to base them on the complex.pdb file in setup/systems/your-system/.

We don't have a better way to select those atoms unfortunately at the moment. All the other parameters of the restraint are automatically determined deterministically, so you shouldn't have problems between runs (although) you can manually tune them if you want.

Unfortunately, I see that the log doesn't print out the selected atoms (I'll add this today), so the only way to reproduce the same calculation of 0.15 is to go into the netcdf file, deserialize and check the OpenMM System. I'll shortly post a script that should do the job.

andrrizzi commented 7 years ago

I'm not sure why its suddenly so much slower though.

The slowdown is probably caused by the fact that we get the temperature from reporter.read_thermodynamic_states. We still need to implement a Reporter.read_temperature(state_id) method.

Lnaden commented 7 years ago

Right. Much of that speed problem is also solved by our compression that we added recently

Lnaden commented 7 years ago

Oh, what version exactly were you using @spadavec? There was a bug in the transition state matrix calculation from 0.16.0 that I fixed in 0.16.1. That still does not explain the free energy in the complex being so off, nor does it explain the behavior with too many states.

Lnaden commented 7 years ago

xref #692

andrrizzi commented 7 years ago

seems to indicate your complex phase deltaG is very small

The first row of the free energy matrix for the complex is

0.000  -0.976   9.154  18.727  27.909  36.796  45.451  56.160  65.728  74.851  82.414  89.886  97.252 104.504 111.629 118.618 125.460 132.139 132.666 132.476 132.016 131.277 123.754 115.597 107.093  98.162  89.097  79.845  70.936  62.540  54.603  47.060  40.233  34.867  31.276  29.348  28.951  28.907  29.190  29.500  29.761   3.843

So the states are very different to one another, only it goes up and down along the path.

your number of restraint states is larger than the number of electrostatics/sterics states

Good catch! I'm tracking down in the code what happens in this case and why no error was raised.

Lnaden commented 7 years ago

So the states are very different to one another

Yes. I saw that too after the transition. But I think this is from 0.16.0 which had a critical bug in the transition state calculation. The free energy should have been okay, but the transition matrix was just wrong

andrrizzi commented 7 years ago

Ok, the extra lambda_restraints states have been truncated (on the right) so that shouldn't have been a problem.

andrrizzi commented 7 years ago

Ok, you should be able to see which atoms have been restrained in the two simulations using this print_restrained_atoms function

def read_system_0_16(complex_nc_file_path):
    from yank.repex import Reporter
    reporter = Reporter(complex_nc_file_path, open_mode='r')
    system = reporter.read_thermodynamic_states()[0][0].system
    reporter.close()
    return system

def read_system_0_15(complex_nc_file_path):
    import netCDF4 as netcdf
    from simtk import openmm
    ncfile = netcdf.Dataset(complex_nc_file_path, 'r', version='NETCDF4')
    ncgrp_stateinfo = ncfile.groups['thermodynamic_states']
    system = openmm.System()
    system.__setstate__(str(ncgrp_stateinfo.variables['base_system'][0]))
    return system

def print_restrained_atoms(complex_nc_file_path, version='0.16'):
    if version == '0.16':
        system = read_system_0_16(complex_nc_file_path)
    else:
        system = read_system_0_15(complex_nc_file_path)
    for force in system.getForces():
        if isinstance(force, openmm.CustomCompoundBondForce):
            restrained_atoms, parameters = force.getBondParameters(0)
            print('restrained_atoms', restrained_atoms)
spadavec commented 7 years ago

Could you possibly run yank analyze report to create an ipython notebook, run all the cells and post/send that?

Here is the notebook for the 0.16 run; I'm having an oddly difficult time getting the 0.15 to process properly; I'll post that when I actually figure that out.

Oh, what version exactly were you using @spadavec?

0.16.2

@andrrizzi

If you run the the code as-is, you get the following error:

TypeError: unbound method read_thermodynamic_states() must be called with Reporter instance as first argument (got nothing instead)

Is the following supposed to be changed

reporter = Reporter(complex_nc_file_path, open_mode='r')
    system = Reporter.read_thermodynamic_states()[0].system

to

reporter = Reporter(complex_nc_file_path, open_mode='r')
    system = reporter.read_thermodynamic_states()[0].system

Even still, it will complain that system is a list and doesn't have a attribute system

andrrizzi commented 7 years ago

Ah, sorry, you're right. I've updated the function.

andrrizzi commented 7 years ago

Another thing that changed recently that @Lnaden made me notice is that the default solvation model has become tip4pew rather than tip3p. If you want to reproduce the same calculation as before, you want to add to your solvent description:

solvents:
  PME:
    ...
    solvent_model: tip3p

although this should not make a big difference compared to the selection of the restrained atoms.

jchodera commented 7 years ago

Thanks for debugging this!

I think your suspicion that this is a Boresch restraint issue is likely right, but we will want to figure out what pathological choice led to this issue. We might want to inspect the solute trajectory to see what happened, so getting the feature where we write the solute only trajectory to the small energy file in would be super helpful to diagnose things like this.

andrrizzi commented 7 years ago

Can we have the analysis command print out the restrained atoms?

The restrained atoms are now logged in experiments.log, but we can do that too if necessary. The system in the analysis netcdf file anyway.

Have we checked the TIP4P-Ew FreeSolv validation set hydration energies?

Nope! Not yet.

Can we run the whole validation set on 0.16 to see if there are systematic issues?

I run the old validation set (4 hydration free energies and 4 t4lysozyme-ligand systems) on 0.16 when we released and the free energies were within statistical error of 0.15 results.

We might want to inspect the solute trajectory to see what happened, so getting the feature where we write the solute only trajectory

Just to be sure @spadavec knows this, you can already extract the solute trajectory with the command

yank analyze extract-trajectory --netcdf=path/to/complex.nc --state=0 --trajectory=output.pdb --skip=4 --nosolvent --discardequil --imagemol --verbose

The skip argument in this case makes it export only 1 frame out of 4.

jchodera commented 7 years ago

Thanks, @andrrizzi!

It might be helpful for us to look at a few solute only trajectories to see if this choice of restraint really is pathological, and how we might improve automatic selection.

spadavec commented 7 years ago

Using this command:

yank analyze extract-trajectory --netcdf=path/to/complex.nc --state=0 --trajectory=output.pdb --skip=4 --nosolvent --discardequil --imagemol --verbose

I generated this output.pdb for the 0.16 run.

In this system, the following atoms were selected by the Boresch restraint:

Atom Num (*.nc index) Atom Type Residue Type Residue Num (output.pdb index)
401 C LEU 24
518 O LYS 30
1746 HG23 ILE 104
2132 N1 LIG 127
2134 C8 LIG 127
2140 C12 LIG 127
jchodera commented 7 years ago

Thanks, @spadavec!

Here's a PyMOL ensemble with the Boresch restraint highlighted: boresch.zip

It's hard for me to see exactly what the pathology is. Some observations:

andrrizzi commented 7 years ago

Thanks @spadavec!

It would be very helpful if you could get the restrained atoms of the 0.15 simulation and run again 0.16 restraining the same atoms. This way we can determine whether the Boresch restraint is the only one at fault, or if there're also other problems introduced with the new yank version.

spadavec commented 7 years ago

@andrrizzi No problem, I should be able to set that up this afternoon after I land. I can probably run 2-3 jobs at the same time if we want to make multiple comparisons; are there any other settings you'd like for me to test?

andrrizzi commented 7 years ago

Awesome, thanks! No other settings besides the Boresch restrained atoms and solvent_model: tip3p. If you had the resources to run also a couple of replicates of the same calculation that would be even better as I suspect that after 500 iterations the statistical noise might still be relevant.

spadavec commented 7 years ago

@andrrizzi

Results of 0.16 run using the same atoms for restraint as in the 0.15 (for cases using Boresch restraint)

Run # Restraint dG kcal/mol Solvent Model
1 Boresch -15.9 tip3p
2 Boresch -17.1 tip3p
3 Boresch -17.4 tip3p
4 Harmonic -15.2 tip3p
5 Harmonic +40 tip4pew
6 Harmonic +58 tip4pew
7 Harmonic -17.9 tip3p (900 iter., no switch_distance)
8 Harmonic -13.85 tip3p (900 iter.)
9 Harmonic -15.4 tip3p (1000 iter., HBonds, no_wat)
10 Harmonic -15.7 tip3p (1000 iter., HBonds, wat)
11 Null -13.2 tip3p
12 FlatBottom -15.8 tip3p

This is all still with 500 iterations, so the noise is still high, albeit consistent with the results from 0.15. I should be able to post an example jupyter notebook later this evening.

andrrizzi commented 7 years ago

Thanks so much, @spadavec ! That's more reasonable. I'm also glad to see that the Harmonic restraint does a good job even without fixing the binding pose.

spadavec commented 7 years ago

1) Updated previous post table with new results. Runs #5 and #6 both used harmonic restraints and ran for 1000 iterations, but used the tip4pew water model. It looks like solvent_model is significantly influencing the results, so I'm running 3 new calculations that are looking into varying combinations of solvent_model and a few other settings I'm curious about. Should be able to post those results in ~2 days.

2) Attached ipython notebook here for run 3 (from previous post).

Sorry I didn't investigate these errors a little deeper before posting it here; didn't mean to drag this out!

jchodera commented 7 years ago

It's rather surprising TIP4P-Ew is producing such odd results! Are you sure you have a barostat set by specifying a pressure?

Any chance you could send a copy of the serialized complex OpenMM System XML for one of those, zipped up? @andrrizzi should be able to let your know how to extract that from the netcdf file with ncdump.

andrrizzi commented 7 years ago

Wow! Thanks for reporting.

From the YAML script @spadavec shared at the top, it looks like he does specify the pressure, so it shouldn't be the barostat. I'll run the validation calculations this week, and I'll try both tip3p and tip4pew to see if I obtain very different results with different systems.

spadavec commented 7 years ago

@andrrizzi Updated with results with 2 new runs that used tip3p; I'm basically convinced that its the default tip4pew solvent_model thats throwing things off (whether its a result of this system or the solvent model itself, im not sure). The new runs are nearly identical to run # 4, except they had ~2x number of iterations and gave similar results.

andrrizzi commented 7 years ago

Thanks @spadavec ! I'll post here the results of my calculations when they'll be over.

jchodera commented 7 years ago

It could be that OpenMM is not correctly reading leap outputs generated with tip4pew

andrrizzi commented 7 years ago

It could be that OpenMM is not correctly reading leap outputs generated with tip4pew

Could be! To extract the simulated system you can use this code snippet (sorry I missed it before)

from simtk import openmm
from yank.repex import Reporter
reporter = Reporter('complex.nc', open_mode='r')
system = reporter.read_thermodynamic_states()[0].system
with open('system.xml', 'w') as f:
    f.write(system.__getstate__())
reporter.close()
spadavec commented 7 years ago

XML data for run # 6 here serialized.zip

jchodera commented 7 years ago

Thanks, @spadavec!

A quick manual inspection of system.xml suggests the cause: There are no virtual site definitions! The virtual sites have point masses set to zero, but instead of being updated by construction, they're probably fixed in space because of the missing <VirtualSize> block. I'm surprised the binding free energies weren't even crazier!

I think we neglected to properly handle virtual sites in AbsoluteAlchemicalFactory. This is one of the dangers of creating a new System from scratch instead of using a deep copy of System as a starting point and then removing Force objects (which is now possible)---a scheme I think we should switch our design to later on.

@andrrizzi : We'll need to add virtual site support.

andrrizzi commented 7 years ago

I've made a bugfix release for openmmtools (0.12.1) that should add support for virtual sites. I'm now restarting a couple of hydration free energies to check if it solved the problem.

andrrizzi commented 7 years ago

Hydration free energies in TIP4P-EW seem to be working with the latest openmmtools now!

jchodera commented 7 years ago

Woohoo! Thanks!

spadavec commented 7 years ago

I now feel slightly less red faced and embarrassed about misdiagnosing MC mixing for virtual sites!

jchodera commented 7 years ago

There's some reference data for amino acid sidechain analogues here, and for the SAMPL2 set here.

jchodera commented 7 years ago

I now feel slightly less red faced and embarrassed about misdiagnosing MC mixing for virtual sites!

Considering the pathology was likely that all the water molecules were literally pinned in place by immovable virtual sites, the embarrassment is clearly on our end!

andrrizzi commented 7 years ago

the embarrassment is clearly on our end!

Considerably so :sweat_smile:

jchodera commented 7 years ago

I think we've resolved this in 0.12.1. Please reopen if not!