choderalab / openmmtools

A batteries-included toolkit for the GPU-accelerated OpenMM molecular simulation engine.
http://openmmtools.readthedocs.io
MIT License
250 stars 77 forks source link

Differences in work distributions using alchemy and openmmtools.alchemy #247

Open sgill2 opened 7 years ago

sgill2 commented 7 years ago

I've been observing very different protocol_work distributions when swapping between the old standalone alchemy package and the newer openmmtools.alchemy.

To illustrate the difference, I setup some example simulations of T4 lysozyme and toluene, alchemically perturbing the toluene with the AlchemicalNonequilibriumLangevinIntegrator over 100 steps. The sterics/electrostatics were turned off, then back on in a symmetric protocol over those 100 steps, and I got the protocol work (and the alchemical correction factor) for this process. This was done for 1000 iterations, with MD run in-between each nonequilibrium switching iteration to vary the starting configuration. I ran this with both the alchemy and openmmtools.alchemy, the results of which are plotted in the histogram below. The example scripts I used to run this can be found in this zip file alch_example.zip

screen shot 2017-07-05 at 4 41 35 pm (alchemy is on the top, openmmtools.alchemy is the bottom) I'm also including the same histogram data above, but cutting off the extreme points of the graph just so that the binning is better separated. screen shot 2017-07-05 at 4 39 33 pm

Using the different alchemy packages results in protocol work distributions that are quite different from one another. Also, the magnitudes of the alchemical correction factor widely vary between the two (but when added to the protocol work don't seem to drastically affect the work distribution).

The only major change to the core code of openmmtools.alchemy I noticed in the commit/release history was that the newer version adds a switching function, but changing the switch_width parameter didn't account for the differences in work distributions.

sgill2 commented 7 years ago

Tagging @davidlmobley

davidlmobley commented 7 years ago

Just to further elaborate on this -- what we found was that recently, we've been getting much worse acceptance in BLUES with openmmtools.alchemy than we were with the standalone alchemy. There had been various code changes on our end as well, so Sam had to dig in but finally found that the difference is driven by a dramatic difference in protocol work distributions in the two cases. We are highly suspicious there may be a bug, as the earlier behavior (that is, the behavior of alchemy) was consistent with what we expect, and the openmmtools.alchemy behavior is not.

What we were seeing originally was that we could run a simulation of toluene in lysozyme and propose standard MC moves that randomly rotated the ligand in the binding site, and these would sometimes (but rarely) get accepted. On switching to NCMC using alchemy, we got dramatically higher acceptance even with a modest amount of relaxation, as one might expect. But on switching to openmmtools.alchemy, acceptance gets dramatically worse again.

I believe Sam's test case looks at toluene in lysozyme without any ligand rotation at all, and just looks at the protocol work distribution for decoupling and recoupling lysozyme in the binding site. What we naively expect for this looks like the top left plot -- that is, since there is no deliberate reorientation of the ligand going on, we expect it to be roughly equally likely for the work for this process to be unfavorable and favorable, and it probably should be centered roughly around zero. This is what we see with alchemy. But with openmmtools.alchemy we get a distribution which is strongly skewed towards positive work values, with very few negative work values -- meaning that if we were using this to try and whether to accept or reject these "moves", most would be rejected. We think this is the origin of our current problems with very low acceptance.

@jchodera @Lnaden @andrrizzi , any insights into what might be causing this?

jchodera commented 7 years ago

We are highly suspicious there may be a bug, as the earlier behavior (that is, the behavior of alchemy) was consistent with what we expect, and the openmmtools.alchemy behavior is not.

I think we should really focus on the work distributions and save the alchemical correction issue for once we've sorted the work distributions out.

We have to go back and see what changes were made between alchemy and openmmtools.alchemy`, but I suspect that either (1) bugs have been fixed (the worst case), or (2) we made some changes that altered the default alchemical protocol (best case).

Naively, from looking at your work distributions, the original alchemy work distributions look very, very unphysical. There is no way the work distribution for inserting a molecule after 100 steps of dynamics would be large, symmetric, and have fluctuations that indicate the system is doing 200 kT of work on you for free. That just doesn't happen in molecular systems. I'm unsure of what kind of bug could have led to that behavior, but it does not match any kind of physical tuition I have about microscopic systems.

On the other hand, the newer openmmtools.alchemy work distributions look like I would expect: Highly asymmetric, long tails toward high (unfavorable) work values, and barely any density on the favorable side where the system is doing work on you. This is how I expect real physical systems to behave.

Let's dig into this some more, but I fear that you were seeing an artifact of something that led to "too good to be true" acceptance ratios for some reason we don't yet understand.

jchodera commented 7 years ago

Maybe we could also simplify the system and look at just a ligand in a box of water and have a simple test example we can share? Maybe we can even look at one phase of this and just insert the ligand into water from a decoupled state?

davidlmobley commented 7 years ago

@jchodera -- just to clarify, this test case is for turning toluene OFF in the binding site AND turning it back on. That is, it's a round trip. So it should be roughly symmetric, we believe. More in a moment.

jchodera commented 7 years ago

@jchodera -- just to clarify, this test case is for turning toluene OFF in the binding site AND turning it back on. That is, it's a round trip. So it should be roughly symmetric, we believe. More in a moment.

Even if the protocol is symmetric, the work distribution will not be. Try, for example, just discharging an ion and re-charging it. Or disappearing a sphere and re-inserting it. The free energy difference will be zero if you do this in water, but the work distribution will be asymmetric, shifted toward positive average work, and only becomes a delta function around zero in the limit of nsteps = 0 or infinity.

davidlmobley commented 7 years ago

I think we should really focus on the work distributions and save the alchemical correction issue for once we've sorted the work distributions out.

Yes, agreed. That's why we're not really focusing on those plots in this thread.

We have to go back and see what changes were made between alchemy and openmmtools.alchemy`, but I suspect that either (1) bugs have been fixed (the worst case), or (2) we made some changes that altered the default alchemical protocol (best case).

Yes. Sam spent a good deal of time on that already and tried a variety of things but was not able to figure out why things are so different.

Naively, from looking at your work distributions, the original alchemy work distributions look very, very unphysical. There is no way the work distribution for inserting a molecule after 100 steps of dynamics would be large, symmetric, and have fluctuations that indicate the system is doing 200 kT of work on you for free. That just doesn't happen in molecular systems. I'm unsure of what kind of bug could have led to that behavior, but it does not match any kind of physical tuition I have about microscopic systems.

Apologies that the test was not explained clearly enough. We start with toluene fully coupled in the binding site, alchemically decouple it, then alchemically recouple it. We think this should be somewhat symmetric in the limit of a small amount of relaxation since the protocol is symmetric and we are not doing any kind of rearrangement while it is decoupled.

On the other hand, the newer openmmtools.alchemy work distributions look like I would expect: Highly asymmetric, long tails toward high (unfavorable) work values, and barely any density on the favorable side where the system is doing work on you. This is how I expect real physical systems to behave.

Again, this is not what we expect because we're just turning it off and back on in the same place. If you imagine doing this instantaneously rather than running some relaxation, you should do no work in the process and have no work done in the process (if turning the interactions off involves doing X work, then turning them back on will do X work). Any relaxation will, as you note, result in dissipation.

Even if the protocol is symmetric, the work distribution will not be. Try, for example, just discharging an ion and re-charging it. Or disappearing a sphere and re-inserting it. The free energy difference will be zero if you do this in water, but the work distribution will be asymmetric, shifted toward positive average work, and only becomes a delta function around zero in the limit of nsteps = 0 or infinity.

Yes, agreed that the more relaxation you have the more dissipation you will have and the more asymmetric this will get.

I suppose it's possible the new case is more correct, but it seems important to sort out why things are so dramatically different in the old alchemy, then.

sgill2 commented 7 years ago

Maybe we could also simplify the system and look at just a ligand in a box of water and have a simple test example we can share? Maybe we can even look at one phase of this and just insert the ligand into water from a decoupled state?

I have a toluene in an explicit water box lying around somewhere; I can run a similar test to what I did above with that, and in addition I could do just the 'decoupling off' portion of the perturbation if that would be insightful.

jchodera commented 7 years ago

I have a toluene in an explicit water box lying around somewhere; I can run a similar test to what I did above with that, and in addition I could do just the 'decoupling off' portion of the perturbation if that would be insightful.

Do you think disappearing/reappearing a water in a water box would be sufficient to demonstrate the issue, or does it need to be something larger like toluene? If a water would work, we can quickly code up a minimal test case using openmmtools.testsystems.WaterBox. If something the size of toluene is important, I can add a TolueneExplicit testsystem we can play with.

jchodera commented 7 years ago

Yes. Sam spent a good deal of time on that already and tried a variety of things but was not able to figure out why things are so different.

Here's something we can quickly do to inspect differences: Can you take just your small molecule (if possible; if not, the small molecule in water is fine) and serialize out XML files from both the alchemy generated and openmmtools.alchemy generated alchemical Systems? You can do this with

from simtk import openmm

def serialize_system(system, filename):
    serialized_system = openmm.XmlSerializer.serialize(system)
    outfile = open(filename, 'w')
    outfile.write(serialized_system)
    outfile.close()

serialize_system(alchemical_system_from_alchemy, 'alchemy-system.xml')
serialize_system(alchemical_system_from_openmmtools_alchemy, 'openmmtools-system.xml')

We can inspect the XML files and see if anything is oddly different.

sgill2 commented 7 years ago

Do you think disappearing/reappearing a water in a water box would be sufficient to demonstrate the issue, or does it need to be something larger like toluene? If a water would work, we can quickly code up a minimal test case using openmmtools.testsystems.WaterBox. If something the size of toluene is important, I can add a TolueneExplicit testsystem we can play with.

Testing the old alchemy does seem to indicate the issue can be replicated by transforming a water.

We can inspect the XML files and see if anything is oddly different.

Here's the serialized xml files for just toluene toluene-system.zip

jchodera commented 7 years ago

@sgill2: Awesome, thanks!

I think I spotted the biggest problem: The alchemy version you are using appears to generate an alchemical system that uses fused sterics and electrostatics by setting softcore_beta to 1, while the new openmmtools.alchemy sets softcore_beta to 0 to disable fused sterics and electrostatics.

There are a bunch of other differences too, but this would have a huge impact on the alchemical path.

Can you try explicitly setting softcore_beta in both codes to whatever you're intending and see if you get something more similar?

sgill2 commented 7 years ago

Can you try explicitly setting softcore_beta in both codes to whatever you're intending and see if you get something more similar?

@jchodera Oh, I think I found the cause of the issue. I thought I was using the latest alchemy version (1.2.3, which has softcore_beta=0), but actually I was using 1.2.0. Once I updated to the latest alchemy then I see the protocol work behaves similarly between alchemy packages. More specifically, using alchemy version 1.2.2 and up doesn't exhibit the prior behavior.

jchodera commented 7 years ago

Great!

I think this means you need to use a nonzero softcore_beta to reenable fused softcore, but we haven't had luck using the fused protocol, so you'll want to be careful.

jchodera commented 7 years ago

@sgill2 : If you can collect work distributions for different switching times for softcore_beta set to 0 or 1, it will give us a good idea of what might be going on with those work distributions. I'd recommend using the latest openmmtools.alchemy which should have all the latest bugfixes in it.

I'd still be really surprised if those work distribution really have very negative work values where the system is doing a lot of "free" work. The alchemical_correction_factor should also be relatively small in magnitude unless we're somehow hugely screwing up the PME in the fully-coupled lambda=1 state, which shouldn't be happening.