choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License
181 stars 51 forks source link

Split geometry proposal into two components for more flexibility #43

Closed jchodera closed 8 years ago

jchodera commented 9 years ago

Currently, the geometry proposal x_new ~ p(new | old) (the "geometry" step below) and computation of the Metropolis-Hastings ratio p(old | new) / p(new | old) (the "ratio" step below) is done in a single step. It would be useful to split this up into two steps that can use different old/new coordinates to give us more flexibility in exploring alternative NCMC splittings.

For example, the current scheme requires we do this:

old molecule --NCMC--> old core --geometry--ratio--> new core --NCMC--> new molecule

For NCMC efficiency reasons, we may want to instead use NCMC in a dual topology alchemical scheme to transform one molecule into the other directly:

old molecule --geometry--> dual-topology old --NCMC--> dual-topology new --ratio--> new molecule

or add a separate NCMC stage to convert the old core into the new core parameters:

old molecule --NCMC--> old core --geometry--> dual-topology old core --NCMC--> dual-topology new core --ratio--> new core --NCMC--> new molecule

These could all be explored if we simply split the geometry proposal and ratio computation parts into separate steps that could use different coordinates.

pgrinaway commented 9 years ago

+1. I think the GeometryEngine should be a bit more flexible and modular either way.

pgrinaway commented 9 years ago

Actually, thinking about this a bit more, couldn't we do the above right now anyway?

For instance, in this case:

old molecule --geometry--> dual-topology old --NCMC--> dual-topology new --ratio--> new molecule

We can just run the GeometryEngine immediately to get the new coordinates and logp of the dimension-matching proposal, rather than after the NCMC elimination, right? We wouldn't calculate the dimension matching logp at the end here, right?

I still feel like it should be made more modular, but I'm not sure that we can't already do what you describe.

pgrinaway commented 9 years ago

I guess more to the point, my question would be why we would be calculating the dimension matching logp after the NCMC steps?

jchodera commented 9 years ago

In the quoted scheme, we would generate the new atom positions for the dual topology scheme first, then switch using NCMC with velocity Verlet. After that we need to compute the probability that the reverse protocol would generate the old atom positions being deleted from the post-NCMC geometry. Otherwise the probabilities of forward and reverse whole compound switching proposals will not be equal .

We therefore just need a way to do the geometry proposal with one set of coordinates and compute the Metropolis-Hastings part using two different sets of coordinates.

pgrinaway commented 9 years ago

I see what you're saying in terms of what the scheme is, though I seem to remember the "Annealed Importance Sampling Reversible Jump MCMC Algorithms" (let me catch my breath...) basically allowed the dimension jump to occur in the beginning, as you say, while allowing a straightforward method of computing the acceptance probability. I'll have to go back and read it again, though.

Nevertheless, this is a pretty minor change, so I'll do this.

jchodera commented 9 years ago

As discussed, I was slightly wrong in that the Metropolis-Hastings ratio computation is the only part that might need two configurations specified: the configuration used to propose old | new (which may occur at the end of a switching trajectory), and the configuration used to propose new | old (which may occur before the switching trajectory). In the current scheme, I believe these might just be the configurations before/after the geometry proposal.

pgrinaway commented 8 years ago

This part has already been separated.