Long-standing store/restore issue ... or I am a fool chasing a ghost.

beast-dev / beast-mcmc

Bayesian Evolutionary Analysis Sampling Trees

http://beast.community

GNU Lesser General Public License v2.1

192 stars 73 forks source link

Long-standing store/restore issue ... or I am a fool chasing a ghost. #985

Closed msuchard closed 6 years ago

msuchard commented 6 years ago

Am in need of some help (@rambaut @maxbiostat @xji3) for my sanity.

Checkout new branch store_restore_bug
Run examples/Benchmarks/benchmark1.xml with the usual seed
Notice that BEAST immediately dies because a
```
score = evaluate(likelihood);
likelihood.makeDirty();
newScore = evaluate(likelihood);
```
in MarkovChain appears to expose a latent store/restore problem. To make matters more confusing, turning on fullEvaluation makes the "error" (?) go away until the fullEvaluation period is over.

Maybe I am just chasing a ghost here, but I'm pretty confused at the moment.

GuyBaele commented 6 years ago

These sorts of situations are more common than we'd like them to be (you're not chasing a ghost), we have encountered examples like this before. Most common are flatlining scenarios for the transition probabilities between discrete location states and the logPopSizes of the Bayesian skygrid. Making use of fullEvaluation has allowed to run those XML files in the past, but would be good to get to the bottom of things. Will join the party to try and fix this.

GuyBaele commented 6 years ago

Basically, and as you've already indicated, adjusting the code in Markovchain to the following fixes the problem without using fullEvaluation but I haven't figured out yet why.

mcmcOperator.reject();
currentModel.restoreModelState();
likelihood.makeDirty();

msuchard commented 6 years ago

This suggests that *.restore() is not functioning properly somewhere.

rambaut commented 6 years ago

The diagnostic report should show the before and after of each component of the density.

On 5 Apr 2018, at 14:07, Marc Suchard notifications@github.com wrote:

This suggests that *.restore() is not functioning properly somewhere.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/beast-dev/beast-mcmc/issues/985#issuecomment-378929451, or mute the thread https://github.com/notifications/unsubscribe-auth/AAp5RbuXm6OZ6b5sPYmWlW_kJXf0GKOEks5tlhcVgaJpZM4THpdV.

msuchard commented 6 years ago

Oh ... it's the TreeLikelihood alright. And to simplify the problem, it continues to occur with just the uniformNodeHeightOperator in play.

msuchard commented 6 years ago

The amazing @xji3 appears to have found and fixed this bug (repo hmc-clock). It lies in BufferIndexHelper (and tree-likelihoods that predate this class into history forgotten) and occurs when when there are > 1 likelihood evaluations between a call to store() and restore(). Each evaluation previously caused a buffer flip, such that we ended up overwriting the stored values on the second evaluation.

In BeagleTreeLikelihood and TreeDataLikelihood we were protecting against this situation when re-computing because of underflow only (not a second real evaluation). But @xji3's solution is more general and we can probably remove the extra logic when handling the second underflow evaluation now.

GuyBaele commented 6 years ago

That is amazing indeed, great work!

rambaut commented 6 years ago

The logic could be to set a flag (flipIndices = true) in the storeState method, then unset it after the flipping (and don't flip if the flag is false).

msuchard commented 6 years ago

Yep. Already fixed.

xji3 commented 6 years ago

I want to add that the solution came up from a discussion between Marc and me. Although Newton's law doesn't get citations, giant's shoulder should be acknowledged.

rambaut commented 6 years ago

Ported @xji3's general fix to master branch. Closing.