Skygrid gets stuck - Githubissues

maxbiostat commented 5 years ago

If one runs this XML with -seed 1 (or any other seed for that matter) the logPopSizes will get stuck pretty much instantly. Have tried with more gridpoints, makes no difference, so I chose to use fewer to make it easier to debug. I also have recompiled BEAST with FAIL_SILENTLY = false in GMRFSkyrideBlockUpdateOperator but the problem doesn't seem to be the well-known Newton-Raphson problem.

If anyone has any insight I'd love to hear it. Better yet if you can find something obvious and stupid I've messed up in the XML. This is simulated data on a simulated tree (constant Ne = 1) using dates from a real data set. Another data set with a different tree and the same dates gives the same "getting stuck" problem.

carolynzy commented 5 years ago

Any luck you there? I have the same problem here. It seems that the data structure is the problem. Because with the same environment and parameters, with another slightly different dataset, the problem dissappeared. I only have the problem with this specific dataset. I'm using Beast 1.10.1.

maxbiostat commented 5 years ago

@carolynzy , there's a temporary workaround. Go to the XML and replace

        <gmrfGridBlockUpdateOperator scaleFactor="1.0" weight="2">
            <gmrfSkyrideLikelihood idref="skygrid"/>
        </gmrfGridBlockUpdateOperator>
        <scaleOperator scaleFactor="0.75" weight="1">
            <parameter idref="skygrid.precision"/>
        </scaleOperator>

with

        <adaptableVarianceMultivariateNormalOperator weight="3.0" scaleFactor="1.0" initial="1200" burnin="600" beta="0.05" coefficient="1.0" autoOptimize="true" formXtXInverse="false">
            <transform type="none">
                <parameter idref="skygrid.logPopSize"/>
            </transform>
            <transform type="log">
                <parameter idref="skygrid.precision"/>
            </transform>
        </adaptableVarianceMultivariateNormalOperator>

carolynzy commented 5 years ago

@maxbiostat Thanks for the hint! I have just tried to work around with the priors. So it seems the priors I used were not optimized. I changed the "alpha" and "treeModel rootHeight" based on experience. , and changed "gmrfGibbsOperator" and "skygrid.precision". After doing this, I ran through a test of 1 mil iterations without stuck. I guess the trick is to find the best priors. Because if I just change the operators it didn't work. But not 100% sure. Have you tried this?

maxbiostat commented 5 years ago

@carolynzy, If you just do the change I suggested above, do you still get the "getting stuck" problem?

carolynzy commented 5 years ago

@maxbiostat the problem solved after changing the priors.

maxbiostat commented 5 years ago

@carolynzy Changing the priors should be done only for statistical reasons, never for computational ones. I suggest you change only the operators and see if solves the problem.

carolynzy commented 5 years ago

@maxbiostat I have a question, is this problem a computational one? Or it might a statistical one. Because I think this problem is caused by the constante rejection of the sampled probability distribution. Am I right? So, it could be the prior distribution doesn't fit the real data. Is that possible? I'm naive to this, please correct me if I'm wrong. I will try your suggestion and get back later.

maxbiostat commented 5 years ago

@carolynzy , it is most certainly a computational problem. The default priors, while not perfect, are designed to be reasonable for the vast majority of applications.

GuyBaele commented 5 years ago

I agree with @maxbiostat and do not support this kind of prior tweaking. When using the adaptive operator, I would suggest not to include the precision parameter and keep using a scaleOperator to estimate it.

mdhall272 commented 5 years ago

Did anyone ever work out exactly why the block operator experiences this behaviour? While obviously adjusting the priors shouldn't be done to prevent it, it seems potentially instructive if the priors determine whether it happens.

maxbiostat commented 5 years ago

Hi, @mdhall272 , AFAIK @mandevgill is working out exactly what's going on. If you're interested in other priors for the precision, I had a play around with them here. The PC prior does seem to lead to better behaved chains, mainly because (I think) it demarcates implausible values more strongly.

carolynzy commented 5 years ago

@maxbiostat Your suggestion improved the stuck problem. Now it could run through millions of interations withou stuck. But the coverge takes really a long time. It's still running at the moment. However, I don't think it's inapproriate to change the priors, as you can see from this tutorial, https://github.com/taming-the-beast/Prior-selection So if I'm confident in the prior distribution, I think select a proper prior could improve the mcmc process.

mdhall272 commented 5 years ago

Hi, @mdhall272 , AFAIK @mandevgill is working out exactly what's going on. If you're interested in other priors for the precision, I had a play around with them here. The PC prior does seem to lead to better behaved chains, mainly because (I think) it demarcates implausible values more strongly.

To be honest I'm too far removed from the problem these days to be of much help - I did wonder if a fix (apart from using a simpler move) was close though.

carolynzy commented 5 years ago

Hi, @maxbiostat , while I'm still running on your suggestions, I found something strange. I uploaded the log file here, mdr_filterinv_testMaxbiostat_skygrid_cutoff70_2.72e-5.log When opening it in Trace, you could notice the posterior and likelihood trace plot were cut in the middle, one has smaller average and the other has bigger average. And the ESS for these two parameters were only 4, while others have reached dozens to hundreds. What could be the problem here? Is this normal?

maxbiostat commented 5 years ago

@carolynzy ,

First, here is a picture of the trace plot for context to others:

I don't know exactly what the problem is and it's hard to tell without the XML. What I can say is that the Markov chain has most likely jumped to a new mode. It is certainly not normal, in the sense that ideally BEAST should be able to quickly traverse tree space and reach higher modes. In real life, however, this doesn't always happen. What I can recommend is that you run multiple independent chains and observe whether the chains ultimately reach a particular mode. If they do, you can set the appropriate burn-in and use the samples only from that region.

babarlelephant commented 2 years ago

Same problem here with a small alignment and dimension = 50 and the solution https://github.com/beast-dev/beast-mcmc/issues/1030#issuecomment-497359850 above worked, the skygrid.logpopsize1 (beast 1.10.4 and 1.10.5) is now updated by the MCMC whereas it was staying constant with the beauty generated xml.

GuyBaele commented 2 years ago

What about using Hamiltonian Monte Carlo? Does the problem still persist? https://wellcomeopenresearch.org/articles/5-53

rambaut commented 1 year ago

Need to pick a solution for the next release. Either adaptableVarianceMultivariateNormalOperator or Hamiltonian.

GuyBaele commented 1 year ago

Most certainly Hamiltonian for the skygrid.

rambaut commented 4 months ago

Hamiltonian MC is implemented in BEAUTi so closing this issue

beast-dev / beast-mcmc

Skygrid gets stuck #1030