Any luck there? I have the same problem here. It seems that the data structure is the problem, because with the same environment and parameters but a slightly different dataset, the problem disappeared. I only have the problem with this specific dataset. I'm using BEAST 1.10.1.
@carolynzy , there's a temporary workaround. Go to the XML and replace
<gmrfGridBlockUpdateOperator scaleFactor="1.0" weight="2">
    <gmrfSkyrideLikelihood idref="skygrid"/>
</gmrfGridBlockUpdateOperator>
<scaleOperator scaleFactor="0.75" weight="1">
    <parameter idref="skygrid.precision"/>
</scaleOperator>
with
<adaptableVarianceMultivariateNormalOperator weight="3.0" scaleFactor="1.0" initial="1200" burnin="600" beta="0.05" coefficient="1.0" autoOptimize="true" formXtXInverse="false">
    <transform type="none">
        <parameter idref="skygrid.logPopSize"/>
    </transform>
    <transform type="log">
        <parameter idref="skygrid.precision"/>
    </transform>
</adaptableVarianceMultivariateNormalOperator>
@maxbiostat Thanks for the hint! I have just tried to work around it via the priors, so it seems the priors I used were not well chosen. I changed the "alpha" and "treeModel rootHeight" priors based on experience, and changed "gmrfGibbsOperator" and "skygrid.precision". After doing this, a test of 1 million iterations ran through without getting stuck. I guess the trick is to find the best priors, because just changing the operators didn't work for me. But I'm not 100% sure. Have you tried this?
@carolynzy, if you just make the change I suggested above, do you still get the "getting stuck" problem?
@maxbiostat the problem was solved after changing the priors.
@carolynzy Changing the priors should be done only for statistical reasons, never for computational ones. I suggest you change only the operators and see if that solves the problem.
@maxbiostat I have a question: is this problem a computational one, or might it be a statistical one? I think the problem is caused by the constant rejection of proposals from the sampled probability distribution. Am I right? If so, could it be that the prior distribution doesn't fit the real data? Is that possible? I'm naive about this, so please correct me if I'm wrong. I will try your suggestion and get back later.
@carolynzy , it is most certainly a computational problem. The default priors, while not perfect, are designed to be reasonable for the vast majority of applications.
I agree with @maxbiostat and do not support this kind of prior tweaking. When using the adaptive operator, I would suggest not to include the precision parameter and keep using a scaleOperator to estimate it.
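For concreteness, here is a minimal sketch of that setup, assembled purely from the operator snippets already quoted in this thread (the weights and tuning values are copied from above, not tuned recommendations): the precision transform is dropped from the adaptive operator, and the original scaleOperator is kept.

<!-- adaptive move on the log population sizes only; tuning values copied from the snippet above -->
<adaptableVarianceMultivariateNormalOperator weight="3.0" scaleFactor="1.0" initial="1200" burnin="600" beta="0.05" coefficient="1.0" autoOptimize="true" formXtXInverse="false">
    <transform type="none">
        <parameter idref="skygrid.logPopSize"/>
    </transform>
</adaptableVarianceMultivariateNormalOperator>
<!-- the precision keeps its own scale move -->
<scaleOperator scaleFactor="0.75" weight="1">
    <parameter idref="skygrid.precision"/>
</scaleOperator>

This way the adaptive operator only learns the covariance structure of the log population sizes, while the precision is still updated by its own scale move.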
Did anyone ever work out exactly why the block operator experiences this behaviour? While obviously adjusting the priors shouldn't be done to prevent it, it seems potentially instructive if the priors determine whether it happens.
Hi, @mdhall272 , AFAIK @mandevgill is working out exactly what's going on. If you're interested in other priors for the precision, I had a play around with them here. The PC prior does seem to lead to better behaved chains, mainly because (I think) it demarcates implausible values more strongly.
@maxbiostat Your suggestion improved the stuck problem: the chain now runs through millions of iterations without getting stuck, although convergence takes a really long time; it's still running at the moment. However, I don't think it's inappropriate to change the priors, as you can see from this tutorial: https://github.com/taming-the-beast/Prior-selection. So if I'm confident in the prior distribution, I think selecting a proper prior could improve the MCMC process.
To be honest I'm too far removed from the problem these days to be of much help - I did wonder if a fix (apart from using a simpler move) was close though.
Hi @maxbiostat, while I'm still running your suggestions, I found something strange. I uploaded the log file here: mdr_filterinv_testMaxbiostat_skygrid_cutoff70_2.72e-5.log. When opening it in Tracer, you can see that the posterior and likelihood trace plots are split in the middle, with one part having a smaller average and the other a bigger average. The ESS for these two quantities is only 4, while the others have reached dozens to hundreds. What could be the problem here? Is this normal?
@carolynzy ,
First, here is a picture of the trace plot to give others some context:
I don't know exactly what the problem is and it's hard to tell without the XML. What I can say is that the Markov chain has most likely jumped to a new mode. It is certainly not normal, in the sense that ideally BEAST should be able to quickly traverse tree space and reach higher modes. In real life, however, this doesn't always happen. What I can recommend is that you run multiple independent chains and observe whether the chains ultimately reach a particular mode. If they do, you can set the appropriate burn-in and use the samples only from that region.
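For example, assuming the standard BEAST 1.x command line (where -seed sets the random seed, as used elsewhere in this thread, and -prefix prepends a string to the output file names; the XML file name here is just a placeholder), independent chains could be launched along these lines:

# three independent chains, differing only in the seed (analysis.xml is hypothetical)
beast -seed 1 -prefix run1. analysis.xml
beast -seed 2 -prefix run2. analysis.xml
beast -seed 3 -prefix run3. analysis.xml

You can then load the resulting logs together in Tracer, check whether the chains settle in the same mode, and discard everything before that point as burn-in.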
Same problem here with a small alignment and dimension = 50; the solution above (https://github.com/beast-dev/beast-mcmc/issues/1030#issuecomment-497359850) worked. The skygrid.logPopSize1 (BEAST 1.10.4 and 1.10.5) is now updated by the MCMC, whereas it was staying constant with the BEAUti-generated XML.
What about using Hamiltonian Monte Carlo? Does the problem still persist? https://wellcomeopenresearch.org/articles/5-53
Need to pick a solution for the next release: either adaptableVarianceMultivariateNormalOperator or Hamiltonian.
Most certainly Hamiltonian for the skygrid.
Hamiltonian MC is implemented in BEAUti, so closing this issue.
If one runs this XML with -seed 1 (or any other seed, for that matter), the logPopSizes will get stuck pretty much instantly. I have tried with more grid points and it makes no difference, so I chose to use fewer to make it easier to debug. I have also recompiled BEAST with FAIL_SILENTLY = false in GMRFSkyrideBlockUpdateOperator, but the problem doesn't seem to be the well-known Newton-Raphson problem. If anyone has any insight I'd love to hear it. Better yet if you can find something obvious and stupid I've messed up in the XML. This is simulated data on a simulated tree (constant Ne = 1) using dates from a real data set. Another data set with a different tree and the same dates gives the same "getting stuck" problem.