ekirving / qpbrute

Heuristic search algorithm for fitting qpGraph models
MIT License

outpop settings and convergence diagnostic #12

Closed. mariels closed this issue 3 years ago

mariels commented 3 years ago

Hi Evan,

I have been running qpbrute on my dataset, which has around 10 individuals for either four or six populations. I have only one individual for the outgroup, which is the sister species in the same genus.

I have been using outpop: NULL because, with one individual, it would not be possible to reliably estimate the allele frequencies of this outgroup. I tried setting it to outpop: POPNAME and, as expected, all graphs failed. When I used outpop: NONE, with SNPs weighted according to the allele frequencies in the whole dataset, one of the two best graphs was different and there was much more drift along the branches.

I would think using outpop: NULL would be better, but I am not sure. Would you have any advice?

I have another question about the Gelman-Rubin convergence diagnostic. Model 1 has a non-significantly better fit than model 2, with a Bayes factor of 0.88. But one of the edges of model 1 has a Gelman-Rubin point estimate of 1.28 and an upper CI of 2.25. All the other values are fine. Would that be enough to select model 2 over model 1?

Many thanks,

Marie

ekirving commented 3 years ago

Hi Marie,

I personally would use outpop: NULL in this situation, but this is really a question about the inner workings of qpGraph and I have never systematically tested its behaviour using outpop: NONE.

Regarding the Gelman-Rubin convergence diagnostic, a point estimate above 1.2 indicates that the MCMC chains did not reach the stationary distribution for this parameter in the model. This cannot be used to reject the model itself; instead, it indicates that the resulting Bayes factor may not be reliable and that the MCMC chains should be extended.
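To make the diagnostic concrete, here is a minimal sketch of the basic Gelman-Rubin calculation for a single parameter sampled by several independent chains. It is only an illustration: the function name `gelman_rubin` and the simulated chains are made up for this example, and the value reported by qpBayes may use a corrected variant of the statistic, so the numbers will not necessarily match.

```python
import random
import statistics
from math import sqrt


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of equal-length lists of samples, one list per MCMC chain.
    (Hypothetical helper for illustration; not part of qpbrute.)
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")

    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


rng = random.Random(1)

# Three chains sampling the same distribution: estimate should be close to 1
mixed = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(3)]
r_mixed = gelman_rubin(mixed)

# One chain stuck around a different value: estimate well above 1.2
stuck = [[rng.gauss(mu, 1) for _ in range(2000)] for mu in (0, 0, 3)]
r_stuck = gelman_rubin(stuck)

print(f"well-mixed chains: {r_mixed:.3f}")
print(f"one chain stuck:   {r_stuck:.3f}")
```

A value near 1 means the chains agree with each other about the parameter. A large value means the chains disagree, which says something about the sampling and nothing about whether the model itself is good or bad.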

I would recommend extending the chain length and burn-in for all models, in the hope that the score will drop below 1.2. This can be done using the --iterations and --burnin flags. If you keep the same output folder then qpBayes will resume where it left off from the existing chains, to save having to run them again.

To make an appropriate choice, I would advise looking at the *-burn-gelman.pdf plot for the problematic model, as this may help in deciding how far you need to burn in the existing chain. You can then extend the total chain length by the same amount.

Cheers, Evan

mariels commented 3 years ago

Hi Evan,

Many thanks for your fast reply and help. I will increase the chain length and burn-in.

Cheers,

Marie