Closed biopitch closed 3 years ago
From that log file, it appears that everything ran to completion, so I don't know why the heatmap is empty. It could be an issue with one of the R packages used for plotting, but that's just a guess. Does the PDF raise an error when you open it? Is the filesize empty?
The heatmap is only intended as a visual representation of the Bayes factor analysis, and the raw data is available in the bayes/test1-likelihoods.csv
file. It's also recommended to inspect each of the MCMC outputs, but most importantly the Gelman-Rubin metric, which should not exceed 1.2 for any of the models.
Thanks for getting back to me so quickly.
The pdf opens with a notice that says 'The document contains no pages', the file size is 3.6kB.
Regarding the csv, I have something that looks like this: "","mean","sd" "e4497f6d8ac1",-266.332116919005,1.91874192054904 . . . "ff40788164f7",-298.824032288417,3.09625781106755
Does that mean 'ff40788164f7' is the best fit graph? But when I look at the Gelman-Rubin metric, many of the models are way above 1.2. Does that mean the results are not conclusive?
From the CSV, the model with the mean likelihood closest to 0 (i.e. the topmost graph) is the best supported.
However, there are two major caveats to consider:
The Bayes factors are only valid if the Gelman-Rubin metrics are below 1.2. The plot you attached shows the relationship between the Gelman-Rubin metric and a longer potential burn-in of the model (the point estimates of this metric are at the end of the log file for each model, and this is what you should use to evaluate each model). What your PDF indicates is that for that particular model, the MCMC chain length and burn-in was insufficient to reach the standing distribution of the model for some parameters. The best way to deal with this is by increasing these values using the --iterations
and --burnin
flags. If you keep the same output folder then qpBayes
will resume where it left off from the existing chains, to save having to run them again. To make an appropriate choice, look at all the *-burn-gelman.pdf
plots to decide how far you need to burn-in the existing chains (to get the Gelman-Rubin metrics low enough), and then extend the total chain length by the same quantity.
Once all your models have suitable Gelman-Rubin metrics, you can then use the CSV to determine which was the best model. However, you cannot reject one model in favour of another unless the difference between their log-likelihood scores is above a certain threshold (see https://en.wikipedia.org/wiki/Bayes_factor#Interpretation)
Many thanks for the detailed explanation. Please let me just clarify some things:
Thanks Evan for the clarification.
I am currently trying to run:
qpBayes \
--geno test.geno \
--ind test.ind \
--snp test.snp \
--prefix test \
--pops M St B J K \
--out Out \
--iterations 500000 \
--burnin 100000 \
--threads 12
The gelman-rubin graphs didn't change/get replaced, resulting in the same likelihood calculations. Do I first delete the existing files in the bayes folder?
Both your --iterations
and --burnin
parameters are well below the defaults, in light of which, I am unsurprised that your models have not fully converged. Running a model to completion with the default parameters does take quite a while, however, the defaults were chosen specifically to avoid situations like this.
Producing the burn-gelman.pdf
plots (and other diagnostics) also takes a long time, so by defualt, they are not regenerated if the files already exist. To remake them, delete all the output in the bayes/
folder except the chains files themselves.
Thanks Evan, I will try that.
FYI, I just release a new version of qpBayes
with a few improvements to the way the Bayes factors are calculated (and a bug fix for the empty heatmap issue that you reported).
To see the benefit you will need reinstall qpBrute
, and then delete the contents of both the dstats/
and bayes/
folders and run qpBayes
again.
pip install --upgrade --no-deps --force-reinstall git+git://github.com/ekirving/qpbrute.git
The new version of the {prefix}-likelihoods.csv
file contains the multivariate version of the Gelman-Rubin metric (i.e., the MPSRF) and {prefix}-bayes-factors.csv
contains the pairwide K values for the Bayes factors.
Dear Evan,
I am running qpbayes now on 15 qpbrute graphs. It seem to have run to completion but I get an empty heatmap pdf file. I am wondering if the script have ran properly and how to interpret the results.
I have attached my logs. qpbayes.log
Any advice is greatly appreciated.