DEploid-dev / DEploid

dEploid is designed for deconvoluting mixed genomes with unknown proportions. Traditional ‘phasing’ programs are limited to diploid organisms. Our method modifies Li and Stephen’s algorithm with Markov chain Monte Carlo (MCMC) approaches, and builds a generic framework that allows haloptype searches in a multiple infection setting.
http://deploid.readthedocs.io/en/latest/index.html
GNU General Public License v3.0
20 stars 10 forks source link

Check mcmc #357

Closed bredelings closed 3 years ago

bredelings commented 3 years ago

Can you look over this PR? I think it works fine, but you might want to put some of the code in a different place.

It does two things:

  1. Compute the likelihoods with the normalizing constant. (This is so I could check that they were correct.)

  2. Write out an MCMC trace. (This is the check how the chain is mixing.)

The trace easily allows you to see the distribution of K, and also the distribution of the weights.

The trace file can be viewed with Tracer (http://tree.bio.ed.ac.uk/software/tracer/).

shajoezhu commented 3 years ago

I recall in the very first version, we dropped the constant term. i believe the difference is coming from the constant. and it makes sense to include, as it is a more accurate measurement for the likelihood.

# on master branch, compile and ran
$ ./dEploid -vcf data/testData/PG0390-C.test.vcf -plaf data/testData/labStrains.test.PLAF.txt -o PG0390-CNopanel -noPanel -seed 1 -o master

# on check-mcmc branch, compile and ran
$ ./dEploid -vcf data/testData/PG0390-C.test.vcf -plaf data/testData/labStrains.test.PLAF.txt -o PG0390-CNopanel -noPanel -seed 1 -o check-mcmc
diff master.log check-mcmc.log 
31,33c31,33
<          Max_llks: -20383.5
<  Final_theta_llks: -20283.6
<         Mean_llks: -20283.3
---
>          Max_llks: -1212.63
>  Final_theta_llks: -1112.77
>         Mean_llks: -1112.44
35,36c35,36
<     DIC_by_Dtheta: 40566
<       DIC_by_varD: 40589.3
---
>     DIC_by_Dtheta: 2224.22
>       DIC_by_varD: 2247.59
39,40c39,40
<     Start at: Thu Apr 22 23:34:33 2021
<       End at: Thu Apr 22 23:34:36 2021
---
>     Start at: Thu Apr 22 23:33:59 2021
>       End at: Thu Apr 22 23:34:03 2021
43,46c43,46
<   Likelihood: master.llk
<  Proportions: master.prop
<   Haplotypes: master.hap
<    IBD probs: master.ibd.probs
---
>   Likelihood: check-mcmc.llk
>  Proportions: check-mcmc.prop
>   Haplotypes: check-mcmc.hap
>    IBD probs: check-mcmc.ibd.probs

All mcmc moves and llk are consistent when fixing the random seed

master = read.table("master.classic.llk")
check = read.table("check-mcmc.classic.llk")

all(master$V1==check$V1)
# TRUE

par(mar=c(5,5,5,5))
plot(master$V2, type="l", lty = 2, col = "blue", ylab = "", axes = F)
axis(1, col="black",col.axis="black")
axis(2, col="blue",col.axis="blue",las=1)
par(new=TRUE)

plot(check$V2, type="l", lty = 2, col = "red",  xlab="", ylab="", axes=F)
axis(4, col="red",col.axis="red",las=1)

tmp

shajoezhu commented 3 years ago

Many thanks @bredelings ! Very nice PR!

bredelings commented 3 years ago

Thanks!