Open biozzq opened 5 years ago
Do you have multiple individuals?
Dear @ANGSD
Yes, I have multiple individuals for ancient samples and I ran ANGSD using bam list including all samples.
Best regards, Zhuqing
Dear @ANGSD,
I hope you can take some time to help me clear this problem. Thank you very much.
Kind regards, Zhuqing
would it be posterior probability of not being a heterozygote?.. as in, -doGeno 8 and simply sum up probabilities of two homozygous? (sounds too easy, apparently i'm missing something)
On Oct 11, 2019, at 4:27 AM, biozzq notifications@github.com wrote:
Dear @ANGSD,
I hope you can take some time to help me clear this problem. Thank you very much.
Kind regards, Zhuqing
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.
It is better to incorporate population-level information. The two ways that ANGSD can do this are both Empirical Bayes (= using priors estimated from the entire data). First, via estimated frequency of that particular allele, which effectively shares information across samples at a given site. Second, via the estimated site frequency spectrum, effectively sharing information across multiple samples and sites. You can get more details in the Nielsen https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0037558 et al 2012 PLoS paper https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0037558
The other accurate methods of calling genotypes that I know of (e.g. Maruki & Lynch 2017 https://www.ncbi.nlm.nih.gov/pubmed/28108551) also incorporate population-level information. I repeat, probably not a good idea to use an uninformative prior on individual genotypes with low-depth data, even if you are going to be working downstream with genotype probs and not hard calls.
Nate
Or, perhaps you mean the probability that a single copy drawn at random from the sample carries the derived allele? ... that is Pr(homozygous for derived allele) + 0.5 Pr(heterozygous)
Dear all,
Now, to trace the allelic trajectories in ancient genomes, we usually want to statistic the precise genotypes of some loci which played a more important role in adaption evolution. However, as the low sequencing depth of ancient samples and the unknow causal variants in these loci, it will be difficult for me to give a precise genotype. Through the published literatures, I found that the probability that one sample carries one allele has been used during their analysis. I want to imitate them; however, some problem occurred to me. Hope you can give me some suggestions.
1, genotype likelihood (GL) or genotype probabilities (GP), which one could be the best estimation? From the VCF generated using ANGSD, I found the GL can be more consist with the alignment of the reads, such as following:
DP:AD:GP:GL 2:0,2:0.000000,0.675811,0.324189:-7.799987,-0.602068,0.000000
the GP field suggests 0/1 as the most likely genotype. The GL filed shows 1/1 is the most probable (Only two reads support the "1" allele, so GL can be more accurate). Such cases seem to occur with really low read depths and genotype qualities. The command used to generate the VCF like followingangsd -b $bam -remove_bads 1 -uniqueOnly 1 -ref reference.fa -out $out -doMaf 1 -skipTriallelic 1 -doMajorMinor 1 -GL 1 -doCounts 1 -P 4 -doGlf 2 -minQ 20 -minMapQ 25 -minMaf 0.05 -doVcf 1 -doPost 1
2, As the GLs are computed for each different genotype (0/0, 0/1, 1/1), if I want to know the probability of one allele (
0
or1
) at this site, which is the best way? Can I use the GL of0/0
plus the half of0/1
?kind regards, Zhuqing