timothymillar closed this issue 2 years ago.
It is probably better to report this as a "posterior mean allele frequency" (PMAF), where the frequency of each allele across the entire MCMC trace is recorded. The PMAF values then sum to 1 across all alleles, and they can be multiplied by ploidy if weighted allele counts are needed. It would be easiest to implement this only for haplotype call and call-exact, where the PMAF is guaranteed to sum to 1.
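A minimal sketch of the PMAF idea, assuming the trace is available as a plain sequence of sampled genotypes, each given as a tuple of allele indices whose length equals the ploidy. The helper name `posterior_mean_allele_frequency` and this trace layout are hypothetical and are not taken from any existing tool. Every allele copy in every step is counted, so the result sums to 1 and multiplying by ploidy gives the weighted allele counts.

```python
from collections import Counter
from typing import Sequence


def posterior_mean_allele_frequency(
    trace: Sequence[Sequence[int]], n_alleles: int
) -> list[float]:
    """Hypothetical helper: PMAF of each allele across an MCMC trace.

    Each element of `trace` is one sampled genotype, given as allele
    indices of length `ploidy` (an assumed layout, for illustration only).
    """
    counts: Counter = Counter()
    total = 0
    for genotype in trace:
        # Count every copy of every allele in this MCMC step.
        counts.update(genotype)
        total += len(genotype)
    # Dividing by the total number of allele copies makes the values sum to 1.
    return [counts[a] / total for a in range(n_alleles)]


# Four steps of a hypothetical tetraploid trace over alleles 0, 1 and 2.
trace = [(0, 0, 1, 1), (0, 0, 1, 1), (0, 1, 1, 1), (0, 0, 1, 2)]
pmaf = posterior_mean_allele_frequency(trace, n_alleles=3)

# Multiplying by ploidy recovers posterior-weighted allele counts.
ploidy = 4
dosage = [f * ploidy for f in pmaf]
print(pmaf, dosage)  # [0.4375, 0.5, 0.0625] [1.75, 2.0, 0.25]
```

Counting directly from the trace avoids enumerating genotypes at all, which is why it is cheaper than working from a full posterior distribution.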
Best to add two options here:
See #98 for background
Consider adding an optional field of length "R" (one value for each allele, including the reference) that contains the dosage of each allele weighted by the posterior probability of each genotype for a sample. This would be interpreted as a floating-point best estimate of the allele counts in each individual and could be summed over all individuals to produce an approximation of allele frequencies in the population.
An example of calculating this over a full posterior distribution may look like this:
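A minimal sketch of that calculation, assuming each sample's posterior is given as a list of genotypes (tuples of allele indices, length equal to ploidy) with matching probabilities that sum to 1. The helper `expected_allele_dosage` and the input layout are hypothetical. Each allele's dosage is the sum over genotypes of probability times the number of copies of that allele, and the per-sample dosages are then summed to approximate population allele frequencies.

```python
from typing import Sequence


def expected_allele_dosage(
    genotypes: Sequence[Sequence[int]],
    probabilities: Sequence[float],
    n_alleles: int,
) -> list[float]:
    """Hypothetical helper: posterior-weighted dosage of each allele.

    `genotypes[i]` is a genotype as allele indices (length = ploidy) and
    `probabilities[i]` is its posterior probability for one sample.
    """
    dosage = [0.0] * n_alleles
    for genotype, prob in zip(genotypes, probabilities):
        # Each copy of an allele contributes the genotype's probability,
        # so this accumulates probability * (number of copies).
        for allele in genotype:
            dosage[allele] += prob
    return dosage


# Hypothetical posteriors for two tetraploid samples over alleles 0 and 1.
sample_a = expected_allele_dosage(
    [(0, 0, 1, 1), (0, 1, 1, 1), (0, 0, 0, 1)], [0.5, 0.25, 0.25], n_alleles=2
)
sample_b = expected_allele_dosage([(0, 0, 0, 0)], [1.0], n_alleles=2)

# Sum dosages over individuals, then divide by the total number of allele
# copies (equal to the summed ploidy when each posterior sums to 1).
total_dosage = [a + b for a, b in zip(sample_a, sample_b)]
total_copies = sum(total_dosage)
population_freq = [d / total_copies for d in total_dosage]
print(sample_a, sample_b, population_freq)  # [2.0, 2.0] [4.0, 0.0] [0.75, 0.25]
```

This requires enumerating every genotype with non-negligible posterior probability, which grows quickly with ploidy and the number of alleles.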
But this could be calculated more efficiently directly from the MCMC trace.