PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Remove INFO/AD and FORMAT/AD fields #98

Closed timothymillar closed 3 years ago

timothymillar commented 3 years ago

These fields are not that useful (and can even be misleading) because of the requirement for assignment without uncertainty, and they bloat the VCF size.

timothymillar commented 3 years ago

These could be replaced with something more useful, such as the posterior-probability-weighted count of each allele, i.e. the expected allele counts:

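import numpy as np
from numba import njit


# increment_genotype is assumed to be an existing jitted helper that advances
# `genotype` in place to the next genotype, in the same order as `posteriors`.
@njit(cache=True)
def dosage_posterior(posteriors, ploidy, n_alleles):
    """Expected count of each allele, weighted by genotype posterior probability."""
    n_genotypes = len(posteriors)
    expected = np.zeros(n_alleles, dtype=np.float32)
    genotype = np.zeros(ploidy, np.int64)  # start from the all-zero genotype
    for i in range(n_genotypes):
        p = posteriors[i]
        # each copy of an allele in this genotype adds p to that allele's count
        for j in range(ploidy):
            expected[genotype[j]] += p
        increment_genotype(genotype)
    return expected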
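
For anyone wanting to check the idea without Numba, here is a minimal pure-Python sketch of the same calculation. It is not the MCHap implementation: the function name `expected_allele_counts` is hypothetical, and it assumes the posteriors are listed in VCF genotype order (0/0, 0/1, 1/1, 0/2, ...), which the snippet above does not state explicitly.

```python
from itertools import combinations_with_replacement

import numpy as np


def expected_allele_counts(posteriors, ploidy, n_alleles):
    """Posterior-weighted count of each allele (hypothetical helper).

    Assumes `posteriors` follows VCF genotype order, e.g. for a diploid
    with three alleles: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
    """
    # Enumerate sorted genotypes, then order them by their reversed tuple,
    # which reproduces the VCF genotype ordering.
    genotypes = sorted(
        combinations_with_replacement(range(n_alleles), ploidy),
        key=lambda g: g[::-1],
    )
    if len(genotypes) != len(posteriors):
        raise ValueError("expected one posterior probability per genotype")
    counts = np.zeros(n_alleles, dtype=np.float64)
    for p, genotype in zip(posteriors, genotypes):
        for allele in genotype:
            # each copy of the allele contributes the genotype's probability
            counts[allele] += p
    return counts


# Diploid, two alleles: genotypes 0/0, 0/1, 1/1
posteriors = np.array([0.1, 0.6, 0.3])
counts = expected_allele_counts(posteriors, ploidy=2, n_alleles=2)
```

If the posteriors sum to one, the expected counts sum to the ploidy, which makes for a quick sanity check on the output.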
timothymillar commented 3 years ago

Any field of length "A" or "R" should be optional because of its potential to bloat the output VCF.
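
As a rough illustration of the bloat: in VCF, a field declared with Number=R holds one value per allele (reference included) and Number=A holds one value per alternate allele, so a FORMAT field of either length grows with both the number of alleles and the number of samples. The sketch below just counts values per record; the function name `n_field_values` is hypothetical and not part of MCHap.

```python
def n_field_values(number, n_alleles, n_samples=1):
    """Count the values a VCF field of length "A" or "R" holds in one record.

    `n_alleles` includes the reference allele. Use n_samples=1 for an INFO
    field and the number of samples for a FORMAT field.
    """
    if number == "R":
        per_sample = n_alleles  # one value per allele, reference included
    elif number == "A":
        per_sample = n_alleles - 1  # one value per alternate allele
    else:
        raise ValueError("only Number=A and Number=R are handled here")
    return per_sample * n_samples


# e.g. a FORMAT field of length "R" at a locus with 20 alleles and 100 samples
n_values = n_field_values("R", n_alleles=20, n_samples=100)
```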

timothymillar commented 3 years ago

Fixed in #106