PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Posterior mean dosage 'DS' #136

Open timothymillar opened 2 years ago

timothymillar commented 2 years ago

Related to #103

GWASpoly uses the DS field for posterior mean dosage, which is essentially the posterior mean allele frequency multiplied by ploidy. However, the (informal standard) DS field doesn't report a value for the reference allele. This is fine when DS is computed from the full posterior distribution, because the dose of the reference allele can be imputed as ploidy - sum(alts). It is a problem for mchap assemble, because we don't necessarily report all alleles (infrequent alleles are excluded), which results in a truncated posterior distribution. This means the reference allele dosage can't be imputed. More importantly, the doses of the alternate alleles can't be normalized without the reference allele value. Using the results from mchap assemble without normalization may bias downstream analysis.
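To make the problem concrete, here is a minimal sketch in plain Python. It is not MCHap code: the function names, the tetraploid example, and the allele frequencies are all hypothetical and chosen only for illustration. It computes DS as frequency * ploidy for the alternate alleles, then compares the imputed reference dose when all alternates are reported against the case where an infrequent alternate has been dropped.

```python
def posterior_mean_dosage(allele_freqs, ploidy):
    """DS-style dosages for the alternate alleles only.

    allele_freqs[0] is assumed to be the reference allele, which the
    (informal) DS field does not report.
    """
    return [freq * ploidy for freq in allele_freqs[1:]]


def impute_ref_dosage(alt_dosages, ploidy):
    """Impute the reference dose as ploidy - sum(alts)."""
    return ploidy - sum(alt_dosages)


ploidy = 4

# Hypothetical posterior mean allele frequencies: ref, alt1, alt2, alt3.
full_freqs = [0.50, 0.30, 0.15, 0.05]

# Full posterior: every alternate allele is reported.
full_ds = posterior_mean_dosage(full_freqs, ploidy)
ref_full = impute_ref_dosage(full_ds, ploidy)

# Truncated posterior: the infrequent alt3 is excluded from the output,
# so its dose is silently attributed to the reference allele.
reported_ds = full_ds[:2]
ref_truncated = impute_ref_dosage(reported_ds, ploidy)

print("true reference dose:      ", full_freqs[0] * ploidy)
print("imputed from full DS:     ", ref_full)
print("imputed from truncated DS:", ref_truncated)
```

In this made-up case the imputed reference dose from the truncated output is about 2.2 instead of 2.0, because the 0.2 dose of the excluded allele is attributed to the reference. The reported alternate doses cannot be rescaled to correct for this unless the reference value is known.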

There are a few options:

The first option is probably the best.