PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Inclusion of alt alleles for recording posterior distributions in VCF output #93

Closed. timothymillar closed this issue 3 years ago.

timothymillar commented 3 years ago

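Currently, ALT alleles are only included in the VCF output if they are called in at least one genotype. This will cause problems for future use of the GP and GL FORMAT fields, because those fields can only hold values for genotypes made up of the REF and listed ALT alleles.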
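GP and GL are Number=G fields: they hold one value per possible unphased genotype over the REF and listed ALT alleles. The sketch below (plain Python; the function names are hypothetical and not part of MCHap) enumerates those genotypes in the order the VCF specification uses and counts them. It shows both why a genotype containing an unlisted allele has no slot in GP/GL and why listing every candidate allele quickly inflates these fields.

```python
from itertools import combinations_with_replacement
from math import comb


def vcf_genotype_order(n_alleles: int, ploidy: int) -> list[tuple[int, ...]]:
    """Unphased genotypes in the order VCF uses for Number=G fields.

    Each genotype is a sorted tuple of allele indices (0 = REF). VCF orders
    them so that the last (largest) allele varies slowest, which is
    lexicographic order on the reversed tuple.
    """
    genotypes = combinations_with_replacement(range(n_alleles), ploidy)
    return sorted(genotypes, key=lambda g: g[::-1])


def n_genotype_values(n_alleles: int, ploidy: int) -> int:
    """Number of values in a Number=G field such as GP or GL."""
    return comb(n_alleles + ploidy - 1, ploidy)
```

For a diploid with three alleles, `vcf_genotype_order(3, 2)` gives 0/0, 0/1, 1/1, 0/2, 1/2, 2/2. For a tetraploid, REF plus one ALT needs 5 values per sample, while REF plus nine ALTs needs 715.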

A better approach is to include ALT alleles whose quality score exceeds a specified threshold. These scores could be calculated across all sample calls or from the posterior distributions (?) and recorded as an INFO field with Number=A (one value per ALT allele) that captures the combined evidence for each allele. An argument such as --alt-quality could be used to set the threshold. This approach will limit the size of the GP and GL fields.
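One way the per-allele score could be computed is sketched below. It takes the posterior-distribution option from the comment above, and everything in it is an assumption for illustration rather than what MCHap implements: the function names are hypothetical, `threshold` stands in for the proposed `--alt-quality` value, and samples are treated as independent. For each ALT allele, the probability that it is absent from a sample is one minus the posterior mass on genotypes containing it; these are multiplied across samples and Phred-scaled.

```python
import math


def alt_allele_qualities(
    posteriors: list[dict[tuple[int, ...], float]],
    n_alleles: int,
) -> list[float]:
    """Phred-scaled evidence that each ALT allele is present in any sample.

    ``posteriors`` holds one dict per sample mapping a genotype (tuple of
    allele indices, 0 = REF) to its posterior probability. Assumes samples
    are independent, so per-sample absence probabilities are multiplied.
    """
    quals = []
    for allele in range(1, n_alleles):
        log_absent = 0.0
        for post in posteriors:
            # Posterior mass on genotypes that carry this allele.
            p_present = sum(p for g, p in post.items() if allele in g)
            # Clip to avoid log10(0) when the allele is certainly present.
            p_absent = max(1.0 - p_present, 1e-300)
            log_absent += math.log10(p_absent)
        quals.append(-10.0 * log_absent)
    return quals


def select_alt_alleles(quals: list[float], threshold: float) -> list[int]:
    """ALT allele indices (1-based, as in VCF) whose quality exceeds threshold."""
    return [i + 1 for i, q in enumerate(quals) if q > threshold]
```

With two diploid samples whose posteriors are `{(0, 0): 0.5, (0, 1): 0.5}` and `{(0, 0): 0.1, (0, 1): 0.9}`, allele 1 scores about 13.0 and allele 2 scores 0, so a threshold of 10 keeps only allele 1. The quality list for the retained alleles is already in the one-value-per-ALT shape a Number=A INFO field expects.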

The GP field may still sum to less than 1 under this approach, but the VCF specification doesn't appear to require the posterior distribution to be complete, only that a value is recorded for each possible genotype given the recorded REF and ALT alleles.
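To illustrate why GP can sum to less than 1, the sketch below restricts one sample's posterior to the alleles that made it into the record. It is a hypothetical helper, not MCHap code, and assumes the posterior is keyed by sorted tuples of candidate allele indices. Genotypes containing a dropped allele have no GP slot, so their probability mass is simply lost.

```python
from itertools import combinations_with_replacement


def gp_for_retained_alleles(
    posterior: dict[tuple[int, ...], float],
    retained: list[int],
    ploidy: int,
) -> list[float]:
    """GP values for one sample, restricted to the alleles written to the VCF.

    ``posterior`` maps genotypes over the full candidate allele set (sorted
    tuples of candidate indices) to posterior probabilities. ``retained``
    lists the candidate indices that become REF (position 0) and the ALTs,
    in VCF order.
    """
    n = len(retained)
    # VCF order for Number=G fields: the last allele varies slowest.
    vcf_genotypes = sorted(
        combinations_with_replacement(range(n), ploidy), key=lambda g: g[::-1]
    )
    gp = []
    for g in vcf_genotypes:
        # Map VCF allele indices back to candidate indices before lookup.
        candidate = tuple(sorted(retained[i] for i in g))
        gp.append(posterior.get(candidate, 0.0))
    return gp
```

For a diploid posterior `{(0, 0): 0.2, (0, 1): 0.5, (1, 1): 0.1, (0, 2): 0.2}` with candidate allele 2 dropped (`retained=[0, 1]`), the GP values are `[0.2, 0.5, 0.1]`, which sum to about 0.8 because the 0.2 assigned to genotype 0/2 has nowhere to go.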

The --call-filtered flag could then be removed. The current issue with --call-filtered is that calling alleles for samples with zero read coverage yields alleles that are a random sample from the prior distribution, which makes a mess of the output VCF.

timothymillar commented 3 years ago

Done in #100