PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Record unknown/novel alleles #89

Open timothymillar opened 3 years ago

timothymillar commented 3 years ago

In some (hopefully rare) cases a sample may contain one or more SNP alleles that are not specified as ref or alts in the input VCF. Currently these alleles are removed during encoding, which leaves gaps that are non-informative to the MCMC.

It would be good to ~have a per-sample filter for~ [record] the proportion of base calls at a SNP that are known/unknown (i.e. specified in the input VCF). ~This should be formulated as a minimum threshold for the proportion of alleles that are present in the VCF, for consistency with other filters. The default proportion required to match should probably be about 0.9, as this allows a single miscalled base in a set of 10 or more reads. The code would be 'ka90' for 'Less than 90% of base calls match a known allele at one or more SNP positions'.~

timothymillar commented 3 years ago

We no longer filter samples in mchap, so it is better to include a metric in the FORMAT fields and filter later.
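
A minimal sketch of what such a metric could look like, not MCHap's actual implementation. It computes, for each SNP, the proportion of non-gap base calls that match an allele listed in the input VCF. The function name `known_allele_proportions`, the representation of reads as strings, and the use of `-` and `N` as gap symbols are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical helper (not part of MCHap): per-SNP proportion of base
# calls that match a ref/alt allele specified in the input VCF.
def known_allele_proportions(read_calls, known_alleles, gap_symbols=("-", "N")):
    """
    read_calls: list of strings, one per read, one character per SNP.
    known_alleles: list of tuples, the ref/alt bases for each SNP.
    gap_symbols: characters treated as missing calls (assumed convention).
    Returns a float array with one proportion per SNP (NaN if no calls).
    """
    n_snps = len(known_alleles)
    proportions = np.full(n_snps, np.nan)
    for j, alleles in enumerate(known_alleles):
        # Collect the observed (non-gap) base calls at SNP j.
        calls = [read[j] for read in read_calls if read[j] not in gap_symbols]
        if calls:
            n_known = sum(call in alleles for call in calls)
            proportions[j] = n_known / len(calls)
    return proportions


# Five reads over two SNPs; the 'T' at the second SNP is not in the VCF.
reads = ["AC", "AC", "AT", "GC", "A-"]
known = [("A", "G"), ("C",)]
props = known_allele_proportions(reads, known)
# SNP 0: 5 of 5 calls are known; SNP 1: 3 of 4 calls are known.
```

Recording the per-SNP values (or their minimum) per sample would leave the choice of threshold, such as the 0.9 suggested earlier in this thread, to a downstream filtering step.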