In some (hopefully rare) cases a sample may contain one or more SNP alleles that are not specified as ref or alts in the input VCF.
Currently these variants are removed during encoding, which leaves gaps that are non-informative as far as the MCMC is concerned.
It would be good to ~have a per-sample filter for~ [record] the proportion of calls at a SNP that are known (i.e. specified in the input VCF) versus unknown. ~This should be formulated as a minimum threshold for the proportion of alleles that are present in the VCF, for consistency with other filters. The default proportion required to match should probably be about 0.9, as this allows a single miscalled base in a set of 10 or more reads. The code would be 'ka90' for 'Less than 90% of base calls match a known allele at one or more SNP positions'.~
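
As a rough illustration of what recording this proportion could look like, here is a minimal Python sketch. The function name `known_allele_proportion`, its arguments, and the example base calls are all hypothetical and are not taken from the existing code base; it assumes the base calls for one sample at one SNP are available as a simple list of single-base strings.

```python
from collections import Counter


def known_allele_proportion(calls, ref, alts):
    """Return the fraction of base calls that match the ref or an alt allele.

    Hypothetical helper: `calls` is a list of base calls for one sample at
    one SNP, `ref` and `alts` are the alleles listed in the input VCF.
    Returns None when there are no calls, since the proportion is undefined.
    """
    known = {ref.upper(), *(alt.upper() for alt in alts)}
    counts = Counter(call.upper() for call in calls)
    total = sum(counts.values())
    if total == 0:
        return None
    matched = sum(n for base, n in counts.items() if base in known)
    return matched / total


# Made-up example: 9 reads support alleles in the VCF, 1 read does not.
calls = ["A"] * 6 + ["G"] * 3 + ["T"]
proportion = known_allele_proportion(calls, ref="A", alts=["G"])
print(f"known: {proportion:.2f}, unknown: {1 - proportion:.2f}")
```

Storing the raw proportion per sample and SNP, rather than a pass/fail flag, would leave any thresholding decision to downstream code.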