PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Less stringent filtering #96

Closed: timothymillar closed this issue 3 years ago

timothymillar commented 3 years ago

The current filtering is quite stringent, and hard-filtered calls are not very useful for diagnosis.

The main reason for hard filtering in the past was to limit the number of alleles in the VCF, especially for samples with few or no reads, which produce a random sample from the prior distribution. A better approach to selecting alleles for the VCF record is described in #93. With that resolved, we should switch to soft filtering by default.
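To illustrate the difference, here is a minimal sketch in plain Python (not MCHap's code). A hard filter discards the genotype of a failing sample call, while a soft filter keeps the genotype and only records the failing filter's name in a per-sample FT value. The `Call` type, the depth filter and the `dp<N>` filter label are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Call:
    """Simplified per-sample call; fields are hypothetical, not MCHap's."""
    sample: str
    genotype: str  # e.g. "0/0/1/2" for a tetraploid
    depth: int
    ft: str = "."  # per-sample FILTER (FT) value


def passes(call: Call, min_depth: int) -> bool:
    """Hypothetical depth filter: True if the call has enough reads."""
    return call.depth >= min_depth


def hard_filter(call: Call, min_depth: int) -> Call:
    """Hard filtering: a failing call loses its genotype entirely."""
    if passes(call, min_depth):
        return replace(call, ft="PASS")
    ploidy = call.genotype.count("/") + 1
    missing = "/".join(["."] * ploidy)
    return replace(call, genotype=missing, ft=f"dp{min_depth}")


def soft_filter(call: Call, min_depth: int) -> Call:
    """Soft filtering: the genotype is kept and the failure is only labelled."""
    ft = "PASS" if passes(call, min_depth) else f"dp{min_depth}"
    return replace(call, ft=ft)


call = Call("sample1", "0/0/1/2", depth=3)
print(hard_filter(call, min_depth=5))
# Call(sample='sample1', genotype='./././.', depth=3, ft='dp5')
print(soft_filter(call, min_depth=5))
# Call(sample='sample1', genotype='0/0/1/2', depth=3, ft='dp5')
```

With soft filtering, the low-depth genotype is still available for diagnosis, and downstream tools can still exclude it by checking the FT value.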

Secondly, some of the filters being applied are not that useful. The filter based on the posterior probability of the allelic phenotype provides nothing that users can't apply themselves based on the reported probability. The MCMC convergence-based filters are more useful as diagnostics than as filters, especially when genotypes are recalled from the observed haplotypes and may not reflect the MCMC. Generally, we should move per-sample filters into other FORMAT fields as diagnostic metrics. Any remaining sample filters should be applied softly by default.
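Users can apply a probability threshold on their own once the probability is reported. The following minimal sketch assumes per-sample records are plain dicts. The keys `GT`, `PROB` and `CONV` are hypothetical stand-ins for a genotype, the reported phenotype posterior probability and an MCMC convergence diagnostic. They are not MCHap's actual FORMAT field names.

```python
from typing import Dict, List


def apply_user_threshold(
    samples: List[Dict[str, object]], min_prob: float
) -> List[Dict[str, object]]:
    """Soft-label each sample call using a user-chosen probability threshold.

    Keys "GT", "PROB" and "CONV" are hypothetical, not MCHap's field names.
    """
    labelled = []
    for sample in samples:
        prob = float(sample["PROB"])
        status = "PASS" if prob >= min_prob else "lowprob"
        # The convergence diagnostic ("CONV") is copied through unchanged
        # for inspection; it is not used to discard or label the call.
        labelled.append({**sample, "FT": status})
    return labelled


samples = [
    {"GT": "0/0/1/1", "PROB": 0.97, "CONV": 1},
    {"GT": "0/1/1/2", "PROB": 0.62, "CONV": 2},
]
for s in apply_user_threshold(samples, min_prob=0.9):
    print(s["GT"], s["FT"], s["CONV"])
# 0/0/1/1 PASS 1
# 0/1/1/2 lowprob 2
```

Every genotype is kept. The threshold is a user choice rather than a fixed hard filter, and the diagnostic remains available alongside each call.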

timothymillar commented 3 years ago

Done in #100