PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Record and filter on read count #38

Closed: timothymillar closed this issue 4 years ago

timothymillar commented 4 years ago

Currently, read depth (DP) is recorded for each sample and used for filtering. The reported DP is the mean of the read depths at the variable positions within the target region. This leads to an edge case: when there are no variable positions within the region, DP is null, and the sample is called homozygous for the reference allele (0/0) with a probability of 1 (a quality of 60).
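A minimal sketch of why DP becomes null, assuming per-sample depths are available as a mapping from each variable position to the number of reads covering it. The helper name and input format are hypothetical and are not MCHap's internal representation.

```python
from typing import Dict, Optional


def mean_variant_depth(depths: Dict[int, int]) -> Optional[float]:
    """Hypothetical helper: mean read depth over the variable positions
    of a target region, as DP is described in this issue.

    `depths` maps each variable position to its read depth. With no
    variable positions there is nothing to average, so None is returned.
    """
    if not depths:
        return None  # the edge case: DP is null
    return sum(depths.values()) / len(depths)


# Region with two variable positions covered by 10 and 14 reads.
print(mean_variant_depth({105: 10, 131: 14}))  # 12.0
# Region with no variable positions: DP cannot be computed.
print(mean_variant_depth({}))  # None
```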

These calls should be filtered at the sample level when a sample has too few reads, but this does not currently happen because no DP value is reported. Calculating the mean depth across all positions in the region, rather than only the variable ones, would be expensive.
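The failure mode can be sketched with a hypothetical sample-level filter, assuming the filter only rejects a sample when a DP value is present and below the threshold. The function name and the default threshold are illustrative assumptions.

```python
from typing import Optional


def passes_depth_filter(dp: Optional[float], min_dp: float = 5.0) -> bool:
    """Hypothetical sample-level filter on DP.

    A rule of the form "reject if DP < threshold" has nothing to compare
    when DP is missing, so a null-DP 0/0 call is never rejected.
    """
    if dp is None:
        # No DP reported: the filter cannot act on this sample.
        return True
    return dp >= min_dp


print(passes_depth_filter(2.0))   # False: low depth is filtered
print(passes_depth_filter(None))  # True: null DP slips through
```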

The total number of reads within the region is still extracted from the BAM file, so it can be used as an alternative metric to filter on. Consider calling this variable NR and using it as a default filter with a minimum threshold of ~5.
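A minimal sketch of the proposed NR filter, assuming reads are given as plain records with 0-based, half-open reference spans rather than being read from a BAM file. The `Read` class and both functions are hypothetical; only the name NR and the threshold of ~5 come from the proposal above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Read:
    """Hypothetical minimal read record: 0-based, half-open reference span."""
    start: int
    end: int


def count_reads(reads: List[Read], region_start: int, region_end: int) -> int:
    """NR: number of reads overlapping the half-open region [region_start, region_end)."""
    return sum(1 for r in reads if r.start < region_end and r.end > region_start)


def filter_samples_by_nr(
    sample_reads: Dict[str, List[Read]],
    region_start: int,
    region_end: int,
    min_nr: int = 5,
) -> Dict[str, bool]:
    """Return, per sample, whether it passes a minimum read-count filter.

    Unlike a mean depth over variable positions, NR is defined even when
    the region contains no variable positions.
    """
    return {
        sample: count_reads(reads, region_start, region_end) >= min_nr
        for sample, reads in sample_reads.items()
    }


samples = {
    "A": [Read(90, 190) for _ in range(6)],  # 6 overlapping reads
    "B": [Read(0, 50), Read(95, 150)],       # only 1 overlapping read
}
print(filter_samples_by_nr(samples, 100, 200))  # {'A': True, 'B': False}
```

Counting reads needs only one pass over the reads in the region, which is why it is cheaper than computing depth at every position.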

timothymillar commented 4 years ago

Fixed in #47