Currently, read depth (`DP`) is recorded for each sample and used for filtering.
The reported `DP` is the average of the read depths at each variable position within the target region.
This leads to an edge case: when there are no variable positions within the region, `DP` is null, and the sample is called homozygous for the reference allele (`0/0`) with a probability of 1 (QUAL of 60).
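A minimal sketch of the edge case, assuming `DP` is computed only from depths at variable positions and that a null `DP` is simply skipped by the sample-level filter. The helpers `mean_dp` and `passes_dp_filter` and the threshold value are hypothetical, for illustration only:

```python
from statistics import mean
from typing import Optional, Sequence


def mean_dp(depths_at_variable_sites: Sequence[int]) -> Optional[float]:
    """Hypothetical helper: average read depth over variable positions only.

    Returns None when the region has no variable positions, which is
    the case where DP ends up null.
    """
    if not depths_at_variable_sites:
        return None
    return mean(depths_at_variable_sites)


def passes_dp_filter(dp: Optional[float], min_dp: float = 5) -> bool:
    """Hypothetical model of the current behaviour (an assumption).

    A null DP cannot be compared against the threshold, so the sample
    is never filtered out, even if it has almost no reads.
    """
    return dp is None or dp >= min_dp


# A region with no variable positions: DP is None and the 0/0 call survives.
dp = mean_dp([])
print(dp, passes_dp_filter(dp))
```

Under this model, a sample with very few reads and no variable sites is indistinguishable from a well-covered homozygous-reference sample.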
These calls should be filtered at the sample level when a sample has too few reads, but this currently does not happen because `DP` is not reported. Calculating mean depth across all positions in the region would be expensive.
However, the total number of reads within the region is still pulled out of the BAM, so it can be used as an alternative metric to filter on.
Consider calling this variable `NR` and using it as a default filter with a minimum threshold of ~5.
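A minimal sketch of the proposed `NR` filter, assuming the per-sample read count for the region is already available (in practice it could come from something like pysam's `AlignmentFile.count(contig, start, stop)`, but that is an assumption about the pipeline). `SampleCall`, `passes_nr_filter`, and `MIN_NR` are hypothetical names:

```python
from dataclasses import dataclass

# Proposed default minimum number of reads in the target region (~5).
MIN_NR = 5


@dataclass
class SampleCall:
    """Hypothetical per-sample call record."""
    sample: str
    genotype: str
    nr: int  # total reads overlapping the target region (proposed NR)


def passes_nr_filter(call: SampleCall, min_nr: int = MIN_NR) -> bool:
    """Keep a sample only if enough reads cover the region.

    Unlike DP, NR is always defined, so it also catches 0/0 calls
    made in regions with no variable positions.
    """
    return call.nr >= min_nr


calls = [
    SampleCall("s1", "0/0", 2),   # too few reads: filtered
    SampleCall("s2", "0/0", 31),
    SampleCall("s3", "0/1", 12),
]
kept = [c.sample for c in calls if passes_nr_filter(c)]
print(kept)
```

Because `NR` is a single count per sample and region, the check costs nothing beyond the read count that is already extracted.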