Decide what to do with coverage reporting in the presence of large deletions.
Currently, we can have the following two cases:
1) A query aligned as `100M600D100M` somewhere in the reference. Coverage values for the large deletion in the middle are missing (the reference region is not covered by the query).
2) A query aligned as `100M599D100M` somewhere in the reference. Coverage values for the large deletion in the middle are present (the reference region is covered by the query).
The threshold of 600 deleted bases is somewhat arbitrary.
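
For illustration, here is a minimal sketch of the current behavior described above: coverage reporting is split at any deletion that reaches a length threshold. The constant `MAX_REPORTED_DELETION`, the function name, and the exact value are assumptions made for this example, not taken from the actual code.

```python
import re

# Assumed threshold mirroring the current behavior described above;
# the real value and name in the codebase may differ.
MAX_REPORTED_DELETION = 600

# Matches one CIGAR operation, e.g. "100M" or "600D".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def covered_intervals(ref_start: int, cigar: str) -> list[tuple[int, int]]:
    """Return reference intervals reported as covered, splitting the
    alignment at deletions of MAX_REPORTED_DELETION or more bases."""
    intervals = []
    pos = ref_start        # current position on the reference
    seg_start = ref_start  # start of the current covered segment
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "M=X":    # consumes reference, query aligned
            pos += length
        elif op in "DN":   # consumes reference, query bases absent
            if length >= MAX_REPORTED_DELETION:
                # Large deletion: close the current segment, leave a gap.
                if pos > seg_start:
                    intervals.append((seg_start, pos))
                seg_start = pos + length
            pos += length
        # I, S, H, P do not consume the reference.
    if pos > seg_start:
        intervals.append((seg_start, pos))
    return intervals

# 100M600D100M -> two intervals with a coverage gap over the deletion.
print(covered_intervals(0, "100M600D100M"))  # [(0, 100), (700, 800)]
# 100M599D100M -> one contiguous covered interval.
print(covered_intervals(0, "100M599D100M"))  # [(0, 800)]
```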
We would like to develop a better decision procedure for what to report as "coverage".
Possibly one that looks into the individual reads (from the fastq files) to see whether reads actually spanned the big deletion, or whether the query is two separate consensus sequences "stitched" together.
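
As a starting point, here is a hedged sketch of such a read-level check, assuming the reads have already been aligned to the reference and are available as a sorted, indexed BAM (pysam is used for illustration; the function name, the `flank` parameter, and the file names are hypothetical). If no read spans the deletion with anchoring sequence on both sides, the consensus is likely two fragments stitched together and the gap should probably not be reported as covered.

```python
import pysam

def reads_spanning_deletion(bam_path: str, contig: str,
                            del_start: int, del_end: int,
                            flank: int = 20) -> int:
    """Count reads whose alignments span the whole deletion, anchored
    by at least `flank` reference bases on each side of it."""
    count = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(contig, del_start, del_end):
            # Skip reads that cannot provide spanning evidence.
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            if (read.reference_start <= del_start - flank
                    and read.reference_end >= del_end + flank):
                count += 1
    return count

# Hypothetical usage: a 600 bp deletion at reference positions 100..700.
n = reads_spanning_deletion("reads.bam", "reference", 100, 700)
print(f"{n} reads span the deletion")
```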