Open tfenne opened 3 months ago
I'm also surprised that the number of pileups changes when duplicate reads are excluded. Given that pileups are reported with any number of supporting reads, I would expect the number of pileups to stay constant, while the number of supporting reads per pileup may decrease.
I ran the new version against the example data sent internally.
Lines output:

- `samtools markdup` marking duplicates only, prior to this change: 2030
- `samtools markdup` marking duplicates only, after this change: 588
- `samtools markdup | samtools view -F 0x400`, both prior to and after this change: 550

When I looked at the differences, I noticed that the reported read support appears to include the duplicate reads. In one example, the hard-filtered version reported 37 total reads and the soft-marked version reported 129.
If the user wants to exclude duplicates, then only the unmarked reads should be counted.
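
To make the expected behavior concrete, here is a minimal sketch of the counting rule I have in mind. It is not the tool's actual code: `count_support` and `exclude_duplicates` are hypothetical names, and the only thing taken from the SAM format is that duplicate-marked reads carry the `0x400` flag bit.

```python
DUPLICATE_FLAG = 0x400  # SAM flag bit set on reads marked as duplicates


def count_support(flags, exclude_duplicates=True):
    """Count supporting reads for one pileup from their SAM flag values.

    Hypothetical helper: when exclude_duplicates is True, reads with the
    duplicate bit set are skipped, so soft-marked input should give the
    same count as input hard-filtered with `samtools view -F 0x400`.
    """
    if exclude_duplicates:
        return sum(1 for flag in flags if not flag & DUPLICATE_FLAG)
    return len(flags)


# Made-up pileup: three supporting reads, one of them marked as a duplicate.
flags = [99, 147, 99 | DUPLICATE_FLAG]
print(count_support(flags))                            # 2
print(count_support(flags, exclude_duplicates=False))  # 3
```

With a rule like this, the soft-marked and hard-filtered runs would report the same read support whenever duplicates are excluded.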