fulcrumgenomics / fgsv

Tools to gather evidence for structural variation via breakpoint detection.
MIT License

Defaulted to excluding duplicate and QC failing reads from pileup. #42

Open tfenne opened 3 months ago

ameynert commented 3 months ago

I ran the new version against the example data sent internally.

Lines output:

When I looked at the differences, I noticed that the reported read support appears to include the duplicate reads. In one example, the hard-filtered version reported 37 total reads and the soft-marked version reported 129.

If the user wants to exclude duplicates, then only the unmarked reads should be counted.
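To make the expected counting concrete, here is a minimal sketch in plain Python. It is not fgsv code: the helper `count_supporting_reads` and its parameters are hypothetical names for illustration. Only the flag bits come from the SAM specification (0x400 for duplicate, 0x200 for QC fail).

```python
# SAM flag bits as defined in the SAM specification.
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400


def count_supporting_reads(flags, include_duplicates=False, include_qc_fail=False):
    """Count reads that pass the duplicate and QC filters (hypothetical helper)."""
    total = 0
    for flag in flags:
        # Skip reads marked as duplicates unless the caller asked to keep them.
        if not include_duplicates and flag & FLAG_DUPLICATE:
            continue
        # Skip reads that failed QC unless the caller asked to keep them.
        if not include_qc_fail and flag & FLAG_QC_FAIL:
            continue
        total += 1
    return total


# Toy input: three unmarked reads, two marked duplicates, one QC-failing read.
flags = [99, 147, 99, 99 | FLAG_DUPLICATE, 147 | FLAG_DUPLICATE, 99 | FLAG_QC_FAIL]

default_count = count_supporting_reads(flags)
with_duplicates = count_supporting_reads(flags, include_duplicates=True)
```

With the defaults, only the three unmarked reads are counted. A soft-marked input should then give the same total as a hard-filtered one.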

msto commented 3 months ago

I'm also surprised that the number of pileups changes when duplicate reads are excluded. Given that pileups are reported with any number of supporting reads, I would expect the number of pileups to stay constant, while the number of supporting reads per pileup may decrease.
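A toy illustration of that expectation, in plain Python rather than fgsv's own code. The function `pileup_support` and the position keys are hypothetical. It also assumes every duplicate set keeps one unmarked representative read, so each pileup retains at least one supporting read after filtering.

```python
from collections import Counter

# Duplicate bit from the SAM specification.
FLAG_DUPLICATE = 0x400


def pileup_support(reads, exclude_duplicates):
    """Count supporting reads per pileup position (hypothetical helper)."""
    support = Counter()
    for position, flag in reads:
        if exclude_duplicates and flag & FLAG_DUPLICATE:
            continue
        support[position] += 1
    return dict(support)


# Toy reads as (pileup position, SAM flag); each position keeps one unmarked read.
reads = [
    ("chr1:1000", 99),
    ("chr1:1000", 99 | FLAG_DUPLICATE),
    ("chr1:1000", 99 | FLAG_DUPLICATE),
    ("chr2:5000", 147),
    ("chr2:5000", 147 | FLAG_DUPLICATE),
]

all_reads = pileup_support(reads, exclude_duplicates=False)
no_dups = pileup_support(reads, exclude_duplicates=True)
```

Under that assumption both calls return the same two positions, and only the per-position counts drop. In this sketch a pileup would disappear only if every read supporting it were marked as a duplicate.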