Kingsford-Group / squid

SQUID detects both fusion-gene and non-fusion-gene structural variations from RNA-seq data
BSD 3-Clause "New" or "Revised" License

Question: removing PCR duplicates #9

Open fgvieira opened 6 years ago

fgvieira commented 6 years ago

Dear all,

Should PCR duplicates be removed from the BAM file before running SQUID? I see a message saying that SQUID is removing PCR duplicates, but it runs quite fast, which makes me think that it only removes reads that are already marked as duplicates. Is that the case?

If not, should I remove duplicates from the BAM and chimeric BAM files?

thanks,

Congm12 commented 6 years ago

SQUID removes PCR duplicates by comparing the alignment of each read and the mapping position of its mate. It takes advantage of the sorted BAM file to speed up the process: since PCR duplicates have the same alignments, SQUID only needs to compare the alignments of nearby SAM records to tell whether they are PCR duplicates.
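
To illustrate the idea described above, here is a minimal Python sketch. It is not SQUID's actual code. The `Record` type, its fields, and the `remove_duplicates` function are hypothetical names made up for this example, and the exact set of fields SQUID compares is an assumption. The sketch only shows why a coordinate-sorted input makes the check cheap: records with the same alignment sit next to each other, so each record only has to be compared against the others that start at the same position.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Record(NamedTuple):
    """Simplified stand-in for one SAM record (hypothetical, for illustration)."""
    name: str
    chrom: str
    pos: int
    cigar: str
    is_reverse: bool
    mate_chrom: str
    mate_pos: int


def duplicate_key(rec: Record) -> Tuple:
    # Assumption: two reads count as duplicates when their own alignment
    # and their mate's mapping position are all identical.
    return (rec.chrom, rec.pos, rec.cigar, rec.is_reverse,
            rec.mate_chrom, rec.mate_pos)


def remove_duplicates(sorted_records: List[Record]) -> List[Record]:
    """Keep the first record of each duplicate group.

    The input must be sorted by (chrom, pos), so only records that share
    the current start position need to be remembered.
    """
    kept: List[Record] = []
    current_locus: Optional[Tuple[str, int]] = None
    seen_here: Set[Tuple] = set()

    for rec in sorted_records:
        locus = (rec.chrom, rec.pos)
        if locus != current_locus:
            # Moved to a new start position: earlier keys can never match again.
            current_locus = locus
            seen_here = set()
        key = duplicate_key(rec)
        if key in seen_here:
            continue  # same alignment and same mate position: duplicate
        seen_here.add(key)
        kept.append(rec)
    return kept


records = [
    Record("r1", "chr1", 100, "50M", False, "chr1", 300),
    Record("r2", "chr1", 100, "50M", False, "chr1", 300),  # same as r1
    Record("r3", "chr1", 100, "50M", False, "chr1", 450),  # different mate position
    Record("r4", "chr1", 180, "50M", True, "chr2", 900),
]
kept = remove_duplicates(records)
print([r.name for r in kept])  # ['r1', 'r3', 'r4']
```

Because the set of seen keys is reset at every new start position, memory use stays small and the whole pass is a single linear scan, which would be consistent with the step running quickly.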