fulcrumgenomics / fgbio

Tools for working with genomic and high throughput sequencing data.
http://fulcrumgenomics.github.io/fgbio/
MIT License

ClipBam not clipping overlapping reads #810

Open blackbeerd opened 2 years ago

blackbeerd commented 2 years ago

Hi fgbio,

I am running ClipBam at the end of my workflow to clip overlapping reads (command and screenshot below), but when I look at the resulting BAM in IGV, I still see overlapping read pairs. Am I using it wrong? When I run this file through bamUtil clipOverlap, it seems to work correctly. Any thoughts?

java -Xmx1g -jar fgbio.jar ClipBam --clipping-mode=Hard --clip-overlapping-reads=T --input=my.bam --output=my.clipped.bam --auto-clip-attributes=True --ref=hg38.fasta --metrics=my.clipped.metrics.txt

Top: ClipBam output. Bottom: bamUtil output (the first two reads with the "A" alt are a pair).

[image: IGV screenshot]

tfenne commented 2 years ago

@blackbeerd It's hard to tell from that picture. Assuming that those reads are mate pairs, then yes this sounds like a bug. Would you be able to attach two BAMs, each with just a single read-pair, before and after running ClipBam, that exhibit the problem please?
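One way to produce such a single-pair file is to filter the SAM text by read name. The following is a minimal Python sketch, not an fgbio feature: the function name `extract_pair` and the example read name are hypothetical, and it assumes the alignments are available as SAM text lines (for example from `samtools view -h`).

```python
def extract_pair(sam_lines, read_name):
    """Keep SAM header lines plus every alignment whose QNAME equals read_name."""
    kept = []
    for line in sam_lines:
        # Header lines start with "@"; the SAM spec does not allow "@" in a QNAME.
        if line.startswith("@"):
            kept.append(line)
            continue
        # QNAME is the first tab-separated field of an alignment line.
        qname = line.split("\t", 1)[0]
        if qname == read_name:
            kept.append(line)
    return kept


# Hypothetical three-record SAM snippet; only the "pair1" records should survive.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "pair1\t113\tchr1\t100\t60\t50M\t=\t120\t70\t*\t*",
    "other\t99\tchr1\t105\t60\t50M\t=\t200\t145\t*\t*",
    "pair1\t177\tchr1\t120\t60\t50M\t=\t100\t-70\t*\t*",
]
single_pair = extract_pair(example, "pair1")
```

Running the same filter on the input and the ClipBam output gives the "before" and "after" files for one pair.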

blackbeerd commented 2 years ago

@tfenne Here are those reads pre- and post-ClipBam (these are uncompressed SAMs; it wouldn't let me upload BAMs). Let me know if you need a different format. Thanks for taking a look!

pre-ClipBam.txt post-ClipBam.txt

nh13 commented 2 years ago

@blackbeerd The input reads are both mapped to the reverse strand, so unfortunately this is not an FR pair.

ClipBam says:

Clipping overlapping reads is only performed on FR read pairs

I think there could be a discussion about whether we want to loosen the FR orientation requirement for ClipBam.
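For anyone else who hits this, here is a rough Python sketch of how a pair's orientation can be worked out from the SAM fields of one read. It loosely follows the classification in htsjdk's SamPairUtil (FR, RF, or TANDEM for same-strand pairs, which is where "RR" falls). Whether ClipBam applies exactly this logic is an assumption, the function name and arguments are made up for illustration, and the sketch assumes both reads map to the same contig.

```python
# SAM flag bits (from the SAM specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20


def pair_orientation(flag, start, end, mate_start, tlen):
    """Classify a read pair as "FR", "RF" or "TANDEM" from one read's fields.

    start/end are the 1-based alignment start and end of this read,
    mate_start is PNEXT and tlen is TLEN. Returns None when the read is
    unpaired or either mate is unmapped.
    """
    if not flag & PAIRED or flag & UNMAPPED or flag & MATE_UNMAPPED:
        return None
    reverse = bool(flag & REVERSE)
    mate_reverse = bool(flag & MATE_REVERSE)
    if reverse == mate_reverse:
        # Both reads on the same strand (FF or RR).
        return "TANDEM"
    if reverse:
        # This read is on the reverse strand, so its 5' end is its alignment end.
        plus_5p, minus_5p = mate_start, end
    else:
        # Mate is on the reverse strand; approximate its 5' end from TLEN.
        plus_5p, minus_5p = start, start + tlen
    return "FR" if plus_5p < minus_5p else "RF"
```

A pair with flags 113 and 177 (both reads on the reverse strand) comes back as TANDEM rather than FR, which is consistent with ClipBam leaving it unclipped.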

blackbeerd commented 2 years ago

Ahhh, thanks! I thought I had read through the tool description, but I must have missed that. What's the thinking behind only clipping pairs in FR orientation? Why not clip these RR overlaps as well?