biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

BAM filtered with sambamba -F "not duplicate" still has duplicates marked by sambamba markdup #477

Closed: piyushjo15 closed this issue 3 years ago

piyushjo15 commented 3 years ago

Hi, I was following a ChIP-seq tutorial that uses sambamba to remove multimapped, unmapped and duplicate reads from a BAM file with the command below:

sambamba view -h -f bam -F "[XS] == null and not unmapped and not duplicate" in.bam > out.bam

I then checked out.bam for duplicates using sambamba markdup:

sambamba markdup out.bam out2.bam

Surprisingly, out2.bam has reads marked as duplicates. Weren't those reads filtered out in the first step? Am I misunderstanding something?

Thanks,
Piyush
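One way to see what is happening here is to count how many reads carry the duplicate flag in each file. This is only a minimal sketch, assuming sambamba is on PATH; the helper name count_duplicates is made up for illustration, and -c is the count option of sambamba view.

```shell
# Hypothetical helper: print the number of reads in a BAM file
# that have the duplicate flag set.
# $1 = path to the BAM file to inspect.
count_duplicates() {
    # -c prints only the count of records matching the -F filter
    sambamba view -c -F "duplicate" "$1"
}

# Usage (file names taken from the question above):
# count_duplicates in.bam
# count_duplicates out2.bam
```

If in.bam has never been through a duplicate-marking tool, the first count would be expected to be 0, which would mean the "not duplicate" filter had nothing to remove.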

piyushjo15 commented 3 years ago

Hi,

I had posted this issue here earlier, but since it is a support question rather than a bug, I posted it on Google Groups. I also found that the -F filter "not duplicate" only removes duplicates after they have been marked. Initially I thought the filter would both mark and remove duplicates, but that is not the case.

Thanks Piyush
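As the comment above describes, the "not duplicate" filter only tests a flag that has to be set beforehand, so the order of the two steps matters. A rough sketch of the mark-then-filter order, assuming sambamba is on PATH; the function name mark_then_filter and the intermediate file name are assumptions for illustration, not something from the tutorial:

```shell
# Hypothetical helper: mark duplicates first, then filter on the flag.
# $1 = input BAM, $2 = final filtered BAM.
mark_then_filter() {
    local in_bam="$1" out_bam="$2"
    # Intermediate file name is an assumption: out.bam -> out.marked.bam
    local marked_bam="${out_bam%.bam}.marked.bam"

    # Step 1: markdup sets the duplicate flag on duplicate reads.
    # By default it marks them and leaves them in the file.
    sambamba markdup "$in_bam" "$marked_bam" || return 1

    # Step 2: now "not duplicate" has a flag to test against.
    sambamba view -h -f bam \
        -F "[XS] == null and not unmapped and not duplicate" \
        "$marked_bam" > "$out_bam"
}

# Usage (file names are placeholders):
# mark_then_filter in.bam out.bam
```

sambamba markdup also has a -r (--remove-duplicates) option that drops duplicates instead of only marking them, which may make the "not duplicate" part of the filter unnecessary; whether that suits a given ChIP-seq workflow is a separate decision.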

luoxun-xl commented 1 year ago

Hi, I have also been using the command below for ChIP-seq filtering:

sambamba view -h -f bam -F "[XS] == null and not unmapped and not duplicate" in.bam > out.bam

So how do you filter the BAM file for ChIP-seq now?