biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

Problem with deduplication #512

Open crazysummerW opened 9 months ago

crazysummerW commented 9 months ago

Hello, I'm having an issue using sambamba 1.0.0 to deduplicate BAM files from paired-end NGS data. After deduplication, the resulting BAM file still contains reads that are completely identical: the same read ID and the same values in every field. This is causing problems in my structural variation (SV) analysis.

What could be the reason for this, and is there a way to resolve it? I would like the deduplicated BAM file to have unique read IDs.

Reads before deduplication:

`samtools view sample.sorted.bam chrY | grep E100074100L1C005R0181713731`

`E100074100L1C005R0181713731 129 chrY 21869316 60 108M34S chr1 181745266 0 CGTCGTGAGCGCATACACAGTGGACACAGGAATTTTGTGTCCCATTCCCACCAGGCTAGCAGTGGAGATGAAGTGAGACTGGGCTTTGGAGAGGTGAGGAGATGGGGCGGCCGAGGGGCCTACGCACCATGCTGCTCGGTCA DDDDDDDCCDDDDDDDDDDDDDDDDDDDDCDDCDDDDCDDCDDCDDDDDDDDDCDDDCCDDCCDCDDDDCDCDDDDCDDDDCCCDDDDCDDCCDDDCDDCDCCCCCCDDCCDB@DCCDCDDCDDCDDDCDCDCDCDDCDCDC NM:i:0 MD:Z:108 MC:Z:124M18S AS:i:108 XS:i:51 SA:Z:chr1,181745350,-,40M102S,60,0; RG:Z:DP19786-713309`

`E100074100L1C005R0181713731 129 chrY 21869316 60 108M34S chr1 181745266 0 CGTCGTGAGCGCATACACAGTGGACACAGGAATTTTGTGTCCCATTCCCACCAGGCTAGCAGTGGAGATGAAGTGAGACTGGGCTTTGGAGAGGTGAGGAGATGGGGCGGCCGAGGGGCCTACGCACCATGCTGCTCGGTCA DDDDDDDCCDDDDDDDDDDDDDDDDDDDDCDDCDDDDCDDCDDCDDDDDDDDDCDDDCCDDCCDCDDDDCDCDDDDCDDDDCCCDDDDCDDCCDDDCDDCDCCCCCCDDCCDB@DCCDCDDCDDCDDDCDCDCDCDDCDCDC NM:i:0 MD:Z:108 MC:Z:124M18S AS:i:108 XS:i:51 SA:Z:chr1,181745350,-,40M102S,60,0; RG:Z:DP19786-713309`

Reads after deduplication:

`samtools view sample.sorted.dedup.bam chrY | grep E100074100L1C005R0181713731`

`E100074100L1C005R0181713731 129 chrY 21869316 60 108M34S chr1 181745266 0 CGTCGTGAGCGCATACACAGTGGACACAGGAATTTTGTGTCCCATTCCCACCAGGCTAGCAGTGGAGATGAAGTGAGACTGGGCTTTGGAGAGGTGAGGAGATGGGGCGGCCGAGGGGCCTACGCACCATGCTGCTCGGTCA DDDDDDDCCDDDDDDDDDDDDDDDDDDDDCDDCDDDDCDDCDDCDDDDDDDDDCDDDCCDDCCDCDDDDCDCDDDDCDDDDCCCDDDDCDDCCDDDCDDCDCCCCCCDDCCDB@DCCDCDDCDDCDDDCDCDCDCDDCDCDC NM:i:0 MD:Z:108 MC:Z:124M18S AS:i:108 XS:i:51 SA:Z:chr1,181745350,-,40M102S,60,0; RG:Z:DP19786-713309`

`E100074100L1C005R0181713731 129 chrY 21869316 60 108M34S chr1 181745266 0 CGTCGTGAGCGCATACACAGTGGACACAGGAATTTTGTGTCCCATTCCCACCAGGCTAGCAGTGGAGATGAAGTGAGACTGGGCTTTGGAGAGGTGAGGAGATGGGGCGGCCGAGGGGCCTACGCACCATGCTGCTCGGTCA DDDDDDDCCDDDDDDDDDDDDDDDDDDDDCDDCDDDDCDDCDDCDDDDDDDDDCDDDCCDDCCDCDDDDCDCDDDDCDDDDCCCDDDDCDDCCDDDCDDCDCCCCCCDDCCDB@DCCDCDDCDDCDDDCDCDCDCDDCDCDC NM:i:0 MD:Z:108 MC:Z:124M18S AS:i:108 XS:i:51 SA:Z:chr1,181745350,-,40M102S,60,0; RG:Z:DP19786-713309`
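
For anyone trying to reproduce this, a quick way to list fully identical records in either file might look like the sketch below. It is only an illustration: `count_identical_records` is a hypothetical helper name, not part of sambamba or samtools. It reads SAM text on standard input and prints every record that occurs more than once, prefixed by its count.

```shell
# Hypothetical helper (not a sambamba or samtools command):
# reads SAM records on stdin and prints each record that appears
# more than once, prefixed by the number of occurrences.
count_identical_records() {
  # sort groups identical lines together, uniq -c counts each group,
  # and awk keeps only the groups whose count is greater than 1.
  sort | uniq -c | awk '$1 > 1'
}

# Example usage, assuming samtools is installed and the BAM is indexed:
#   samtools view sample.sorted.bam chrY | count_identical_records
#   samtools view sample.sorted.dedup.bam chrY | count_identical_records
```

Running it on both the input and the deduplicated BAM shows whether the identical records were already present before deduplication.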

Looking forward to your reply. Thanks