MHH-RCUG / Wochenende

Deprecated see https://github.com/MHH-RCUG/nf_wochenende : A whole Genome/Metagenome Sequencing Alignment Pipeline in Python3
https://github.com/MHH-RCUG/nf_wochenende
MIT License
37 stars 16 forks source link

NM bamtools curiosity : a few reads slip through #143

Open colindaven opened 3 years ago

colindaven commented 3 years ago

Very, very few reads, but some reads do slip through the bamtools filter, despite have more mismatches in the NM field than allowed. eg 1 read with 5 mismatches from 1x76bp data

with 2x38bp data the problem is worse. Yet the total number of reads is so low (<100 generally) it has no effect on results.

wochenende_test/pe38/1x76$ samtools view 9_S73_R1.ndp.trm.s.mm.dup.mq30.calmd.bam | grep NM:i:5
NB551160:334:HFTJCAFX2:4:11409:18660:18422      0       1_CP015401_2_Bacteroides_caecimuris_strain_I48_chromosome__complete_genome_BAC  634150  60      69M7S   *       0       0       GACCAAACCGAAGTTGCTGGTAAGTGATTATGACATGCTTTGGAGTAAGGATAGCTATAATGATGCTACTAGTGAA  A/AAAEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEE/E    AS:i:69 XS:i:0  RG:Z:9_S73_R1_001    NM:i:5   MD:Z:64N0N0N0N0N0
colindaven commented 3 years ago

Quantified:

(base) rcug@hpc-rc09:/ngsssd1/rcug/wochenende_test/pe38/2x38$ samtools view 9_S73_2_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam | grep NM:i:1 | wc -l
70861
(base) rcug@hpc-rc09:/ngsssd1/rcug/wochenende_test/pe38/2x38$ samtools view 9_S73_2_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam | grep NM:i:0 | wc -l
508919
(base) rcug@hpc-rc09:/ngsssd1/rcug/wochenende_test/pe38/2x38$ samtools view 9_S73_2_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam | grep NM:i:2 | wc -l
97
(base) rcug@hpc-rc09:/ngsssd1/rcug/wochenende_test/pe38/2x38$ samtools view 9_S73_2_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam | grep NM:i:3 | wc -l
30