BenLangmead / bowtie

An ultrafast memory-efficient short read aligner

problem with option -n and -l #126

Open mewu3 opened 2 years ago

Hello,

My understanding is that bowtie aligns reads to the reference starting from the 5' end of each read (the seed region). From the documentation, I also understand that with the option -n <number> bowtie tolerates at most the given number of mismatches in the seed region.

The command line bowtie -x {params.refDB} -f {input[0]} -n 2 -l 7 -a --nofw --sam -p {params.threads} {output} was meant to align oligonucleotides (13-mers) to genome data. I was hoping that the tolerated mismatches would be limited to the seed region, that is, the 5' end of the 13-mers. Yet on examining the output SAM file, especially the MD tags, it seems to me that mismatches are also tolerated outside the seed region.

4749    16  KX810065.1  3712    255 13M *   0   0   TGTGGTTGGATGA   IIIIIIIIIIIII   XA:i:1  MD:Z:10C2   NM:i:1  XM:i:378
4749    16  MH118060.1  3766    255 13M *   0   0   TGTGGTTGGATGA   IIIIIIIIIIIII   XA:i:1  MD:Z:10C2   NM:i:1  XM:i:378
4749    16  MF285667.1  3780    255 13M *   0   0   TGTGGTTGGATGA   IIIIIIIIIIIII   XA:i:1  MD:Z:10C2   NM:i:1  XM:i:378
4749    16  MG957117.1  3653    255 13M *   0   0   TGTGGTTGGATGA   IIIIIIIIIIIII   XA:i:1  MD:Z:1C8C2  NM:i:2  XM:i:378
4749    16  MH118080.1  3736    255 13M *   0   0   TGTGGTTGGATGA   IIIIIIIIIIIII   XA:i:1  MD:Z:5C4C2  NM:i:2  XM:i:378
4749    16  MH118053.1  3739    255 13M *   0   0   TGTGGTTGGATGA   IIIIIIIIIIIII   XA:i:1  MD:Z:5C4C2  NM:i:2  XM:i:378
4749    16  KF958311.1  5364    255 13M *   0   0   TGTGGTTGGATGA   IIIIIIIIIIIII   XA:i:2  MD:Z:10C0C1 NM:i:2  XM:i:378
4749    16  JF317013.1  5352    255 13M *   0   0   TGTGGTTGGATGA   IIIIIIIIIIIII   XA:i:2  MD:Z:10C0C1 NM:i:2  XM:i:378
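
To check where these mismatches fall relative to the 5' end of the oligo, the MD tags can be decoded with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, it assumes ungapped alignments (CIGAR 13M, as in the records above), and it assumes that for reverse-strand hits (FLAG bit 0x10) the SAM record is written in reference orientation, so the read's 5' end sits at the right-hand end of the alignment.

```python
import re

def md_mismatch_offsets(md):
    """Return 0-based offsets, left to right along the reference,
    of the mismatches encoded in an MD string such as '1C8C2'."""
    offsets = []
    pos = 0
    for token in re.findall(r"\d+|\^[A-Za-z]+|[A-Za-z]", md):
        if token.isdigit():
            pos += int(token)      # run of matching bases
        elif token.startswith("^"):
            continue               # deletion from the reference: no read base consumed
        else:
            offsets.append(pos)    # a single mismatched base
            pos += 1
    return offsets

def mismatches_from_5prime(md, read_len, flag):
    """Return 1-based mismatch positions counted from the read's 5' end.
    Assumes an ungapped alignment; for FLAG 0x10 the coordinates are flipped."""
    offsets = md_mismatch_offsets(md)
    if flag & 0x10:
        return sorted(read_len - o for o in offsets)
    return [o + 1 for o in offsets]

# MD tags taken from the SAM records above (all FLAG 16, read length 13)
for md in ["10C2", "1C8C2", "5C4C2", "10C0C1"]:
    print(md, mismatches_from_5prime(md, 13, 16))
```

Under those assumptions, 10C2 gives position 3 and 10C0C1 gives positions 2 and 3, all inside a 7-base seed, while 1C8C2 gives positions 3 and 12 and 5C4C2 gives positions 3 and 8, so each of those two has one mismatch beyond base 7.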

It may be that I have misunderstood something and this conclusion is wrong.
