BenLangmead / bowtie2

A fast and sensitive gapped read aligner
GNU General Public License v3.0

--dpad --gbar #475

Open zznx opened 1 month ago

zznx commented 1 month ago

Hi, I'm reading the manual and I don't understand what these two options do. If I run into a problem like this, how can I work it out myself? Could you explain them, or point me to relevant literature? Thank you.

https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#bowtie2-options-align-paired-reads
--dpad <int> | "Pads" dynamic programming problems by <int> columns on either side to allow gaps. Default: 15.
--gbar <int> | Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.

ch4rr0 commented 1 month ago

Hello,

--gbar <int> | Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.

Given the following reference:

>a
GGAATATTTGCGATTTGCCATTTTCTCTCAAGAGT

and read:

>r1
GGAATAGCGATTTGCCATTTTCTCTCAAGAGT

Aligning the read to the reference causes bowtie2 to report a gap in the alignment: to transform the reference into the read, the 3 Ts starting at reference position 7 have to be deleted. You can see this in the CIGAR string (6M3D26M) as well as in the MD:Z field (6^TTT26) of the output below.

GGAATATTTGCGATTTGCCATTTTCTCTCAAGAGT
||||||   ||||||||||||||||||||||||||
GGAATA---GCGATTTGCCATTTTCTCTCAAGAGT
12345678901234567890123456789012345

./bowtie2-align-s -x /tmp/out -f read.fq  -a --overhang --gbar 4
r1  0   a   1   255 6M3D26M *   0   0   GGAATAGCGATTTGCCATTTTCTCTCAAGAGT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    AS:i:-14    XN:i:0  XM:i:0  XO:i:1  XG:i:3  NM:i:3  MD:Z:6^TTT26    YT:Z:UU

This gap is 6 bases from the start of the read. That is farther from the end of the read than the default --gbar of 4, so bowtie2 reports the alignment as valid.

If we change --gbar to 6 then bowtie2 will no longer consider this alignment as valid.

./bowtie2-align-s -x /tmp/out -f read.fq  -a --overhang --gbar 6
1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
@HD VN:1.5  SO:unsorted GO:query
@SQ SN:a    LN:35
@PG ID:bowtie2  PN:bowtie2  VN:2.5.4    CL:"/home/rcharles/src/git/bowtie2/bowtie2-align-s -x /tmp/out -f read.fq -a --overhang --gbar 6"
r1  4   *   0   0   *   *   0   0   GGAATAGCGATTTGCCATTTTCTCTCAAGAGT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    YT:Z:UU
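
To make the two runs above concrete, here is a small Python sketch of the kind of check --gbar implies. It is not bowtie2's actual code. The function names `gap_offsets` and `passes_gbar` are made up for illustration, and the boundary rule (a gap needs more than <int> read bases on both sides) is an assumption chosen only because it reproduces the two results shown above.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read.
READ_CONSUMING = "MIS=X"


def gap_offsets(cigar):
    """Return (op, bases_before, bases_after) for each gap (I or D).

    Distances are counted in read bases on either side of the gap.
    Hypothetical helper, not part of bowtie2.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    read_len = sum(n for n, op in ops if op in READ_CONSUMING)
    gaps = []
    consumed = 0  # read bases seen so far
    for n, op in ops:
        if op == "D":
            # A deletion consumes no read bases.
            gaps.append((op, consumed, read_len - consumed))
        elif op == "I":
            # An insertion consumes n read bases itself.
            gaps.append((op, consumed, read_len - consumed - n))
        if op in READ_CONSUMING:
            consumed += n
    return gaps


def passes_gbar(cigar, gbar):
    """Assumed rule: every gap needs more than `gbar` read bases on each side."""
    return all(min(before, after) > gbar
               for _, before, after in gap_offsets(cigar))


if __name__ == "__main__":
    cigar = "6M3D26M"  # CIGAR from the --gbar 4 run above
    print(gap_offsets(cigar))     # [('D', 6, 26)]
    print(passes_gbar(cigar, 4))  # True
    print(passes_gbar(cigar, 6))  # False
```

With the CIGAR from the first run, the deletion has 6 read bases before it and 26 after it, so under this assumed rule it is accepted with --gbar 4 and rejected with --gbar 6, the same as the two outputs.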

I will need to get back to you with an example explaining --dpad. In the meantime I hope this helps.

zznx commented 1 month ago

Yes, I see what you mean. Is the default of allowing no gaps within the first four bases an assumption, or is it a rule derived from a lot of data? Which one is it?