BenLangmead / bowtie2

A fast and sensitive gapped read aligner
GNU General Public License v3.0

Paired-end reads align differently when R1 is swapped with R2 #466

Open JBSScience opened 4 months ago

JBSScience commented 4 months ago

A paired-end sample (R1 28 nt, R2 90 nt) aligned to an HBV reference gives 0 mapped reads, but 1 mapped read if I swap R1 with R2. I tested both cases with bwa mem and got 1 mapped read each time.

Any suggestions?

case 1:

@PG ID:bowtie2 PN:bowtie2 VN:2.3.5.1 CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 --seed 17 --score-min G,20,6 --trim5 0 --trim3 0 --very-sensitive-local --threads 10 -x viralRef -S test_3_R_1__viralAlign.sam -1 test_3_R_1_1.fastq -2 test_3_R_1_2.fastq"
A00527:561:HG23MDRX2:1:2101:30273:13839 77 * 0 0 * * 0 0 CATATGTCAGGCTACGTTCCTTGGCCAG FFFFFFFFFFFFFFFFFFFFFFFFFFFF YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30273:13839 141 * 0 0 * * 0 0 GCCACGCAGTTTTATCCGGTAAAGCGAATGATTAGAGGTCTTGGGGCCGAAACGATCTCAACCTATTCTCAAACTTTAAATGGGTAAGAA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FF YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30291:13839 77 * 0 0 * * 0 0 TCAGACGCTATAGAAGTGGATGTCGATG FFFFFFFFFFFFFFFFFFFFFFFFFF,F YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30291:13839 141 * 0 0 * * 0 0 GCAGTGGTATCAACGCAGAGTACATGGGCTGGCTTTGGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTAAAAAAAAAAAAAAAA FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFF,FF:FFFF::FF: YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30382:13839 77 * 0 0 * * 0 0 TCGGGTGAAACTGCTAAAAATATCCAAT FFFFFFF,FFFFFFFFF,F:FFFFFFFF YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30382:13839 141 * 0 0 * * 0 0 GTGGTATCAACGCAGAGTACATGGGGCGGGCCGCCGGTGAAATACCACTACTCTGATCGTTTTTTCACTGACCCGGTGAGGCGGGGGGGC F:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF,FFFFFFFFFF YT:Z:UP

case 2: swap R1 with R2 and 1 read mapped

@PG ID:bowtie2 PN:bowtie2 VN:2.3.5.1 CL:"/usr/bin/bowtie2-align-s --wrapper basic-0 --seed 17 --score-min G,20,6 --trim5 0 --trim3 0 --very-sensitive-local --threads 10 -x viralRef -S test_3_R_1__viralAlign.sam -1 test_3_R_1_1.fastq -2 test_3_R_1_2.fastq"
A00527:561:HG23MDRX2:1:2101:30273:13839_CATATGTCAGGCTACG:TTCCTTGGCCAG 77 * 0 0 * * 0 0 GCCACGCAGTTTTATCCGGTAAAGCGAATGATTAGAGGTCTTGGGGCCGAAACGATCTCAACCTATTCTCAAACTTTAAATGGGTAAGAA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FF YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30273:13839 141 * 0 0 * * 0 0 CATATGTCAGGCTACGTTCCTTGGCCAG FFFFFFFFFFFFFFFFFFFFFFFFFFFF YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30291:13839_TCAGACGCTATAGAAG:TGGATGTCGATG 73 gnl|hbvnuc|AB076679_FT00000_P-A 1886 0 26S49M15S = 1886 0 GCAGTGGTATCAACGCAGAGTACATGGGCTGGCTTTGGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTAAAAAAAAAAAAAAAA FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFF,FF:FFFF::FF: AS:i:98 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:49 YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30291:13839 133 gnl|hbvnuc|AB076679_FT00000_P-A 1886 0 * = 1886 0 TCAGACGCTATAGAAGTGGATGTCGATG FFFFFFFFFFFFFFFFFFFFFFFFFF,F YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30382:13839_TCGGGTGAAACTGCTA:AAAATATCCAAT 77 * 0 0 * * 0 0 GTGGTATCAACGCAGAGTACATGGGGCGGGCCGCCGGTGAAATACCACTACTCTGATCGTTTTTTCACTGACCCGGTGAGGCGGGGGGGC F:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF,FFFFFFFFFF YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30382:13839 141 * 0 0 * * 0 0 TCGGGTGAAACTGCTAAAAATATCCAAT FFFFFFF,FFFFFFFFF,F:FFFFFFFF YT:Z:UP
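For readers comparing the two cases, the FLAG column (77, 141, 73, 133) carries most of the difference. The sketch below decodes a FLAG value into the bit names defined by the SAM specification; the short labels such as `mate_unmapped` are illustrative names chosen here, not identifiers from bowtie2 or any library.

```python
# Bit values follow the SAM specification; the labels are illustrative only.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # FLAG values taken from the SAM records in case 1 and case 2.
    for flag in (77, 141, 73, 133):
        print(flag, decode_flag(flag))
```

Read this way, 77 and 141 mark a pair in which neither mate aligned, while 73 marks an aligned first mate whose partner (133) is unaligned.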

ch4rr0 commented 4 months ago

Hello,

I was not able to recreate this one. Here are my input files:

hbv.fastq

@r1
CATATGTCAGGCTACGTTCCTTGGCCAG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFF

hbv_2.fastq

@r1
GCAGTGGTATCAACGCAGAGTACATGGGCTGGCTTTGGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTAAAAAAAAAAAAAAAA
+
FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFF,FF:FFFF::FF:

Reference

>NC_003977.2 Hepatitis B virus (strain ayw) genome
AATTCCACAACCTTCCACCAAACTCTGCAAGATCCCAGAGTGAGAGGCCTGTATTTCCCTGCTGGTGGCTCCAGTTCAGG
AACAGTAAACCCTGTTCTGACTACTGCCTCTCCCTTATCGTCAATCTTCTCGAGGATTGGGGACCCTGCGCTGAACATGG
AGAACATCACATCAGGATTCCTAGGACCCCTTCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAGAATCCTCACAATA
CCGCAGAGTCTAGACTCGTGGTGGACTTCTCTCAATTTTCTAGGGGGAACTACCGTGTGTCTTGGCCAAAATTCGCAGTC
CCCAACCTCCAATCACTCACCAACCTCTTGTCCTCCAACTTGTCCTGGTTATCGCTGGATGTGTCTGCGGCGTTTTATCA
TCTTCCTCTTCATCCTGCTGCTATGCCTCATCTTCTTGTTGGTTCTTCTGGACTATCAAGGTATGTTGCCCGTTTGTCCT
CTAATTCCAGGATCCTCAACAACCAGCACGGGACCATGCCGGACCTGCATGACTACTGCTCAAGGAACCTCTATGTATCC
CTCCTGTTGCTGTACCAAACCTTCGGACGGAAATTGCACCTGTATTCCCATCCCATCATCCTGGGCTTTCGGAAAATTCC
TATGGGAGTGGGCCTCAGCCCGTTTCTCCTGGCTCAGTTTACTAGTGCCATTTGTTCAGTGGTTCGTAGGGCTTTCCCCC
ACTGTTTGGCTTTCAGTTATATGGATGATGTGGTATTGGGGGCCAAGTCTGTACAGCATCTTGAGTCCCTTTTTACCGCT
GTTACCAATTTTCTTTTGTCTTTGGGTATACATTTAAACCCTAACAAAACAAAGAGATGGGGTTACTCTCTAAATTTTAT
GGGTTATGTCATTGGATGTTATGGGTCCTTGCCACAAGAACACATCATACAAAAAATCAAAGAATGTTTTAGAAAACTTC
CTATTAACAGGCCTATTGATTGGAAAGTATGTCAACGAATTGTGGGTCTTTTGGGTTTTGCTGCCCCTTTTACACAATGT
GGTTATCCTGCGTTGATGCCTTTGTATGCATGTATTCAATCTAAGCAGGCTTTCACTTTCTCGCCAACTTACAAGGCCTT
TCTGTGTAAACAATACCTGAACCTTTACCCCGTTGCCCGGCAACGGCCAGGTCTGTGCCAAGTGTTTGCTGACGCAACCC
CCACTGGCTGGGGCTTGGTCATGGGCCATCAGCGCATGCGTGGAACCTTTTCGGCTCCTCTGCCGATCCATACTGCGGAA
CTCCTAGCCGCTTGTTTTGCTCGCAGCAGGTCTGGAGCAAACATTATCGGGACTGATAACTCTGTTGTCCTATCCCGCAA
ATATACATCGTTTCCATGGCTGCTAGGCTGTGCTGCCAACTGGATCCTGCGCGGGACGTCCTTTGTTTACGTCCCGTCGG
CGCTGAATCCTGCGGACGACCCTTCTCGGGGTCGCTTGGGACTCTCTCGTCCCCTTCTCCGTCTGCCGTTCCGACCGACC
ACGGGGCGCACCTCTCTTTACGCGGACTCCCCGTCTGTGCCTTCTCATCTGCCGGACCGTGTGCACTTCGCTTCACCTCT
GCACGTCGCATGGAGACCACCGTGAACGCCCACCAAATATTGCCCAAGGTCTTACATAAGAGGACTCTTGGACTCTCAGC
AATGTCAACGACCGACCTTGAGGCATACTTCAAAGACTGTTTGTTTAAAGACTGGGAGGAGTTGGGGGAGGAGATTAGGT
TAAAGGTCTTTGTACTAGGAGGCTGTAGGCATAAATTGGTCTGCGCACCAGCACCATGCAACTTTTTCACCTCTGCCTAA
TCATCTCTTGTTCATGTCCTACTGTTCAAGCCTCCAAGCTGTGCCTTGGGTGGCTTTGGGGCATGGACATCGACCCTTAT
AAAGAATTTGGAGCTACTGTGGAGTTACTCTCGTTTTTGCCTTCTGACTTCTTTCCTTCAGTACGAGATCTTCTAGATAC
CGCCTCAGCTCTGTATCGGGAAGCCTTAGAGTCTCCTGAGCATTGTTCACCTCACCATACTGCACTCAGGCAAGCAATTC
TTTGCTGGGGGGAACTAATGACTCTAGCTACCTGGGTGGGTGTTAATTTGGAAGATCCAGCGTCTAGAGACCTAGTAGTC
AGTTATGTCAACACTAATATGGGCCTAAAGTTCAGGCAACTCTTGTGGTTTCACATTTCTTGTCTCACTTTTGGAAGAGA
AACAGTTATAGAGTATTTGGTGTCTTTCGGAGTGTGGATTCGCACTCCTCCAGCTTATAGACCACCAAATGCCCCTATCC
TATCAACACTTCCGGAGACTACTGTTGTTAGACGACGAGGCAGGTCCCCTAGAAGAAGAACTCCCTCGCCTCGCAGACGA
AGGTCTCAATCGCCGCGTCGCAGAAGATCTCAATCTCGGGAATCTCAATGTTAGTATTCCTTGGACTCATAAGGTGGGGA
ACTTTACTGGGCTTTATTCTTCTACTGTACCTGTCTTTAATCCTCATTGGAAAACACCATCTTTTCCTAATATACATTTA
CACCAAGACATTATCAAAAAATGTGAACAGTTTGTAGGCCCACTCACAGTTAATGAGAAAAGAAGATTGCAATTGATTAT
GCCTGCCAGGTTTTATCCAAAGGTTACCAAATATTTACCATTGGATAAGGGTATTAAACCTTATTATCCAGAACATCTAG
TTAATCATTACTTCCAAACTAGACACTATTTACACACTCTATGGAAGGCGGGTATATTATATAAGAGAGAAACAACACAT
AGCGCCTCATTTTGTGGGTCACCATATTCTTGGGAACAAGATCTACAGCATGGGGCAGAATCTTTCCACCAGCAATCCTC
TGGGATTCTTTCCCGACCACCAGTTGGATCCAGCCTTCAGAGCAAACACCGCAAATCCAGATTGGGACTTCAATCCCAAC
AAGGACACCTGGCCAGACGCCAACAAGGTAGGAGCTGGAGCATTCGGGCTGGGTTTCACCCCACCGCACGGAGGCCTTTT
GGGGTGGAGCCCTCAGGCTCAGGGCATACTACAAACTTTGCCAGCAAATCCGCCTCCTGCCTCCACCAATCGCCAGTCAG
GAAGGCAGCCTACCCCGCTGTCTCCACCTTTGAGAAACACTCATCCTCAGGCCATGCAGTGG

Here are my results:

./bowtie2-align-s --score-min G,20,6 --seed 17 --very-sensitive-local -x viral -1 hbv.fastq -2 hbv_2.fastq --sam-nohead
1 reads; of these:
  1 (100.00%) were paired; of these:
    1 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    1 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    1 pairs aligned 0 times concordantly or discordantly; of these:
      2 mates make up the pairs; of these:
        1 (50.00%) aligned 0 times
        1 (50.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
50.00% overall alignment rate
r1  137 NC_003977.2 1891    22  29S46M15S   =   1891    0   GCAGTGGTATCAACGCAGAGTACATGGGCTGGCTTTGGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTAAAAAAAAAAAAAAAA  FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFF,FF:FFFF::FF:  AS:i:85 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:20C25  YT:Z:UP
r1  69  NC_003977.2 1891    0   *   =   1891    0   CATATGTCAGGCTACGTTCCTTGGCCAG    FFFFFFFFFFFFFFFFFFFFFFFFFFFF    YT:Z:UP
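As a sanity check on the aligned record above, the CIGAR `29S46M15S` should account for all 90 read bases, and `AS:i:85` should follow from the scoring scheme. The sketch below assumes the local-mode defaults described in the Bowtie 2 manual (match bonus `--ma 2`, mismatch penalty `--mp 6,2` scaled by base quality), since none of the commands in this thread override them, and it assumes the mismatched base has quality `F` (Phred 37). The function names are hypothetical helpers, not part of bowtie2.

```python
import re


def parse_cigar(cigar):
    """Split a CIGAR string such as '29S46M15S' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def query_length(cigar):
    """Count read bases consumed by the CIGAR (M, I, S, = and X consume the read)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


def mismatch_penalty(phred, mx=6, mn=2):
    """Quality-aware mismatch penalty: MN + floor((MX - MN) * min(Q, 40) / 40)."""
    return mn + (mx - mn) * min(phred, 40) // 40


def local_score(aligned_bases, mismatch_phreds, match_bonus=2):
    """Score an ungapped local alignment given the Phred quality of each mismatch."""
    matches = aligned_bases - len(mismatch_phreds)
    return matches * match_bonus - sum(mismatch_penalty(q) for q in mismatch_phreds)


if __name__ == "__main__":
    phred_f = ord("F") - 33  # Phred+33 encoding, so 'F' is quality 37
    print(query_length("29S46M15S"))   # read bases covered by the CIGAR
    print(local_score(46, [phred_f]))  # 46M with one mismatch (MD:Z:20C25)
    print(local_score(49, []))         # 49M with no mismatch (MD:Z:49)
```

Under these assumptions the 46M alignment with one mismatch scores 85 and the 49M alignment from the original report scores 98, matching the `AS:i` values shown in the thread.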

When I flip the order of the reads I still get the same result.

./bowtie2-align-s --score-min G,20,6 --seed 17 --very-sensitive-local -x viral -2 hbv.fastq -1 hbv_2.fastq --sam-nohead
1 reads; of these:
  1 (100.00%) were paired; of these:
    1 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    1 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    1 pairs aligned 0 times concordantly or discordantly; of these:
      2 mates make up the pairs; of these:
        1 (50.00%) aligned 0 times
        1 (50.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
50.00% overall alignment rate
r1  73  NC_003977.2 1891    22  29S46M15S   =   1891    0   GCAGTGGTATCAACGCAGAGTACATGGGCTGGCTTTGGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTAAAAAAAAAAAAAAAA  FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFF,FF:FFFF::FF:  AS:i:85 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:20C25  YT:Z:UP
r1  133 NC_003977.2 1891    0   *   =   1891    0   CATATGTCAGGCTACGTTCCTTGGCCAG    FFFFFFFFFFFFFFFFFFFFFFFFFFFF    YT:Z:UP

I am using v2.5.3. Please let me know if I am missing something in my testing.
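One way to confirm that the two runs above really give the same result is to check that their FLAG values differ only in the first-in-pair (0x40) and second-in-pair (0x80) bits. The sketch below does that for the flags reported in both runs; `swap_mate_bits` is a hypothetical helper written for this comparison.

```python
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def swap_mate_bits(flag):
    """Return the FLAG with the first-in-pair and second-in-pair bits exchanged."""
    first = bool(flag & FIRST_IN_PAIR)
    second = bool(flag & SECOND_IN_PAIR)
    flag &= ~(FIRST_IN_PAIR | SECOND_IN_PAIR)
    if first:
        flag |= SECOND_IN_PAIR
    if second:
        flag |= FIRST_IN_PAIR
    return flag


if __name__ == "__main__":
    # FLAGs of the 90-nt and 28-nt reads in the first run (-1 hbv.fastq)
    # and in the second run (-1 hbv_2.fastq).
    first_run = {"90nt": 137, "28nt": 69}
    second_run = {"90nt": 73, "28nt": 133}
    for read, flag in first_run.items():
        print(read, swap_mate_bits(flag) == second_run[read])
```

Both comparisons hold, so apart from the mate labels the two runs report the same alignment.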

JBSScience commented 4 months ago

I used the same command as yours and reproduced the issue. I have bowtie2 version 2.3.5.1 (64-bit) on Ubuntu 20.04.3 LTS.

case 1:

bowtie2-align-s --score-min G,20,6 --seed 17 --very-sensitive-local -x viralRef -1 test_1_R1.fastq -2 test_1_R2.fastq --sam-nohead
1 reads; of these:
  1 (100.00%) were paired; of these:
    1 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    1 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    1 pairs aligned 0 times concordantly or discordantly; of these:
      2 mates make up the pairs; of these:
        2 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.00% overall alignment rate
A00527:561:HG23MDRX2:1:2101:30291:13839 77 * 0 0 * * 0 0 TCAGACGCTATAGAAGTGGATGTCGATG FFFFFFFFFFFFFFFFFFFFFFFFFF,F YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30291:13839 141 * 0 0 * * 0 0 GCAGTGGTATCAACGCAGAGTACATGGGCTGGCTTTGGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTAAAAAAAAAAAAAAAA FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFF,FF:FFFF::FF: YT:Z:UP

case 2:

bowtie2-align-s --score-min G,20,6 --seed 17 --very-sensitive-local -x viralRef -1 test_1_R2.fastq -2 test_1_R1.fastq --sam-nohead
1 reads; of these:
  1 (100.00%) were paired; of these:
    1 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    1 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    1 pairs aligned 0 times concordantly or discordantly; of these:
      2 mates make up the pairs; of these:
        1 (50.00%) aligned 0 times
        1 (50.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
50.00% overall alignment rate
A00527:561:HG23MDRX2:1:2101:30291:13839 73 gnl|hbvnuc|NC003977_FT00000_C-D 1891 22 29S46M15S = 1891 0 GCAGTGGTATCAACGCAGAGTACATGGGCTGGCTTTGGGGCATGGACATTGACCCTTATAAAGAATTTGGAGCTAAAAAAAAAAAAAAAA FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFFFFF,FF:FFFF::FF: AS:i:85 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:20C25 YT:Z:UP
A00527:561:HG23MDRX2:1:2101:30291:13839 133 gnl|hbvnuc|NC003977_FT00000_C-D 1891 0 * = 1891 0 TCAGACGCTATAGAAGTGGATGTCGATG FFFFFFFFFFFFFFFFFFFFFFFFFF,F YT:Z:UP
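Both commands pass `--score-min G,20,6`, which the Bowtie 2 manual describes as a natural-log function of read length, f(x) = 20 + 6 * ln(x). The sketch below evaluates that threshold for the 28-nt and 90-nt mates and compares it with the best possible local score, assuming the default match bonus of 2. Whether bowtie2 rounds the threshold internally is not something this sketch tries to model, and `min_score_log` is a hypothetical helper name.

```python
import math


def min_score_log(read_length, constant=20.0, coefficient=6.0):
    """Evaluate a 'G,constant,coefficient' score-min function: c + k * ln(length)."""
    return constant + coefficient * math.log(read_length)


def max_local_score(read_length, match_bonus=2):
    """Best possible local score if every base matches (assumes --ma 2)."""
    return read_length * match_bonus


if __name__ == "__main__":
    for length in (28, 90):
        threshold = min_score_log(length)
        print(f"{length} nt: min score {threshold:.2f}, max score {max_local_score(length)}")
```

Under these assumptions the threshold is about 40 for the 28-nt mate (out of a maximum of 56) and about 47 for the 90-nt mate (out of 180), so the reported score of 85 for the 90-nt mate clears it in either mate order.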

ch4rr0 commented 4 months ago

Is there a reason why you're still using v2.3.5.1? It looks like the issue you're experiencing has since been resolved.

JBSScience commented 4 months ago

Thanks for the reply. I will try the new version.