Paired-end bowtie2-align 2.5.2 crash when R2 fastq reads have been trimmed to zero length

cnluzon commented 7 months ago

Hi! I have an issue running bowtie2 version 2.5.2 that I did not have with previous 2.5.1 version.

bowtie2 --version output:

/home/user/miniconda3/envs/bowtie2_test/bin/bowtie2-align-s version 2.5.2
64-bit
Built on fv-az632-196
Sat Oct 14 05:42:50 UTC 2023
Compiler: gcc version 12.3.0 (conda-forge gcc 12.3.0-2) 
Options: -O3 -msse2 -funroll-loops -g3 -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/user/miniconda3/envs/bowtie2_test/include -fdebug-prefix-map=/opt/conda/conda-bld/bowtie2_1697261192232/work=/usr/local/src/conda/bowtie2-2.5.2 -fdebug-prefix-map=/home/user/miniconda3/envs/bowtie2_test=/usr/local/src/conda-prefix -std=c++11 -DPOPCNT_CAPABILITY -DNO_SPINLOCK -DWITH_QUEUELOCK=1 -DWITH_ZSTD
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

bowtie2 call that returns the error:

bowtie2 --verbose -p 2 -x ref/ref1 -1 test_R1.fastq -2 test_R2.fastq --fast

The error:

Error, fewer reads in file specified with -2 than in file specified with -1
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): bowtie2-align exited with value 134

I narrowed down the issue to the smallest FASTQ that would give me the error and I realise that what is triggering the event is an empty sequence read on R2:

@NB501365:168:HWFNVBGX3:1:11103:6189:3099_ACCGGG
CGACGCTCGGAAGAGCGTCGTGTACCCTATAGT
+
AAAAAEEEEEAEEEEEEEEEEEEEEEEAEEEEE
@NB501365:168:HWFNVBGX3:1:11103:9300:3121_CCCGCA

+

@NB501365:168:HWFNVBGX3:1:11103:5564:3152_AGGCCC
GTCTGGACCTGGAAGAGATCGGAAGAGCGTCGT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEE

I understand this comes from the previous trimming step, but before (bowtie2 2.5.1), the corresponding read on R1 would be mapped. However now it is probably understood as an empty line and bowtie2 thinks that I have different number of reads. I cannot just remove those empty reads on the R2 file because then I would need to remove heir mate in R1 too, so I think this case should somehow be handled? I have not been able to find information on whether FASTQ files allow for such structure, but on the previous 2.5.1 version this would be correctly handled and the read would be outputted as:

NB501365:168:HWFNVBGX3:1:11103:9300:3121_CCCGCA 4       *       0       0       *       *       0       0       TGTGCAGTTCTAAGGAGATGGTGACTGAACAGACGT    EEEEEAEEEEEEEEEEEEEEEEE/AEEAA6EEEEEA    YT:Z:UU

in the alignment, with a correct YT:Z:UU for a read that was not paired.

The verbose output:

(INFO): After arg handling:
(INFO):   Binary args:
[ -p 2 -x ref/ref1 --fast --verbose -1 test_R1.fastq -2 test_R2.fastq ]
(INFO): Using the small index (ref/ref1.1.bt2).
(INFO): "/home/user/miniconda3/envs/bowtie2_test/bin/bowtie2-align-s" --wrapper basic-0 -p 2 -x "ref/ref1" --fast --verbose -1 "test_R1.fastq" -2 "test_R2.fastq"
Applying preset: 'fast' using preset menu 'V0'
Final policy string: 'SEED=0;SEEDLEN=22;DPS=10;ROUNDS=2;IVAL=S,0,2.50'
Entered driver(): 12:04:55
Creating PatternSource: 12:04:55
Opening hit output file: 12:04:55
About to initialize fw Ebwt: 12:04:55
  About to open input files: 12:04:55
Opening "ref/ref1.1.bt2"
Opening "ref/ref1.2.bt2"
  Finished opening input files: 12:04:55
  Reading header: 12:04:55
Headers:
    len: 700000
    bwtLen: 700001
    sz: 175000
    bwtSz: 175001
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 43751
    offsSz: 175004
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3646
    numLines: 3646
    ebwtTotLen: 233344
    ebwtTotSz: 233344
    color: 0
    reverse: 0
Reading plen (1): 12:04:55
Dispatching to search driver: 12:04:55
About to initialize rev Ebwt: 12:04:55
  About to open input files: 12:04:55
Opening "ref/ref1.rev.1.bt2"
Opening "ref/ref1.rev.2.bt2"
  Finished opening input files: 12:04:55
  Reading header: 12:04:55
Headers:
    len: 700000
    bwtLen: 700001
    sz: 175000
    bwtSz: 175001
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 43751
    offsSz: 175004
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3646
    numLines: 3646
    ebwtTotLen: 233344
    ebwtTotSz: 233344
    color: 0
    reverse: 1
Reading plen (1): 12:04:55
Read 1 reference strings from 1 records: 12:04:55
  About to open input files: 12:04:55
Opening "ref/ref1.1.bt2"
Opening "ref/ref1.2.bt2"
  Finished opening input files: 12:04:55
  Reading header: Error, fewer reads in file specified with -2 than in file specified with -1
12:04:55
Reading plen (1): 12:04:55
Reading rstarts (3): 12:04:55
Reading ebwt (233344): 12:04:55
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): bowtie2-align exited with value 134

The test files:

test_reads.zip

ch4rr0 commented 7 months ago

Hello,

Thank you for your detailed reporting on this issue and providing sample reads that I could use to recreate it. I have pushed a change to bug_fixes branch that should resolve this issue. Let me know if your thoughts.

ch4rr0 commented 7 months ago

Update -- I found an issue with the code that I am looking into. I apologise for the inconvenience.

cnluzon commented 7 months ago

No problem! right now I have it pinned to the previous version. Thanks for your rapid response!

ch4rr0 commented 7 months ago

This latest iteration should work. Here is the output I get when trying to align the test data against some generic index. Thank you for your patience.

4 reads; of these:
  3 (75.00%) were paired; of these:
    3 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    3 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    3 pairs aligned 0 times concordantly or discordantly; of these:
      6 mates make up the pairs; of these:
        6 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
  1 (25.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate

BenLangmead / bowtie2

Paired-end bowtie2-align 2.5.2 crash when R2 fastq reads have been trimmed to zero length #451