BenLangmead / bowtie2

A fast and sensitive gapped read aligner
GNU General Public License v3.0

Error message: (ERR): bowtie2-align died with signal 11 (SEGV) #257

Closed StevenWingett closed 4 years ago

StevenWingett commented 5 years ago

Hi,

I have incorporated Bowtie2 into my Hi-C pipeline HiCUP (https://www.bioinformatics.babraham.ac.uk/projects/hicup/). I have received an email from a HiCUP user stating that Bowtie2 fails when using HiCUP. The error message is: “(ERR): bowtie2-align died with signal 11 (SEGV)”

When I try running their data on my system, Bowtie2 functions correctly. Do you know what typically causes this error message? Knowing that may help me troubleshoot the problem.

Many thanks, Steven
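
For readers interpreting the message itself: the number in "died with signal 11 (SEGV)" is a POSIX signal number, and a pipeline that launches the aligner can decode it from the child's exit status. The following is a minimal Python sketch of that decoding, not code from Bowtie2 or HiCUP. The helper name describe_exit is hypothetical, and the child process here is a stand-in that aborts itself rather than a real aligner.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a readable status (hypothetical helper)."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        sig = signal.Signals(-returncode)
        return f"died with signal {sig.value} ({sig.name})"
    return f"exited with status {returncode}"


# Stand-in for a crashing aligner: a child process that sends itself SIGABRT.
child = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGABRT)"]
)
print(describe_exit(child.returncode))

# Signal 11 is the segmentation fault reported in this thread.
print(describe_exit(-11))
```

A segmentation fault (signal 11) means the process touched memory it was not allowed to access, while signal 6 (ABRT) usually means the program or the C library deliberately aborted, for example after a failed assertion or a detected heap corruption.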

Nheyer commented 4 years ago

I think this is an error relating to bowtie2 overflowing the stack when processing with many threads

ch4rr0 commented 4 years ago

Do you have any evidence of this happening?

Nheyer commented 4 years ago

yes it is happening on the file I am trying to align, getting debug binary now

Nheyer commented 4 years ago

It also seems to be happening on high-depth runs (~10M reads).

ch4rr0 commented 4 years ago

A sample command line and backtrace, if you're able to produce one, would be appreciated.

Thank you.

Nheyer commented 4 years ago

command line is : bowtie2 -x /public/home/nheyer/nheyer/references/notch/extended_alignment_consensus_plus_ref --end-to-end --no-mixed --align-paired-reads --preserve-tags -b 19240_A154T_S1_R1_001_unmapped.sorted.bam > temp_file.sam

(faster to error with --threads X) running debug now

Nheyer commented 4 years ago

./bowtie2-2.3.5.1-linux-x86_64/bowtie2 --debug -x /public/home/nheyer/nheyer/references/notch/extended_alignment_consensus_plus_ref --end-to-end --no-mixed --align-paired-reads --preserve-tags -b 19240_A154T_S1_R1_001_unmapped.sorted.bam > temp_file.sam
Warning: Running in debug mode. Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
(ERR): bowtie2-align died with signal 11 (SEGV)

Nheyer commented 4 years ago

There doesn't appear to be any backtrace....

ch4rr0 commented 4 years ago

That's fine. I will try to recreate the issue using a command line similar to yours. We recently released a beta version of bowtie2. Can you try rerunning your command using that version? Let me know if the issue still persists.

ch4rr0 commented 4 years ago

FYI -- there was an issue with multi-threaded alignment of BAM reads in this new build. I pushed a change to resolve this, but have not created a new build with the change. Single-threaded alignment should still work.

Nheyer commented 4 years ago

Aaah, so that likely explains why the threaded run still just throws an error, not a seg fault this time. Single-threaded I still get a seg fault, but at least now there is a backtrace. @ch4rr0 sorry this took so long to get back to you; it literally ran for >5 hrs before throwing an error. I also tried to compile from source, but I couldn't get the debug binaries to compile. Command run:

./bowtie2-2.4.0-beta-linux-x86_64/bowtie2 --debug -x /public/home/nheyer/nheyer/references/notch/extended_alignment_consensus_plus_ref --end-to-end --no-mixed --align-paired-reads --preserve-tags -b 19240_A154T_S1_R1_001_unmapped.sorted.bam > /dev/null
Warning: Running in debug mode. Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
Error in `/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug': free(): corrupted unsorted chunks: 0x00005601f0be2ad0
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7f6d271a4499]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x444b8)[0x5601efd294b8]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x454cc)[0x5601efd2a4cc]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x4b945)[0x5601efd30945]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x4f0b6)[0x5601efd340b6]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x15933)[0x5601efcfa933]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6d27145445]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x16755)[0x5601efcfb755]
======= Memory map: ========
5601efce5000-5601eff4b000 r-xp 00000000 00:2e 85899449901 /public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug
5601f014b000-5601f0151000 r--p 00266000 00:2e 85899449901 /public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug
5601f0151000-5601f0155000 rw-p 0026c000 00:2e 85899449901 /public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug
5601f0155000-5601f015a000 rw-p 00000000 00:00 0
5601f0b9a000-5601f0c35000 rw-p 00000000 00:00 0 [heap]
7f6d20000000-7f6d20974000 rw-p 00000000 00:00 0
7f6d20974000-7f6d24000000 ---p 00000000 00:00 0
7f6d26120000-7f6d26121000 ---p 00000000 00:00 0
7f6d26121000-7f6d27123000 rw-p 00000000 00:00 0
7f6d27123000-7f6d272e6000 r-xp 00000000 08:01 61646 /usr/lib64/libc-2.17.so
7f6d272e6000-7f6d274e5000 ---p 001c3000 08:01 61646 /usr/lib64/libc-2.17.so
7f6d274e5000-7f6d274e9000 r--p 001c2000 08:01 61646 /usr/lib64/libc-2.17.so
7f6d274e9000-7f6d274eb000 rw-p 001c6000 08:01 61646 /usr/lib64/libc-2.17.so
7f6d274eb000-7f6d274f0000 rw-p 00000000 00:00 0
7f6d274f0000-7f6d27505000 r-xp 00000000 08:01 3498 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f6d27505000-7f6d27704000 ---p 00015000 08:01 3498 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f6d27704000-7f6d27705000 r--p 00014000 08:01 3498 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f6d27705000-7f6d27706000 rw-p 00015000 08:01 3498 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f6d27706000-7f6d27807000 r-xp 00000000 08:01 61655 /usr/lib64/libm-2.17.so
7f6d27807000-7f6d27a06000 ---p 00101000 08:01 61655 /usr/lib64/libm-2.17.so
7f6d27a06000-7f6d27a07000 r--p 00100000 08:01 61655 /usr/lib64/libm-2.17.so
7f6d27a07000-7f6d27a08000 rw-p 00101000 08:01 61655 /usr/lib64/libm-2.17.so
7f6d27a08000-7f6d27a0a000 r-xp 00000000 08:01 61653 /usr/lib64/libdl-2.17.so
7f6d27a0a000-7f6d27c0a000 ---p 00002000 08:01 61653 /usr/lib64/libdl-2.17.so
7f6d27c0a000-7f6d27c0b000 r--p 00002000 08:01 61653 /usr/lib64/libdl-2.17.so
7f6d27c0b000-7f6d27c0c000 rw-p 00003000 08:01 61653 /usr/lib64/libdl-2.17.so
7f6d27c0c000-7f6d27c23000 r-xp 00000000 08:01 61680 /usr/lib64/libpthread-2.17.so
7f6d27c23000-7f6d27e22000 ---p 00017000 08:01 61680 /usr/lib64/libpthread-2.17.so
7f6d27e22000-7f6d27e23000 r--p 00016000 08:01 61680 /usr/lib64/libpthread-2.17.so
7f6d27e23000-7f6d27e24000 rw-p 00017000 08:01 61680 /usr/lib64/libpthread-2.17.so
7f6d27e24000-7f6d27e28000 rw-p 00000000 00:00 0
7f6d27e28000-7f6d27e4a000 r-xp 00000000 08:01 61639 /usr/lib64/ld-2.17.so
7f6d28023000-7f6d28027000 rw-p 00000000 00:00 0
7f6d2803c000-7f6d28049000 rw-p 00000000 00:00 0
7f6d28049000-7f6d2804a000 r--p 00021000 08:01 61639 /usr/lib64/ld-2.17.so
7f6d2804a000-7f6d2804b000 rw-p 00022000 08:01 61639 /usr/lib64/ld-2.17.so
7f6d2804b000-7f6d2804c000 rw-p 00000000 00:00 0
7ffc7c53a000-7ffc7c55b000 rw-p 00000000 00:00 0 [stack]
7ffc7c599000-7ffc7c59b000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
(ERR): bowtie2-align died with signal 6 (ABRT)

ch4rr0 commented 4 years ago

I published a new beta with a fix for this issue. Can you give it a go and let me know if it has been resolved?

Nheyer commented 4 years ago

This seems to have fixed it !! Thank you so much!

ch4rr0 commented 4 years ago

These changes have been released as part of v2.4.0

prullens commented 4 years ago

I get the same error when running HiC-Pro (https://github.com/nservant/HiC-Pro), which depends upon bowtie2. Strangely the error appears to occur after successful mapping of the reads. The following info appears in the log file:

HiC-Pro mapping

536273614 reads; of these:
  536273614 (100.00%) were unpaired; of these:
    54708870 (10.20%) aligned 0 times
    392765431 (73.24%) aligned exactly 1 time
    88799313 (16.56%) aligned >1 times
89.80% overall alignment rate
(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)

So the reads seem to have mapped, yet this error occurs and halts the HiC-Pro pipeline. These are the bowtie2 options I specify in HiC-Pro:

BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

As suggested earlier in this thread, I tried installing the “new-beta” version of bowtie2 (bowtie2-2.4.0-beta-linux-x86_64.zip), but unfortunately this does not prevent the error.

jianshu93 commented 3 years ago

I still have this error in 2.4.2 when using more than 2 threads in macOS. I am wondering how to fix it...

Jianshu

Nheyer commented 3 years ago

@jianshu93 can you post the logs?

jianshu93 commented 3 years ago

Error message: (ERR): bowtie2-align died with signal 11 (SEGV)

ch4rr0 commented 3 years ago

Hello,

Can you try running bowtie2 again in debug mode and post the error? That will help tremendously.

Thank you

On Feb 18, 2021, at 4:15 PM, Jianshu_Zhao notifications@github.com wrote:

 Error message: (ERR): bowtie2-align died with signal 11 (SEGV)

jianshu93 commented 3 years ago

that is all i have

ch4rr0 commented 3 years ago

In that case:

What is your command line?
What’s the size of the index you’re using?
How much memory is available on the host machine?

On Feb 18, 2021, at 4:18 PM, Jianshu_Zhao notifications@github.com wrote:

 that is all i have

jychoilab commented 3 years ago

Hi I'm using ver 2.4.2 and I still get this error. Here's --version of my bowtie2

bowtie2 --version
/share/apps/bowtie2/2.4.2/bowtie2-align-s version 2.4.2
64-bit
Built on fcc614744c04
Tue Oct 6 03:06:29 UTC 2020
Compiler: gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)
Options: -O3 -msse2 -funroll-loops -g3 -g -O2 -fvisibility=hidden -I/hbb_exe_gc_hardened/include -ffunction-sections -fdata-sections -fstack-protector -D_FORTIFY_SOURCE=2 -fPIE -DPOPCNT_CAPABILITY -DWITH_TBB -std=c++11 -DNO_SPINLOCK -DWITH_QUEUELOCK=1
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

I'm not doing anything fancy. I'm just running the most basic command

bowtie2 -x refindex -1 SRR2029441_1.fastq.gz -2 SRR2029441_2.fastq.gz -S out.sam

And here's the message I get with --debug

Warning: Running in debug mode. Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
bowtie2-align-l-debug: word_io.h:125: T readU(FILE*, bool) [with T = unsigned int; FILE = _IO_FILE]: Assertion `false' failed.
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

I'm using a FASTQ that's not trimmed or QC'd or anything. It's hot off converting to a FASTQ from SRA.

Thank you for the help!

jianshu93 commented 3 years ago

Hello,

This error is still there: (ERR): bowtie2-align died with signal 11 (SEGV)

when using more than 1 thread. I am surprised that it is still not fixed after 3 major releases... It is slow with 1 thread for a medium-sized dataset. bwa-mem2 is way faster and gives very close results in terms of average depth after filtering. Apparently it is related to threads.

Thanks,

Jianshu

jianshu93 commented 3 years ago

I have tested on 3 systems (macOS, Ubuntu and CentOS). I got the same error on each.

Jianshu

ch4rr0 commented 3 years ago

@cjy8709 -- Thank you for the detailed issue. There may have been an issue with the index that you built that caused the short read. In any event I have gone ahead and committed a change to bug_fixes to print an error message and exit if there are any short reads in the affected functions.

@jianshu93 -- Could you provide something extra to help debug this issue e.g.

eboyden commented 3 years ago

I am getting the same error, using Bowtie2 2.4.4. Neither using different indices nor switching between multi-threaded and single-threaded runs produces a different result. But it seems to be specific to aligning unmapped bams; I've never encountered this problem when aligning fastqs.

bowtie2-2.4.4-linux-x86_64/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x hg38.analysisSet -b unmapped.bam -S mapped.sam
(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)

If I convert the ubam to an interleaved fastq first and then pipe the output to bowtie2, it works just fine.

samtools-1.13/samtools fastq -@ 8 -n -T [taglist] unmapped.bam | bowtie2-2.4.4-linux-x86_64/bowtie2 -p 8 --sam-append-comment --very-fast --no-discordant --no-mixed -x hg38.analysisSet --interleaved - -S mapped.sam

ch4rr0 commented 3 years ago

Hello @eboyden,

I tried a few runs with a command line similar to yours but have yet to experience a segfault. Would you be willing to share your input file or run bowtie2 in debug mode so that we can get more information on exactly where the crash occurs?

eboyden commented 3 years ago

Hi @ch4rr0,

I can't share my original file, but I tried again with a 1000g NA12878 exome (SRR1518133 at https://www.internationalgenome.org/data-portal/sample/NA12878) and produced the same result, using the hg38 noalt+decoy pre-built reference from the Bowtie2 website. Aligning the fastqs directly works fine, but fails after converting them to a ubam with Picard FastqToSam. Using --debug produces no errors other than the signal 11.

time /tools/bowtie2/bowtie2-2.4.4-linux-x86_64/bowtie2 --debug --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b SRR1518133.bam -S SRR1518133.bam.bt2_vf_ndnm.sam
Warning: Running in debug mode.  Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)

real    0m16.870s
user    0m0.116s
sys     0m11.695s

time /tools/bowtie2/bowtie2-2.4.4-linux-x86_64/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -1 SRR1518133_1.fastq.gz -2 SRR1518133_2.fastq.gz -S SRR1518133.fastq.bt2_vf_ndnm.sam
2539611 reads; of these:
  2539611 (100.00%) were paired; of these:
    100233 (3.95%) aligned concordantly 0 times
    1999397 (78.73%) aligned concordantly exactly 1 time
    439981 (17.32%) aligned concordantly >1 times
96.05% overall alignment rate

real    1m59.187s
user    14m20.234s
sys     1m8.956s

ch4rr0 commented 3 years ago

Thank you for the additional information. I have been able to recreate the issue using a BAM file produced by Picard's FastqToSam. I will get started on a fix ASAP.

ch4rr0 commented 3 years ago

I have committed a fix to the bug_fixes branch that I think resolves the issue. We made the assumption that alignment data would not span BGZF blocks in the old code. The picard-produced BAM files proved that assumption wrong so I updated the code to make it more robust.
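
To illustrate the failure mode described above, here is a minimal Python sketch, not the actual Bowtie2 code. It assumes plain gzip members as a stand-in for real BGZF blocks (which are gzip members with an extra size field) and uses made-up payloads in place of real alignment records. BAM prefixes each alignment with a 32-bit little-endian length, and nothing prevents a writer from closing a compressed block in the middle of a record.

```python
import gzip
import io
import struct


def make_record(payload: bytes) -> bytes:
    """Length-prefix a payload the way BAM prefixes each alignment (uint32, little-endian)."""
    return struct.pack("<I", len(payload)) + payload


def read_records(stream):
    """Yield record payloads from a byte stream; raise EOFError on a truncated record."""
    while True:
        header = stream.read(4)
        if not header:
            return
        if len(header) < 4:
            raise EOFError("truncated record length")
        (size,) = struct.unpack("<I", header)
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("truncated record body")
        yield payload


# Two records, with the block boundary placed in the middle of the second one.
raw = make_record(b"read1-data") + make_record(b"read2-data-longer")
cut = len(make_record(b"read1-data")) + 6
blocks = gzip.compress(raw[:cut]) + gzip.compress(raw[cut:])

# Robust approach: treat consecutive compressed blocks as one continuous stream.
with gzip.GzipFile(fileobj=io.BytesIO(blocks)) as stream:
    records = list(read_records(stream))

# Fragile approach: parse the first block's contents as if records never span blocks.
try:
    list(read_records(io.BytesIO(raw[:cut])))
    isolated_ok = True
except EOFError:
    isolated_ok = False

print(records, isolated_ok)
```

The stream-based reader recovers both records, while the per-block parse runs out of data partway through the second record. In C or C++ the equivalent mistake can read past the end of a buffer instead of raising a clean error.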

ch4rr0 commented 3 years ago

FYI -- there are still a few edge cases that need cleaning up before these changes are considered final.

ch4rr0 commented 3 years ago

@eboyden -- I think the changes are now complete and available on the bug_fixes branch. Are you willing to help test?

eboyden commented 3 years ago

Absolutely, will let you know how it goes - thanks for the quick work

eboyden commented 3 years ago

It seems to run now. Oddly it finds 0 reads if the reads are flagged as paired but --align-paired-reads isn't specified (or the opposite); is this intentional? Unless I'm missing something, wouldn't it be preferable to just align paired-flagged reads in single-end mode, and reset the bitflag? Or better yet, assume name-sorted input and attempt to align in paired mode if adjacent records have the same read name (save for a trailing /1 /2 suffix) and the appropriate bitflags, else treat reads as single-end. A replacement option --align-single-reads would reset all bitflags and perform all alignments in single-end mode. This would allow using bams that are a mix of single- and paired-end reads. (Understood if this is beyond the scope of this bugfix.)

# aligning paired ubam after conversion to interleaved fastq and piped
/tools/samtools/samtools-1.13/samtools fastq -@ 8 -n -o /dev/stdout SRR1518133.bam | /tools/bowtie2/bowtie2-bug_fixes/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as --interleaved - -S /dev/null
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 5079222 reads
2539611 reads; of these:
  2539611 (100.00%) were paired; of these:
    100233 (3.95%) aligned concordantly 0 times
    1999397 (78.73%) aligned concordantly exactly 1 time
    439981 (17.32%) aligned concordantly >1 times
96.05% overall alignment rate

# aligning paired ubam in non-paired mode
/tools/bowtie2/bowtie2-bug_fixes/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b SRR1518133.bam -S /dev/null
0 reads
0.00% overall alignment rate

# aligning paired ubam in paired mode
/tools/bowtie2/bowtie2-bug_fixes/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b SRR1518133.bam --align-paired-reads -S /dev/null
2539611 reads; of these:
  2539611 (100.00%) were paired; of these:
    100233 (3.95%) aligned concordantly 0 times
    1999397 (78.73%) aligned concordantly exactly 1 time
    439981 (17.32%) aligned concordantly >1 times
96.05% overall alignment rate
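
The pairing heuristic proposed above (name-sorted input, adjacent records with the same name and matching bitflags treated as mates, everything else as single-end) could be sketched roughly as follows. This is a hypothetical illustration of the suggestion, not Bowtie2 behavior. The function names are invented, and records are reduced to (name, FLAG) tuples.

```python
from typing import Iterable, Iterator, List, Tuple

# SAM FLAG bits: template has multiple segments, first segment, last segment.
PAIRED, READ1, READ2 = 0x1, 0x40, 0x80

Record = Tuple[str, int]  # (read name, FLAG); all other fields omitted


def base_name(name: str) -> str:
    """Strip a trailing /1 or /2 mate suffix, if present."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def group_reads(records: Iterable[Record]) -> Iterator[List[Record]]:
    """Yield [mate1, mate2] for adjacent mates, otherwise [read] as single-end."""
    pending = None
    for rec in records:
        if pending is None:
            pending = rec
            continue
        is_pair = bool(
            base_name(pending[0]) == base_name(rec[0])
            and pending[1] & PAIRED and rec[1] & PAIRED
            and pending[1] & READ1 and rec[1] & READ2
        )
        if is_pair:
            yield [pending, rec]
            pending = None
        else:
            # No matching mate follows, so treat the held record as single-end.
            yield [pending]
            pending = rec
    if pending is not None:
        yield [pending]


# FLAG 77 = paired, unmapped, mate unmapped, first; 141 = same but last; 4 = unmapped.
example = [("a", 77), ("a", 141), ("b", 4), ("c/1", 77), ("c/2", 141), ("d", 77)]
group_sizes = [len(group) for group in group_reads(example)]
print(group_sizes)
```

With this grouping, a bam mixing single-end and paired-end reads would yield groups of one or two records, and an orphaned mate such as "d" would fall back to single-end handling.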

But I found another issue: it doesn't appear to be able to accept piped input using the "-" shorthand (unlike other input options). The original error I encountered was actually with piped bam input, and that error mode still seems to be present. But /dev/stdin works; and even works with 2.4.4. So my original issue might have been a completely separate one from the one you fixed. Presumably piped input isn't subject to the same BGZF block problem you identified?

/tools/samtools/samtools-1.13/samtools view SRR1518133.bam -b | /tools/bowtie2/bowtie2-bug_fixes/bowtie2 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b - --align-paired-reads -S /dev/null
(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)

/tools/samtools/samtools-1.13/samtools view -@ 8 SRR1518133.bam -b | /tools/bowtie2/bowtie2-2.4.4-linux-x86_64/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b /dev/stdin --align-paired-reads -S /dev/null
2539611 reads; of these:
  2539611 (100.00%) were paired; of these:
    100233 (3.95%) aligned concordantly 0 times
    1999397 (78.73%) aligned concordantly exactly 1 time
    439981 (17.32%) aligned concordantly >1 times
96.05% overall alignment rate

eboyden commented 3 years ago

Interesting: when I write a ubam with fgbio CallMolecularConsensusReads or picard FastqToSam and then use samtools view -b/u to pipe it to bowtie2-2.4.4 -b /dev/stdin it runs fine; but when I pipe the output from either CallMolecularConsensusReads or FastqToSam directly to bowtie2-2.4.4 it crashes. However it doesn't crash when I pipe from either software directly to bowtie2-bug_fixes. So perhaps it depends on how the upstream software handles writing bams to stdout? In any case, I have not been able to make the bug_fixes branch crash however I've run it.

eboyden commented 3 years ago

I think I found a new bug - not sure if I should open a new issue or continue this thread since it was only exposed by the previous fix.

When I align a ubam generated by fgbio CallMolecularConsensusReads using bowtie2-bug_fixes -b /dev/stdin --align-paired-reads and then pipe that to samtools, it works fine. But if I add the --preserve-tags option, it produces the following error:

[E::aux_parse] incomplete aux field
[W::sam_read1_sam] Parse error at line 3084
samtools sort: truncated file. Aborting

Of note, fgbio-generated ubams include a couple of gnarly tags; a typical line looks like this:

NA12878:1000009 77  *   0   0   *   *   0   0   TATCATCTGCATCTCTCGACTTCGTTCTACCCGAATCCATTTCCCCCGATACCTGAATAAGAACGATCAAAACTGAGTGAGTGAATGGGTCAAACCCAGCTCCCATCATTCTTTTCTACTCTCACAGCC   DDDEDEDEEEDDDEEEEEEEEDDCDDDEEEEDCDEEEEEEEDEEEEED>DEEDEEDDDDDEEDEEEEEEEDDECD>DEECBDDD0DDDEEEDEEEEEEEDDEDDEDEEEEEEEDEEEEDEEEECECDDE   cD:i:1  cE:f:0  RG:Z:A  MI:Z:1000009    cM:i:1  RX:Z:CCACAGACAA cd:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1    ce:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

When I align without --preserve-tags, it looks fine, like so:

NA12878:1000009 83  chr2    218661329   42  129M    =   218661308   -150    GGCTGTGAGAGTAGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGGGTAGAACGAAGTCGAGAGATGCAGATGATA   EDDCECEEEEDEEEEDEEEEEEEDEDDEDDEEEEEEEDEEEDDD0DDDBCEED>DCEDDEEEEEEEDEEDDDDDEEDEED>DEEEEEDEEEEEEEDCDEEEEDDDCDDEEEEEEEEDDDEEEDEDEDDD   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:129    YS:i:0  YT:Z:CP

But with --preserve-tags, it looks like this:

NA12878:1000009 83      chr2    218661329       42      129M    =       218661308       -150    GGCTGTGAGAGTAGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGGGTAGAACGAAGTCGAGAGATGCAGATGATA       EDDCECEEEEDEEEEDEEEEEEEDEDDEDDEEEEEEEDEEEDDD0DDDBCEED>DCEDDEEEEEEEDEEDDDDDEEDEED>DEEEEEDEEEEEEEDCDEEEEDDDCDDEEEEEEEEDDDEEEDEDEDDD       AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:129        YS:i:0  YT:Z:CP cD:i:1  cE:f:   ^@^@:^@:        ^@R:G:  ZA:^@:  MI:Z:1000009    cM:i:1  RX:Z:CCACAGACAA cd:B:   s<81>:^@:       ^@^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:
        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:
        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:
        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:
        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:
        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:
        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:
        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:^A:        ^@^A:^@:        ^A^@:i:101
      Bs:<81>:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:
        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:
        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:
        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:
        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:
        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:
        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:
        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:        ^@^@:^@:

Samtools is able to parse the fgbio ubams directly without any issues, so it seems like Bowtie2 is doing something to the tags. And if I convert the ubam to a commented interleaved fastq using samtools fastq -T and pipe it to bowtie2 with --sam-append-comment, then the output looks as expected, like so:

NA12878:1000009 83      chr2    218661329       42      129M    =       218661308       -150    GGCTGTGAGAGTAGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGGGTAGAACGAAGTCGAGAGATGCAGATGATA       EDDCECEEEEDEEEEDEEEEEEEDEDDEDDEEEEEEEDEEEDDD0DDDBCEED>DCEDDEEEEEEEDEEDDDDDEEDEED>DEEEEEDEEEEEEEDCDEEEEDDDCDDEEEEEEEEDDDEEEDEDEDDD       AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:129        YS:i:0  YT:Z:CP cD:i:1  cE:f:0  RG:Z:A  MI:Z:1000009    cM:i:1  RX:Z:CCACAGACAA cd:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1        ce:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Given the amount of code sharing between Bowtie2 and Hisat2, I wonder if it could be related to this issue: https://github.com/DaehwanKimLab/hisat2/issues/316

ch4rr0 commented 3 years ago

Hello @eboyden,

Bowtie2 not producing any output when trying to align paired-end reads in BAM format without the --align-paired-reads option is intentional. I am contemplating your suggestion of automatically detecting and aligning paired/single-end reads, but that change is currently lower in priority. I am looking into the issue with the --preserve-tags option.

Thank you for helping test and for the detailed bug reports.

ch4rr0 commented 3 years ago

I expanded on the code for handling aux data. Here's a sample run:

./bowtie2-align-s -x hg19 -b output.bam --preserve-tags --sam-nohead

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    1 (100.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
100.00% overall alignment rate
NA12878:1000009 16  chr2    219526052   42  129M    *   0   0   GGCTGTGAGAGTAGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGGGTAGAACGAAGTCGAGAGATGCAGATGATA   EDDCECEEEEDEEEEDEEEEEEEDEDDEDDEEEEEEEDEEEDDD0DDDBCEED>DCEDDEEEEEEEDEEDDDDDEEDEED>DEEEEEDEEEEEEEDCDEEEEDDDCDDEEEEEEEEDDDEEEDEDEDDD   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:129    YT:Z:UU cD:i:1  cE:f:0.000000   RG:Z:A  MI:Z:1000009    cM:i:1  RX:Z:CCACAGACAA cd:B:s:1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1    ce:B:s:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

I still need to add support for processing :H: type tags to be fully compliant with the spec. We currently convert [cCsSiI] to i so that we are compliant with the types specified in the 'Sequence Alignment/Map Optional Fields Specification'.
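
As a rough illustration of the type conversion described above, the following Python sketch decodes scalar BAM aux fields into SAM text, collapsing the integer type codes [cCsSiI] to i. It is not the Bowtie2 implementation. The function name aux_to_sam is hypothetical, B arrays are left out, and the sample bytes are hand-built from tags that appear in this thread.

```python
import struct

# BAM integer type codes and their little-endian struct formats; all print as SAM type "i".
INT_FORMATS = {"c": "<b", "C": "<B", "s": "<h", "S": "<H", "i": "<i", "I": "<I"}


def aux_to_sam(buf: bytes, pos: int = 0):
    """Decode one scalar BAM aux field starting at pos; return (SAM text, next offset)."""
    tag = buf[pos:pos + 2].decode("ascii")
    code = chr(buf[pos + 2])
    pos += 3
    if code in INT_FORMATS:
        fmt = INT_FORMATS[code]
        (value,) = struct.unpack_from(fmt, buf, pos)
        return f"{tag}:i:{value}", pos + struct.calcsize(fmt)
    if code == "f":
        (value,) = struct.unpack_from("<f", buf, pos)
        return f"{tag}:f:{value:g}", pos + 4
    if code == "A":
        return f"{tag}:A:{chr(buf[pos])}", pos + 1
    if code in ("Z", "H"):
        # Z and H values are NUL-terminated in BAM.
        end = buf.index(b"\x00", pos)
        return f"{tag}:{code}:{buf[pos:end].decode('ascii')}", end + 1
    raise ValueError(f"unsupported aux type {code!r}")


# Hand-built aux block: cD as uint8, cE as float, RG and MI as strings.
aux = b"cDC\x01" + b"cEf" + struct.pack("<f", 0.0) + b"RGZA\x00" + b"MIZ1000009\x00"
fields, offset = [], 0
while offset < len(aux):
    text, offset = aux_to_sam(aux, offset)
    fields.append(text)
print("\t".join(fields))
```

Walking the fields by their declared type and size, rather than copying bytes through, is what keeps the text output aligned with the binary layout.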

The code is available in the bug_fixes branch.

eboyden commented 3 years ago

Thanks @ch4rr0 - I tried it on my ubam and the output from bowtie2 looked consistent with yours; but when I piped it to samtools fixmate I got the following error:

[E::aux_parse] B aux field type not followed by ','
samtools fixmate: Couldn't read from input file
[E::aux_parse] B aux field type not followed by ','

Samtools fixmate can read the ubam directly. Looking at the bowtie2 output I see this:

NA12878:1000009 99      chr2    218661341       42      87M     =       218661341       87      AGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGG EEEDEDDEDEEEEEDDEEEEEEEEDDDDEEEEEEEDDDDDDCDEDDDDEEEEDCDCDDDDEEEEEEDCDDEABDCAAD;ACDDDDEE AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:87 YS:i:0  YT:Z:CP cD:i:1  cE:f:0.000000   RG:Z:A  MI:Z:1000009    cM:i:1  RX:Z:ATAACAGCGC cd:B:s:1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1    ce:B:s:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Whereas the ubam looks like this:

NA12878:1000009 77  *   0   0   *   *   0   0   AGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGG EEEDEDDEDEEEEEDDEEEEEEEEDDDDEEEEEEEDDDDDDCDEDDDDEEEEDCDCDDDDEEEEEEDCDDEABDCAAD;ACDDDDEE cD:i:1  cE:f:0  RG:Z:A  MI:Z:1000009    cM:i:1  RX:Z:ATAACAGCGC cd:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1    ce:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

I think the problem is that in the cd and ce tags, there should be a comma after "B:s" rather than a colon.
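
For reference, a minimal Python sketch of how a BAM B-array payload maps to SAM text, with the subtype followed by a comma and then comma-separated values. This is an illustration based on the SAM format, not the Bowtie2 code. The function name b_array_to_sam is hypothetical, and the payload is hand-built.

```python
import struct

# Array element type code -> struct format character (values are little-endian in BAM).
ELEM_FORMATS = {"c": "b", "C": "B", "s": "h", "S": "H", "i": "i", "I": "I", "f": "f"}


def b_array_to_sam(tag: str, payload: bytes) -> str:
    """Format a BAM B-array payload (subtype byte, uint32 count, values) as SAM text."""
    subtype = chr(payload[0])
    (count,) = struct.unpack_from("<I", payload, 1)
    values = struct.unpack_from(f"<{count}{ELEM_FORMATS[subtype]}", payload, 5)
    parts = [f"{v:g}" if subtype == "f" else str(v) for v in values]
    # The subtype is joined to the values with commas, not a colon.
    return f"{tag}:B:" + ",".join([subtype] + parts)


# Three int16 values, as in a shortened version of the cd tag above.
payload = b"s" + struct.pack("<I", 3) + struct.pack("<3h", 1, 1, 1)
sam_text = b_array_to_sam("cd", payload)
print(sam_text)
```

Unlike scalar integers, the array keeps its element subtype in the text form, so an int16 array stays B:s in SAM.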

Unrelated, I'd suggest adding support for -b - to be consistent with other file input and output options. /dev/stdin works fine, but if one didn't know better one might assume that it's not possible to pipe input with -b, as I did at first.

ch4rr0 commented 3 years ago

Hello @eboyden,

I pushed a couple of changes to bug_fixes that should resolve both issues. Please let me know if they work for you.

Thanks

eboyden commented 3 years ago

Everything seems to work properly. Thanks again, really appreciate the quick work.