Closed StevenWingett closed 4 years ago
I think this is an error relating to bowtie2 overflowing the stack when processing with many threads
Do you have any evidence of this happening?
Yes, it is happening on the file I am trying to align; getting a debug binary now.
It also seems to be happening on high-depth runs (~10M reads).
A sample command line and backtrace, if you're able to produce one, would be appreciated.
Thank you.
command line is : bowtie2 -x /public/home/nheyer/nheyer/references/notch/extended_alignment_consensus_plus_ref --end-to-end --no-mixed --align-paired-reads --preserve-tags -b 19240_A154T_S1_R1_001_unmapped.sorted.bam > temp_file.sam
(faster to error with --threads X) running debug now
./bowtie2-2.3.5.1-linux-x86_64/bowtie2 --debug -x /public/home/nheyer/nheyer/references/notch/extended_alignment_consensus_plus_ref --end-to-end --no-mixed --align-paired-reads --preserve-tags -b 19240_A154T_S1_R1_001_unmapped.sorted.bam > temp_file.sam
Warning: Running in debug mode. Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
(ERR): bowtie2-align died with signal 11 (SEGV)
There doesn't appear to be any backtrace.
That's fine. I will try to recreate the issue using a command line similar to yours. We recently released a beta version of bowtie2. Can you try rerunning your command using that version? Let me know if the issue still persists.
FYI -- there was an issue with multi-threaded alignment of BAM reads in this new build. I pushed a change to resolve this, but have not created a new build with the change. Single-threaded alignment should still work.
Ah, so that likely explains why the threaded run still just throws an error rather than a seg fault this time. Single-threaded I still get a seg fault, but at least now there is a backtrace.
@ch4rr0 sorry it took so long to get back to you; it literally ran for >5 hrs before throwing an error. Also, I tried to compile from source, but I couldn't get the debug binaries to compile.
Command run:
./bowtie2-2.4.0-beta-linux-x86_64/bowtie2 --debug -x /public/home/nheyer/nheyer/references/notch/extended_alignment_consensus_plus_ref --end-to-end --no-mixed --align-paired-reads --preserve-tags -b 19240_A154T_S1_R1_001_unmapped.sorted.bam > /dev/null
Warning: Running in debug mode. Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
Error in `/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug': free(): corrupted unsorted chunks: 0x00005601f0be2ad0
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7f6d271a4499]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x444b8)[0x5601efd294b8]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x454cc)[0x5601efd2a4cc]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x4b945)[0x5601efd30945]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x4f0b6)[0x5601efd340b6]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x15933)[0x5601efcfa933]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f6d27145445]
/public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug(+0x16755)[0x5601efcfb755]
======= Memory map: ========
5601efce5000-5601eff4b000 r-xp 00000000 00:2e 85899449901 /public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug
5601f014b000-5601f0151000 r--p 00266000 00:2e 85899449901 /public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug
5601f0151000-5601f0155000 rw-p 0026c000 00:2e 85899449901 /public/groups/hausslerlab/people/nheyer/sandbox/MIPs/1-17-20_sanger/test_all/bowtie2-2.4.0-beta-linux-x86_64/bowtie2-align-s-debug
5601f0155000-5601f015a000 rw-p 00000000 00:00 0
5601f0b9a000-5601f0c35000 rw-p 00000000 00:00 0 [heap]
7f6d20000000-7f6d20974000 rw-p 00000000 00:00 0
7f6d20974000-7f6d24000000 ---p 00000000 00:00 0
7f6d26120000-7f6d26121000 ---p 00000000 00:00 0
7f6d26121000-7f6d27123000 rw-p 00000000 00:00 0
7f6d27123000-7f6d272e6000 r-xp 00000000 08:01 61646 /usr/lib64/libc-2.17.so
7f6d272e6000-7f6d274e5000 ---p 001c3000 08:01 61646 /usr/lib64/libc-2.17.so
7f6d274e5000-7f6d274e9000 r--p 001c2000 08:01 61646 /usr/lib64/libc-2.17.so
7f6d274e9000-7f6d274eb000 rw-p 001c6000 08:01 61646 /usr/lib64/libc-2.17.so
7f6d274eb000-7f6d274f0000 rw-p 00000000 00:00 0
7f6d274f0000-7f6d27505000 r-xp 00000000 08:01 3498 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f6d27505000-7f6d27704000 ---p 00015000 08:01 3498 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f6d27704000-7f6d27705000 r--p 00014000 08:01 3498 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f6d27705000-7f6d27706000 rw-p 00015000 08:01 3498 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f6d27706000-7f6d27807000 r-xp 00000000 08:01 61655 /usr/lib64/libm-2.17.so
7f6d27807000-7f6d27a06000 ---p 00101000 08:01 61655 /usr/lib64/libm-2.17.so
7f6d27a06000-7f6d27a07000 r--p 00100000 08:01 61655 /usr/lib64/libm-2.17.so
7f6d27a07000-7f6d27a08000 rw-p 00101000 08:01 61655 /usr/lib64/libm-2.17.so
7f6d27a08000-7f6d27a0a000 r-xp 00000000 08:01 61653 /usr/lib64/libdl-2.17.so
7f6d27a0a000-7f6d27c0a000 ---p 00002000 08:01 61653 /usr/lib64/libdl-2.17.so
7f6d27c0a000-7f6d27c0b000 r--p 00002000 08:01 61653 /usr/lib64/libdl-2.17.so
7f6d27c0b000-7f6d27c0c000 rw-p 00003000 08:01 61653 /usr/lib64/libdl-2.17.so
7f6d27c0c000-7f6d27c23000 r-xp 00000000 08:01 61680 /usr/lib64/libpthread-2.17.so
7f6d27c23000-7f6d27e22000 ---p 00017000 08:01 61680 /usr/lib64/libpthread-2.17.so
7f6d27e22000-7f6d27e23000 r--p 00016000 08:01 61680 /usr/lib64/libpthread-2.17.so
7f6d27e23000-7f6d27e24000 rw-p 00017000 08:01 61680 /usr/lib64/libpthread-2.17.so
7f6d27e24000-7f6d27e28000 rw-p 00000000 00:00 0
7f6d27e28000-7f6d27e4a000 r-xp 00000000 08:01 61639 /usr/lib64/ld-2.17.so
7f6d28023000-7f6d28027000 rw-p 00000000 00:00 0
7f6d2803c000-7f6d28049000 rw-p 00000000 00:00 0
7f6d28049000-7f6d2804a000 r--p 00021000 08:01 61639 /usr/lib64/ld-2.17.so
7f6d2804a000-7f6d2804b000 rw-p 00022000 08:01 61639 /usr/lib64/ld-2.17.so
7f6d2804b000-7f6d2804c000 rw-p 00000000 00:00 0
7ffc7c53a000-7ffc7c55b000 rw-p 00000000 00:00 0 [stack]
7ffc7c599000-7ffc7c59b000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
(ERR): bowtie2-align died with signal 6 (ABRT)
I published a new beta with a fix for this issue. Can you give it a go and let me know if it has been resolved?
This seems to have fixed it !! Thank you so much!
These changes have been released as part of v2.4.0
I get the same error when running HiC-Pro (https://github.com/nservant/HiC-Pro), which depends upon bowtie2. Strangely the error appears to occur after successful mapping of the reads. The following info appears in the log file:
536273614 reads; of these:
536273614 (100.00%) were unpaired; of these:
54708870 (10.20%) aligned 0 times
392765431 (73.24%) aligned exactly 1 time
88799313 (16.56%) aligned >1 times
89.80% overall alignment rate
(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)
So the reads seem to have mapped, yet this error occurs and halts the HiC-Pro pipeline. These are the bowtie2 options I specify in HiC-Pro:
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder
As mentioned here, I tried installing the "new-beta" version of bowtie2 (bowtie2-2.4.0-beta-linux-x86_64.zip), but unfortunately this does not prevent the error.
I still have this error in 2.4.2 when using more than 2 threads on macOS. I am wondering how to fix it...
Jianshu
@jianshu93 can you post the logs?
Error message: (ERR): bowtie2-align died with signal 11 (SEGV)
Hello,
Can you try running the bowtie2 again in debug mode and post the error? That will help tremendously.
Thank you
that is all i have
In that case:
What is your command line? What’s the size of the index you’re using? How much memory is available on the host machine?
Hi I'm using ver 2.4.2 and I still get this error. Here's --version of my bowtie2
bowtie2 --version
/share/apps/bowtie2/2.4.2/bowtie2-align-s version 2.4.2
64-bit
Built on fcc614744c04
Tue Oct 6 03:06:29 UTC 2020
Compiler: gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)
Options: -O3 -msse2 -funroll-loops -g3 -g -O2 -fvisibility=hidden -I/hbb_exe_gc_hardened/include -ffunction-sections -fdata-sections -fstack-protector -D_FORTIFY_SOURCE=2 -fPIE -DPOPCNT_CAPABILITY -DWITH_TBB -std=c++11 -DNO_SPINLOCK -DWITH_QUEUELOCK=1
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
I'm not doing anything fancy. I'm just running the most basic command
bowtie2 -x refindex -1 SRR2029441_1.fastq.gz -2 SRR2029441_2.fastq.gz -S out.sam
And here's the message I get with --debug
Warning: Running in debug mode. Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
bowtie2-align-l-debug: word_io.h:125: T readU(FILE*, bool) [with T = unsigned int; FILE = _IO_FILE]: Assertion `false' failed.
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)
I'm using a FASTQ that's not trimmed or QC'd or anything. It's hot off converting to a FASTQ from SRA.
Thank you for the help!
Hello,
This error is still there: (ERR): bowtie2-align died with signal 11 (SEGV)
when using more than 1 thread. I am so surprised that it is still not fixed after 3 major releases... It is slow with 1 thread for a medium-sized dataset. bwa-mem2 is way faster and gives very close results in terms of average depth after filtering. Apparently it is related to threads.
Thanks,
Jianshu
I have tested on 3 systems (macOS, Ubuntu and CentOS). I got the same error.
Jianshu
@cjy8709 -- Thank you for the detailed issue. There may have been an issue with the index that you built that caused the short read. In any event I have gone ahead and committed a change to bug_fixes to print an error message and exit if there are any short reads in the affected functions.
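For readers following along: "short read" here presumably means a file read that returned fewer bytes than requested (for example from a truncated index file), not a short sequencing read. A minimal Python sketch of that kind of check, reporting an error instead of tripping an assertion, is below. The name `read_u32` is hypothetical and this is not Bowtie2's actual C++ code.

```python
import io
import struct


def read_u32(stream, big_endian=False):
    """Read one unsigned 32-bit integer, failing loudly on a short read."""
    data = stream.read(4)
    if len(data) != 4:
        # A truncated or corrupt file yields fewer bytes than requested.
        raise EOFError(f"short read: wanted 4 bytes, got {len(data)}")
    fmt = ">I" if big_endian else "<I"
    return struct.unpack(fmt, data)[0]


# A complete value decodes normally.
good = io.BytesIO(struct.pack("<I", 42))
value = read_u32(good)
print(value)

# A truncated stream produces a readable error instead of a crash.
truncated = io.BytesIO(b"\x01\x02")
try:
    read_u32(truncated)
    message = None
except EOFError as err:
    message = str(err)
print(message)
```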
@jianshu93 -- Could you provide something extra to help debug this issue e.g.
I am getting the same error, using Bowtie2 2.4.4. Neither using different indices nor running multi/single-threaded produces a different result. But it seems to be specific to aligning unmapped bams; I've never encountered this problem when aligning fastqs.
bowtie2-2.4.4-linux-x86_64/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x hg38.analysisSet -b unmapped.bam -S mapped.sam
(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)
If I convert the ubam to an interleaved fastq first and then pipe the output to bowtie2, it works just fine.
samtools-1.13/samtools fastq -@ 8 -n -T [taglist] unmapped.bam | bowtie2-2.4.4-linux-x86_64/bowtie2 -p 8 --sam-append-comment --very-fast --no-discordant --no-mixed -x hg38.analysisSet --interleaved - -S mapped.sam
Hello @eboyden,
I tried a few runs with a command line similar to yours but have yet to experience a segfault. Would you be willing to share your input file or run bowtie2 in debug mode so that we can get more information on exactly where the crash occurs?
Hi @ch4rr0,
I can't share my original file, but I tried again with a 1000g NA12878 exome (SRR1518133 at https://www.internationalgenome.org/data-portal/sample/NA12878) and produced the same result, using the hg38 noalt+decoy pre-built reference from the Bowtie2 website. Aligning the fastqs directly works fine, but fails after converting them to a ubam with Picard FastqToSam. Using --debug produces no errors other than the signal 11.
time /tools/bowtie2/bowtie2-2.4.4-linux-x86_64/bowtie2 --debug --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b SRR1518133.bam -S SRR1518133.bam.bt2_vf_ndnm.sam
Warning: Running in debug mode. Please use debug mode only for diagnosing errors, and not for typical use of Bowtie 2.
(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)
real 0m16.870s
user 0m0.116s
sys 0m11.695s
time /tools/bowtie2/bowtie2-2.4.4-linux-x86_64/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -1 SRR1518133_1.fastq.gz -2 SRR1518133_2.fastq.gz -S SRR1518133.fastq.bt2_vf_ndnm.sam
2539611 reads; of these:
2539611 (100.00%) were paired; of these:
100233 (3.95%) aligned concordantly 0 times
1999397 (78.73%) aligned concordantly exactly 1 time
439981 (17.32%) aligned concordantly >1 times
96.05% overall alignment rate
real 1m59.187s
user 14m20.234s
sys 1m8.956s
Thank you for the additional information. I have been able to recreate the issue using a BAM file produced by Picard's FastqToSam. I will get started on a fix ASAP.
I have committed a fix to the bug_fixes branch that I think resolves the issue. We made the assumption that alignment data would not span BGZF blocks in the old code. The Picard-produced BAM files proved that assumption wrong, so I updated the code to make it more robust.
FYI -- there are still a few edge cases that need cleaning up before these changes are considered final.
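To illustrate the assumption that broke: BAM alignment records are length-prefixed and are packed into independently compressed BGZF blocks, and nothing stops a record from starting in one block and ending in the next. The Python sketch below is only an analogy, not the bug_fixes code: it uses plain gzip members as a stand-in for real BGZF blocks, and `make_blocks` and `iter_records` are hypothetical names. The parser carries leftover bytes from one block into the next instead of assuming every block ends on a record boundary.

```python
import gzip
import struct


def make_blocks(payload, block_size):
    """Compress payload as independent gzip members of block_size bytes each."""
    return [gzip.compress(payload[i:i + block_size])
            for i in range(0, len(payload), block_size)]


def iter_records(blocks):
    """Yield length-prefixed records, carrying partial records across blocks."""
    pending = b""
    for block in blocks:
        pending += gzip.decompress(block)
        while len(pending) >= 4:
            (size,) = struct.unpack_from("<I", pending)
            if len(pending) < 4 + size:
                break  # the record continues in the next block
            yield pending[4:4 + size]
            pending = pending[4 + size:]
    if pending:
        raise EOFError("stream ended inside a record")


records = [b"read-one", b"read-two-is-longer", b"r3"]
payload = b"".join(struct.pack("<I", len(r)) + r for r in records)

# A deliberately tiny block size forces records to straddle block boundaries.
blocks = make_blocks(payload, block_size=7)
decoded = list(iter_records(blocks))
print(decoded == records)
```

A parser that decoded each block in isolation would read past the end of its buffer on the first straddling record, which is one plausible way to end up with a segfault.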
@eboyden -- I think the changes are now complete and available on the bug_fixes branch. Are you willing to help test?
Absolutely, will let you know how it goes - thanks for the quick work
It seems to run now. Oddly, it finds 0 reads if the reads are flagged as paired but --align-paired-reads isn't specified (or the opposite); is this intentional? Unless I'm missing something, wouldn't it be preferable to just align paired-flagged reads in single-end mode, and reset the bitflag? Or better yet, assume name-sorted input and attempt to align in paired mode if adjacent records have the same read name (save for a trailing /1 /2 suffix) and the appropriate bitflags, else treat reads as single-end. A replacement option --align-single-reads would reset all bitflags and perform all alignments in single-end mode. This would allow using bams that are a mix of single- and paired-end reads. (Understood if this is beyond the scope of this bugfix.)
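The pairing heuristic suggested above could be sketched roughly as follows in Python. This is an illustration of the proposal only, not anything Bowtie2 implements; `group_reads` and `base_name` are hypothetical names, and records are reduced to (name, flag) tuples for brevity. The flag bits are the standard SAM ones: 0x1 paired, 0x40 first in pair, 0x80 last in pair.

```python
import re

PAIRED, READ1, READ2 = 0x1, 0x40, 0x80
_SUFFIX = re.compile(r"/[12]$")


def base_name(name):
    """Strip a trailing /1 or /2 mate suffix from a read name."""
    return _SUFFIX.sub("", name)


def group_reads(records):
    """Group name-sorted (name, flag) records into pairs or singles."""
    groups = []
    i = 0
    while i < len(records):
        name, flag = records[i]
        if i + 1 < len(records):
            mate_name, mate_flag = records[i + 1]
            is_pair = bool(
                base_name(name) == base_name(mate_name)
                and flag & PAIRED and mate_flag & PAIRED
                and flag & READ1 and mate_flag & READ2
            )
            if is_pair:
                groups.append(("paired", records[i], records[i + 1]))
                i += 2
                continue
        # Anything that does not form a clean adjacent pair is single-end.
        groups.append(("single", records[i]))
        i += 1
    return groups


# Flags 77 and 141 are the usual unmapped read1/read2 values in a paired ubam.
reads = [("r1", 77), ("r1", 141), ("r2", 4), ("r3/1", 77), ("r3/2", 141)]
kinds = [group[0] for group in group_reads(reads)]
print(kinds)
```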
# aligning paired ubam after conversion to interleaved fastq and piped
/tools/samtools/samtools-1.13/samtools fastq -@ 8 -n -o /dev/stdout SRR1518133.bam | /tools/bowtie2/bowtie2-bug_fixes/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as --interleaved - -S /dev/null
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 5079222 reads
2539611 reads; of these:
2539611 (100.00%) were paired; of these:
100233 (3.95%) aligned concordantly 0 times
1999397 (78.73%) aligned concordantly exactly 1 time
439981 (17.32%) aligned concordantly >1 times
96.05% overall alignment rate
# aligning paired ubam in non-paired mode
/tools/bowtie2/bowtie2-bug_fixes/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b SRR1518133.bam -S /dev/null
0 reads
0.00% overall alignment rate
# aligning paired ubam in paired mode
/tools/bowtie2/bowtie2-bug_fixes/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b SRR1518133.bam --align-paired-reads -S /dev/null
2539611 reads; of these:
2539611 (100.00%) were paired; of these:
100233 (3.95%) aligned concordantly 0 times
1999397 (78.73%) aligned concordantly exactly 1 time
439981 (17.32%) aligned concordantly >1 times
96.05% overall alignment rate
But I found another issue: it doesn't appear to be able to accept piped input using the "-" shorthand (unlike other input options). The original error I encountered was actually with piped bam input, and that error mode still seems to be present. But /dev/stdin works, and even works with 2.4.4. So my original issue might have been a completely separate one from the one you fixed. Presumably piped input isn't subject to the same BGZF block problem you identified?
/tools/samtools/samtools-1.13/samtools view SRR1518133.bam -b | /tools/bowtie2/bowtie2-bug_fixes/bowtie2 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b - --align-paired-reads -S /dev/null
(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)
/tools/samtools/samtools-1.13/samtools view -@ 8 SRR1518133.bam -b | /tools/bowtie2/bowtie2-2.4.4-linux-x86_64/bowtie2 -p 8 --very-fast --no-discordant --no-mixed -x /tools/bowtie2/index/GRCh38_noalt_decoy_as/GRCh38_noalt_decoy_as -b /dev/stdin --align-paired-reads -S /dev/null
2539611 reads; of these:
2539611 (100.00%) were paired; of these:
100233 (3.95%) aligned concordantly 0 times
1999397 (78.73%) aligned concordantly exactly 1 time
439981 (17.32%) aligned concordantly >1 times
96.05% overall alignment rate
Interesting: when I write a ubam with fgbio CallMolecularConsensusReads or picard FastqToSam and then use samtools view -b/u to pipe it to bowtie2-2.4.4 -b /dev/stdin, it runs fine; but when I pipe the output from either CallMolecularConsensusReads or FastqToSam directly to bowtie2-2.4.4, it crashes. However, it doesn't crash when I pipe from either software directly to bowtie2-bug_fixes. So perhaps it depends on how the upstream software handles writing bams to stdout? In any case, I have not been able to make the bug_fixes branch crash however I've run it.
I think I found a new bug - not sure if I should open a new issue or continue this thread since it was only exposed by the previous fix.
When I align a ubam generated by fgbio CallMolecularConsensusReads using bowtie2-bug_fixes -b /dev/stdin --align-paired-reads and then pipe that to samtools, it works fine. But if I add the --preserve-tags option, it produces the following error:
[E::aux_parse] incomplete aux field
[W::sam_read1_sam] Parse error at line 3084
samtools sort: truncated file. Aborting
Of note, fgbio-generated ubams include a couple of gnarly tags; a typical line looks like this:
NA12878:1000009 77 * 0 0 * * 0 0 TATCATCTGCATCTCTCGACTTCGTTCTACCCGAATCCATTTCCCCCGATACCTGAATAAGAACGATCAAAACTGAGTGAGTGAATGGGTCAAACCCAGCTCCCATCATTCTTTTCTACTCTCACAGCC DDDEDEDEEEDDDEEEEEEEEDDCDDDEEEEDCDEEEEEEEDEEEEED>DEEDEEDDDDDEEDEEEEEEEDDECD>DEECBDDD0DDDEEEDEEEEEEEDDEDDEDEEEEEEEDEEEEDEEEECECDDE cD:i:1 cE:f:0 RG:Z:A MI:Z:1000009 cM:i:1 RX:Z:CCACAGACAA cd:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ce:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
When I align without --preserve-tags, it looks fine, like so:
NA12878:1000009 83 chr2 218661329 42 129M = 218661308 -150 GGCTGTGAGAGTAGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGGGTAGAACGAAGTCGAGAGATGCAGATGATA EDDCECEEEEDEEEEDEEEEEEEDEDDEDDEEEEEEEDEEEDDD0DDDBCEED>DCEDDEEEEEEEDEEDDDDDEEDEED>DEEEEEDEEEEEEEDCDEEEEDDDCDDEEEEEEEEDDDEEEDEDEDDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:129 YS:i:0 YT:Z:CP
But with --preserve-tags, it looks like this:
NA12878:1000009 83 chr2 218661329 42 129M = 218661308 -150 GGCTGTGAGAGTAGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGGGTAGAACGAAGTCGAGAGATGCAGATGATA EDDCECEEEEDEEEEDEEEEEEEDEDDEDDEEEEEEEDEEEDDD0DDDBCEED>DCEDDEEEEEEEDEEDDDDDEEDEED>DEEEEEDEEEEEEEDCDEEEEDDDCDDEEEEEEEEDDDEEEDEDEDDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:129 YS:i:0 YT:Z:CP cD:i:1 cE:f: ^@^@:^@: ^@R:G: ZA:^@: MI:Z:1000009 cM:i:1 RX:Z:CCACAGACAA cd:B: s<81>:^@: ^@^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@:
^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A:
^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@:
^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A:
^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@:
^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A:
^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@:
^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:^A: ^@^A:^@: ^A^@:i:101
Bs:<81>: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@:
^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@:
^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@:
^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@:
^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@:
^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@:
^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@:
^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@: ^@^@:^@:
Samtools is able to parse the fgbio ubams directly without any issues, so it seems like Bowtie2 is doing something to the tags. And if I convert the ubam to a commented interleaved fastq using samtools fastq -T and pipe it to bowtie2 with --sam-append-comment, then the output looks as expected, like so:
NA12878:1000009 83 chr2 218661329 42 129M = 218661308 -150 GGCTGTGAGAGTAGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGGGTAGAACGAAGTCGAGAGATGCAGATGATA EDDCECEEEEDEEEEDEEEEEEEDEDDEDDEEEEEEEDEEEDDD0DDDBCEED>DCEDDEEEEEEEDEEDDDDDEEDEED>DEEEEEDEEEEEEEDCDEEEEDDDCDDEEEEEEEEDDDEEEDEDEDDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:129 YS:i:0 YT:Z:CP cD:i:1 cE:f:0 RG:Z:A MI:Z:1000009 cM:i:1 RX:Z:CCACAGACAA cd:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ce:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Given the amount of code sharing between Bowtie2 and Hisat2, I wonder if it could be related to this issue: https://github.com/DaehwanKimLab/hisat2/issues/316
Hello @eboyden,
Bowtie2 not producing any output when trying to align paired-end reads in BAM format without the --align-paired-reads option is intentional. I am contemplating your suggestion of automatically detecting and aligning paired/single-end reads, but that change is currently lower in priority. I am looking into the issue with the --preserve-tags option.
Thank you for helping test and for the detailed bug reports.
I expanded on the code for handling aux data. Here's a sample run:
./bowtie2-align-s -x hg19 -b output.bam --preserve-tags --sam-nohead
1 reads; of these:
1 (100.00%) were unpaired; of these:
0 (0.00%) aligned 0 times
1 (100.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
100.00% overall alignment rate
NA12878:1000009 16 chr2 219526052 42 129M * 0 0 GGCTGTGAGAGTAGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGGGTAGAACGAAGTCGAGAGATGCAGATGATA EDDCECEEEEDEEEEDEEEEEEEDEDDEDDEEEEEEEDEEEDDD0DDDBCEED>DCEDDEEEEEEEDEEDDDDDEEDEED>DEEEEEDEEEEEEEDCDEEEEDDDCDDEEEEEEEEDDDEEEDEDEDDD AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:129 YT:Z:UU cD:i:1 cE:f:0.000000 RG:Z:A MI:Z:1000009 cM:i:1 RX:Z:CCACAGACAA cd:B:s:1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ce:B:s:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
I still need to add support for processing :H: type tags to be fully compliant with the spec. We currently convert [cCsSiI] to i so that we are compliant with the types specified in the 'Sequence Alignment/Map Optional Fields Specification'.
The code is available in the bug_fixes branch.
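For context, a rough Python sketch of the BAM-to-SAM aux conversion being discussed is given below. It is not the Bowtie2 implementation; `aux_to_sam` is a hypothetical name and the layout follows the SAM/BAM specification as I understand it: each field is a 2-byte tag, a 1-byte type, then a little-endian value; all integer widths print as `i`; and `B` arrays print as the subtype letter followed by comma-separated values.

```python
import struct

_INT_FORMATS = {"c": "<b", "C": "<B", "s": "<h", "S": "<H", "i": "<i", "I": "<I"}


def aux_to_sam(buf):
    """Convert a BAM auxiliary-data byte string into tab-separated SAM fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag = buf[pos:pos + 2].decode("ascii")
        typ = chr(buf[pos + 2])
        pos += 3
        if typ in _INT_FORMATS:
            fmt = _INT_FORMATS[typ]
            (value,) = struct.unpack_from(fmt, buf, pos)
            pos += struct.calcsize(fmt)
            fields.append(f"{tag}:i:{value}")  # every integer width becomes 'i'
        elif typ == "f":
            (value,) = struct.unpack_from("<f", buf, pos)
            pos += 4
            fields.append(f"{tag}:f:{value:g}")
        elif typ == "A":
            fields.append(f"{tag}:A:{chr(buf[pos])}")
            pos += 1
        elif typ in ("Z", "H"):
            end = buf.index(b"\x00", pos)  # both are NUL-terminated in BAM
            fields.append(f"{tag}:{typ}:{buf[pos:end].decode('ascii')}")
            pos = end + 1
        elif typ == "B":
            subtype = chr(buf[pos])
            (count,) = struct.unpack_from("<i", buf, pos + 1)
            pos += 5
            fmt = "<f" if subtype == "f" else _INT_FORMATS[subtype]
            size = struct.calcsize(fmt)
            items = [subtype]
            for k in range(count):
                (v,) = struct.unpack_from(fmt, buf, pos + k * size)
                items.append(f"{v:g}" if subtype == "f" else str(v))
            pos += count * size
            fields.append(f"{tag}:B:" + ",".join(items))
        else:
            raise ValueError(f"unknown aux type {typ!r}")
    return "\t".join(fields)


# A hand-built aux block resembling the fgbio tags shown earlier in the thread.
aux = (b"cD" + b"C" + bytes([1])
       + b"cE" + b"f" + struct.pack("<f", 0.0)
       + b"RG" + b"Z" + b"A\x00"
       + b"cd" + b"B" + b"s" + struct.pack("<i", 3) + struct.pack("<3h", 1, 1, 1))
sam_text = aux_to_sam(aux)
print(sam_text)
```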
Thanks @ch4rr0 - I tried it on my ubam and the output from bowtie2 looked consistent with yours; but when I piped it to samtools fixmate I got the following error:
[E::aux_parse] B aux field type not followed by ','
samtools fixmate: Couldn't read from input file
[E::aux_parse] B aux field type not followed by ','
Samtools fixmate can read the ubam directly. Looking at the bowtie2 output I see this:
NA12878:1000009 99 chr2 218661341 42 87M = 218661341 87 AGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGG EEEDEDDEDEEEEEDDEEEEEEEEDDDDEEEEEEEDDDDDDCDEDDDDEEEEDCDCDDDDEEEEEEDCDDEABDCAAD;ACDDDDEE AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:87 YS:i:0 YT:Z:CP cD:i:1 cE:f:0.000000 RG:Z:A MI:Z:1000009 cM:i:1 RX:Z:ATAACAGCGC cd:B:s:1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ce:B:s:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Whereas the ubam looks like this:
NA12878:1000009 77 * 0 0 * * 0 0 AGAAAAGAATGATGGGAGCTGGGTTTGACCCATTCACTCACTCAGTTTTGATCGTTCTTATTCAGGTATCGGGGGAAATGGATTCGG EEEDEDDEDEEEEEDDEEEEEEEEDDDDEEEEEEEDDDDDDCDEDDDDEEEEDCDCDDDDEEEEEEDCDDEABDCAAD;ACDDDDEE cD:i:1 cE:f:0 RG:Z:A MI:Z:1000009 cM:i:1 RX:Z:ATAACAGCGC cd:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ce:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
I think the problem is that in the cd and ce tags, there should be a comma after "B:s" rather than a colon.
Unrelated, I'd suggest adding support for -b - to be consistent with other file input and output options. /dev/stdin works fine, but if one didn't know better one might assume that it's not possible to pipe input with -b, as I did at first.
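The convention being requested is the common one where a lone "-" means standard input. A small Python sketch of the idea follows; `open_binary_input` is a hypothetical helper, and the `stdin` parameter exists only so the example can be exercised without a real pipe.

```python
import io
import os
import sys
import tempfile


def open_binary_input(path, stdin=None):
    """Open path for binary reading, treating "-" as standard input."""
    if path == "-":
        return stdin if stdin is not None else sys.stdin.buffer
    return open(path, "rb")


# Reading from a regular file: the first four bytes of a BAM stream are "BAM\1".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"BAM\x01rest-of-file")
with open_binary_input(tmp.name) as handle:
    magic_from_file = handle.read(4)
os.unlink(tmp.name)

# Reading from "-": a BytesIO object stands in for a piped stdin here.
fake_pipe = io.BytesIO(b"BAM\x01rest-of-stream")
magic_from_pipe = open_binary_input("-", stdin=fake_pipe).read(4)

print(magic_from_file == magic_from_pipe)
```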
Hello @eboyden,
I pushed a couple of changes to bug_fixes that should resolve both issues. Please let me know if they work for you.
Thanks
Everything seems to work properly. Thanks again, really appreciate the quick work.
Hi,
I have incorporated Bowtie2 into my Hi-C pipeline HiCUP (https://www.bioinformatics.babraham.ac.uk/projects/hicup/). I have received an email from a HiCUP user stating that Bowtie2 fails when using HiCUP. The error message is: “(ERR): bowtie2-align died with signal 11 (SEGV)”
When I try running their data on my system, Bowtie2 functions correctly. Do you know what typically is the reason for this error message? This may help me troubleshoot the problem.
Many thanks, Steven
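As general background on the message itself: signal 11 is SIGSEGV (the process touched memory it was not allowed to access) and signal 6 is SIGABRT (the process aborted itself, for example on a failed assertion or when glibc detected heap corruption). The message does not by itself say why that happened. For pipeline authors who want to detect it in a log, a minimal Python sketch is below; `find_fatal_signal` is a hypothetical helper, and the pattern is based only on the message text quoted in this thread.

```python
import re
import signal

_DIED = re.compile(r"\(ERR\): bowtie2-align died with signal (\d+) \((\w+)\)")


def find_fatal_signal(log_text):
    """Return (number, name) of the fatal signal reported in log_text, or None."""
    match = _DIED.search(log_text)
    if match is None:
        return None
    signum = int(match.group(1))
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Fall back to the abbreviation printed in the message itself.
        name = "SIG" + match.group(2)
    return signum, name


crashed_log = (
    "89.80% overall alignment rate\n"
    "(ERR): bowtie2-align died with signal 11 (SEGV) (core dumped)\n"
)
clean_log = "96.05% overall alignment rate\n"

crash = find_fatal_signal(crashed_log)
print(crash)
print(find_fatal_signal(clean_log))
```

Since the crash can be reported after the alignment summary, as in one of the logs above, a pipeline should look for this line or check the exit status rather than rely on the summary being present.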