Closed ferayd closed 2 years ago
I am not able to get bowtie-build to successfully build the index, but if I use a bowtie2 index then the alignment runs successfully. I am investigating why bowtie-build is not able to build the index.
./bowtie-align-l -x ../bowtie2/triticum -r reads.raw
0 + CM022213.1 49156482 AAAAAAAAAAAAAAAAAAAA IIIIIIIIIIIIIIIIIIII 21146
# reads processed: 1
# reads with at least one alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments
I am also wondering about the status of the fix for this problem, as we need to work with multiple large genomes on the same scale as wheat (e.g. rye, barley, oat). Are there any updates on either the original problem (hanging during mapping against the indexed genome) or the second problem (failure to build the index)?
Thank you!
I am still actively looking into this one, but have yet to figure out the underlying cause of this bug. As a temporary workaround you can build those indexes with bowtie2 and use them for alignment in bowtie.
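For anyone who wants to try that workaround, here is a minimal sketch. It assumes `bowtie2-build` is on your PATH and that your bowtie binary can read bowtie2 indexes, as the run shown earlier in this thread suggests. The file names and the thread count are placeholders, and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience, not part of either tool. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
set -eu

# Placeholder names: adjust to your own files and index prefix.
REF=GCA_002220415.3_Triticum_4.0_genomic.fna
IDX=triticum
READS=reads.raw

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Build the index with bowtie2 (large-index format for a genome of this size).
run bowtie2-build --large-index --threads 12 "$REF" "$IDX"

# Align with bowtie against the bowtie2-built index.
run ./bowtie-align-l -x "$IDX" -r "$READS"
```

Run it once as-is to check the printed commands, then rerun with `DRY_RUN=0` to execute them.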
This workaround would only address the secondary issue that you encountered with indexing, but not the original problem of bowtie hanging during the mapping/aligning stage, right? I don't think the problem originally reported was with indexing.
I am not able to get bowtie-build to successfully build the index, but if I use a bowtie2 index then the alignment runs successfully. I am investigating why bowtie-build is not able to build the index.
./bowtie-align-l -x ../bowtie2/triticum -r reads.raw
0 + CM022213.1 49156482 AAAAAAAAAAAAAAAAAAAA IIIIIIIIIIIIIIIIIIII 21146
# reads processed: 1
# reads with at least one alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments
I am quite confident it is an index issue.
We have finally tracked down and pushed a fix for this bug to the bug_fixes branch. We thank all of you who have been impacted by this issue for your patience, and are in the process of putting together an official release which will include this change.
./bowtie-build-l GCA_002220415.3_Triticum_4.0_genomic.fna triticum --threads 12 --packed
...
Wrote 4422321688 bytes to primary EBWT file: triticum.1.ebwtl
Wrote 3849428340 bytes to secondary EBWT file: triticum.2.ebwtl
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 15397713314
bwtLen: 15397713315
sz: 3849428329
bwtSz: 3849428329
lineRate: 7
linesPerSide: 1
offRate: 5
offMask: 0xffffffffffffffe0
isaRate: -1
isaMask: 0xffffffff
ftabChars: 10
eftabLen: 20
eftabSz: 160
ftabLen: 1048577
ftabSz: 8388616
offsLen: 481178542
offsSz: 3849428336
isaLen: 0
isaSz: 0
lineSz: 128
sideSz: 128
sideBwtSz: 112
sideBwtLen: 448
numSidePairs: 17184948
numSides: 34369896
numLines: 34369896
ebwtTotLen: 4399346688
ebwtTotSz: 4399346688
reverse: 0
...
Wrote 4422321688 bytes to primary EBWT file: triticum.rev.1.ebwtl
Wrote 3849428340 bytes to secondary EBWT file: triticum.rev.2.ebwtl
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 15397713314
bwtLen: 15397713315
sz: 3849428329
bwtSz: 3849428329
lineRate: 7
linesPerSide: 1
offRate: 5
offMask: 0xffffffffffffffe0
isaRate: -1
isaMask: 0xffffffff
ftabChars: 10
eftabLen: 20
eftabSz: 160
ftabLen: 1048577
ftabSz: 8388616
offsLen: 481178542
offsSz: 3849428336
isaLen: 0
isaSz: 0
lineSz: 128
sideSz: 128
sideBwtSz: 112
sideBwtLen: 448
numSidePairs: 17184948
numSides: 34369896
numLines: 34369896
ebwtTotLen: 4399346688
ebwtTotSz: 4399346688
reverse: 0
ls triticum*.ebwtl
triticum.1.ebwtl triticum.3.ebwtl triticum.rev.1.ebwtl
triticum.2.ebwtl triticum.4.ebwtl triticum.rev.2.ebwtl
./bowtie-align-l -x triticum -c AAAAAAAAAAAAAAAAAAAA
0 + CM022213.1 49156482 AAAAAAAAAAAAAAAAAAAA IIIIIIIIIIIIIIIIIIII 21146
# reads processed: 1
# reads with at least one alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments
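As a sanity check on the header values printed above, the sketch below recomputes a few of them from `len`. The formulas are inferred from the printed numbers only (ceiling divisions by the offset sampling rate and by the side length), not taken from the bowtie source, so treat them as assumptions. It also shows why this genome needs the large (`.ebwtl`) index: `len` does not fit in a 32-bit offset.

```shell
#!/usr/bin/env bash
# Values copied from the "Headers:" block above.
len=15397713314
off_rate=5
ftab_chars=10
side_sz=128
side_bwt_len=448

# Relationships inferred from the printed numbers (assumptions, not bowtie code).
bwt_len=$(( len + 1 ))
offs_len=$(( (bwt_len + (1 << off_rate) - 1) >> off_rate ))       # ceil(bwtLen / 2^offRate)
num_sides=$(( (bwt_len + side_bwt_len - 1) / side_bwt_len ))     # ceil(bwtLen / sideBwtLen)
ebwt_tot_len=$(( num_sides * side_sz ))
ftab_len=$(( (1 << (2 * ftab_chars)) + 1 ))                      # 4^ftabChars + 1

# 1 if the reference length exceeds what a 32-bit offset can address.
needs_large=$(( len > (1 << 32) ? 1 : 0 ))

echo "bwtLen=$bwt_len offsLen=$offs_len numSides=$num_sides"
echo "ebwtTotLen=$ebwt_tot_len ftabLen=$ftab_len needsLargeIndex=$needs_large"
```

The recomputed values can be compared line by line against the log; a mismatch would suggest a truncated or corrupted index.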
Thank you so much! I really appreciate this. We'll test it out and let you know if we encounter any issues.
This change is now available in v1.3.1. Thank you for providing sample files and for helping test.
I downloaded the "bread wheat" genome from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/220/415/GCA_002220415.3_Triticum_4.0/GCA_002220415.3_Triticum_4.0_genomic.fna.gz
Then I ran bowtie-build:
bowtie-build GCA_002220415.3_Triticum_4.0_genomic.fna WHEAT_JHU4_genome
Then I created a query input file (raw) called reads.txt, with a single sequence in it. For example:
AAAAAAAAAAAAAAAAAAAA
Then I ran bowtie:
bowtie -x WHEAT_JHU4_genome -r reads.txt
It hangs. If I split the genome into two pieces, it works well for each piece, so I think the problem is caused by the size of the genome.
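The post does not say how the genome was split, so the following is only one possible way to do it: a small awk-based sketch that writes whole FASTA records to a first file until roughly half of the bases have been seen, then sends the remaining records to a second file. The function name `split_fasta_in_two` and the `.part1.fa` / `.part2.fa` suffixes are made up for illustration.

```shell
#!/usr/bin/env bash
# Split a FASTA file into two files of roughly equal total sequence length,
# keeping each record (header plus sequence lines) intact.
# Usage: split_fasta_in_two genome.fna out_prefix
split_fasta_in_two() {
  local fasta=$1 prefix=$2 total half

  # First pass: count all bases on non-header lines.
  total=$(awk '!/^>/ { n += length($0) } END { print n + 0 }' "$fasta")
  half=$(( (total + 1) / 2 ))

  # Second pass: at each header, choose the output file based on the
  # number of bases already written.
  awk -v half="$half" -v p="$prefix" '
    /^>/  { out = (seen >= half) ? p ".part2.fa" : p ".part1.fa" }
          { print > out }
    !/^>/ { seen += length($0) }
  ' "$fasta"
}
```

Each part can then be indexed on its own with `bowtie-build`, in the same way as the full genome above.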
I am attaching the verbose output. It shows where bowtie hangs: bowtie_verbose.txt
Thanks