Closed by ferayd 5 years ago
Just to follow up on Feray's comment above, it would be really good if someone could help address this bug. We're wondering why it seems to fail with the wheat genome, and whether something about the size of that genome is causing this unexpected behavior in Bowtie. It is causing problems for our analyses.
thanks, Blake
We've committed a fix for this issue. The fix currently addresses raw reads. There will be additional commits if/when we discover that other pattern sources are also affected.
Thanks for the fix. We tested it and it works fine. Are you planning to release a new version of Bowtie? Because if you don't make a release, your users will have to compile from source code every time.
Yes, I have a release planned for July 5th.
Hello,
I am trying to match a list of reads against the Wheat genome using Bowtie 1.2.2.
The Wheat genome can be found here: https://wheat-urgi.versailles.inra.fr/Seq-Repository/Assemblies
or here: ftp://ftp.ensemblgenomes.org/pub/plants/release-42/fasta/triticum_aestivum/dna/
The genome is very large, so a large index must be built.
My Bowtie command is like this: bowtie --all -v 0 -m 100 -r
For many reads, I see that Bowtie clips one nucleotide from the beginning of the read, and matches the clipped read against the genome. For example, my original read is this:
AAAAAAAAAAAGCGTGACTGATGTTTGAAGAAGG
In the output_file, I see this:
816 + chr13 569873877 AAAAAAAAAAGCGTGACTGATGTTTGAAGAAGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0
One "A" has been clipped from the read before matching, so the alignment is reported one base further along, at 569873877. But the genome actually contains the entire read at this location, including the first "A" (starting at position 569873876).
If I put only this single read into the srna_reads file, then Bowtie matches the full read without any problems (at position 569873876).
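To check how many reads are affected, the reported sequences can be compared against the input reads. Below is a rough sketch, not a tested tool. It assumes Bowtie's default output format (read name, strand, reference, offset, sequence, qualities, ...), that reads given with -r are named by their 0-based position in the input, and that the sequence of a "-" strand hit is reported reverse-complemented. The function names are made up for illustration.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_altered_reads(reads, bowtie_lines):
    """Return (read_index, original, reported) for every alignment whose
    reported sequence differs from the corresponding input read.

    reads        -- list of raw input reads, in input order
    bowtie_lines -- iterable of lines in Bowtie's default output format
    Assumption: the read name is the 0-based index of the read in the input.
    """
    altered = []
    for line in bowtie_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not an alignment line
        name, strand, seq = fields[0], fields[1], fields[4]
        if not name.isdigit() or int(name) >= len(reads):
            continue  # name is not a usable read index
        original = reads[int(name)]
        # Assumption: "-" strand hits are printed reverse-complemented.
        reported = reverse_complement(seq) if strand == "-" else seq
        if reported != original:
            altered.append((int(name), original, reported))
    return altered
```

With --all, a read can produce several output lines, so the same read index may appear more than once in the result.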
This clipping problem was not happening with Bowtie 0.12. Or maybe the problem is with large indexes only.
Doesn't Bowtie 1 do end-to-end alignment?
Thanks