Bowtie clips read before matching

ferayd commented 5 years ago

Hello,

I am trying to match a list of reads against the wheat genome using Bowtie 1.2.2.

The wheat genome can be found here: https://wheat-urgi.versailles.inra.fr/Seq-Repository/Assemblies or here: ftp://ftp.ensemblgenomes.org/pub/plants/release-42/fasta/triticum_aestivum/dna/. The genome is very large, so a large index must be built.

My Bowtie command looks like this:

bowtie --all -v 0 -m 100 -r

For many reads, I see that Bowtie clips one nucleotide from the beginning of the read and matches the clipped read against the genome. For example, my original read is this:

AAAAAAAAAAAGCGTGACTGATGTTTGAAGAAGG

In the output_file, I see this:

816 + chr13 569873877 AAAAAAAAAAGCGTGACTGATGTTTGAAGAAGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 0

One "A" has been clipped from the read before matching. But the genome actually contains the entire read, including the first "A", starting one position earlier (at position 569873876).

If I put only this single read into the srna_reads file, then Bowtie matches the full read without any problems (at position 569873876).
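To find every affected read rather than spotting them by eye, the comparison described above can be scripted. This is only a minimal sketch under assumptions not confirmed in this thread: that the output is Bowtie's default tab-separated format (read name, strand, reference, 0-based offset, sequence, ...), that for raw reads (-r) the read name is the 0-based index of the read in the input file, and that clipping only affects the 5' end. The function name find_clipped is hypothetical.

```python
# Hypothetical helper: flag alignments whose reported sequence differs from
# the original raw read. Assumes Bowtie's default tab-separated output and
# that the read name is the 0-based index of the read in the raw reads file.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_clipped(reads, bowtie_lines):
    """Return (read_index, ref, reported_offset, clipped_5prime, implied_offset).

    clipped_5prime is the number of bases missing from the start of the read;
    it is None (and implied_offset is None) when the reported sequence is not
    a suffix of the original read.
    """
    flagged = []
    for line in bowtie_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not an alignment record
        name, strand, ref, offset, seq = fields[:5]
        if not name.isdigit() or int(name) >= len(reads):
            continue  # read name is not an index into the reads list
        original = reads[int(name)]
        # Bowtie prints the reverse complement for minus-strand hits,
        # so undo that before comparing with the input read.
        reported = reverse_complement(seq) if strand == "-" else seq
        if reported == original:
            continue
        offset = int(offset)
        if original.endswith(reported):
            k = len(original) - len(reported)
            # On the plus strand a 5' clip shifts the reported start right by k;
            # on the minus strand the leftmost reference position is unchanged.
            implied = offset - k if strand == "+" else offset
        else:
            k, implied = None, None
        flagged.append((int(name), ref, offset, k, implied))
    return flagged
```

Applied to the example above, where read 816 is reported at offset 569873877 with one leading "A" missing, this sketch would flag one clipped base and an implied full-read offset of 569873876, which matches where the full read sits in the genome.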

This clipping problem did not happen with Bowtie 0.12. Or maybe the problem only occurs with large indexes.

Doesn't Bowtie 1 do end-to-end alignment?

Thanks

blakemeyers commented 5 years ago

Just to follow up on Feray's comment above, it would be really good if someone could help and address this bug. We're wondering why it seems to fail with the wheat genome - whether it's something about the size of that genome that is causing this unexpected behavior by Bowtie. It is causing problems for our analyses.

thanks, Blake

ch4rr0 commented 5 years ago

We've committed a fix for this issue. The fix currently addresses raw reads. There will be additional commits if/when we discover that other pattern sources are also affected.

ferayd commented 5 years ago

Thanks for the fix. We tested it and it works fine. Are you planning to release a new version of Bowtie? Without a release, your users will have to compile from source code every time.

ch4rr0 commented 5 years ago

Yes, I have a release planned for July 5th.

BenLangmead / bowtie

Bowtie clips read before matching #92