I'm using Hisat2 and found a few reads that match the reference almost exactly but do not align. I can't tell whether this is due to a bug or a limitation of the algorithm.
The reads and closest matches are shown at the end of this report for readability. I'm using the prebuilt grch38_snptran.tar.gz index and confirmed that it contains the reference shown below with hisat2-inspect.
When I use the index without SNPs and transcripts (grch38_genome.tar.gz), both reads align uniquely, so the failure presumably has to do with the larger number of potential mapping locations in the snptran index. Based on this issue I've set the options --max-seeds and --max-altstried to very generous values, but that didn't change anything. To reduce the number of potential locations, I've created an index grch38_chr12_snptran, which is grch38_snptran restricted to chr12. With this index, Hisat2 still does not align the first example (I have not performed the equivalent experiment for example 2 with chr13).
Adding the option --bowtie2-dp 2 makes Hisat2 align the first example to chr12:54284611 with index grch38_snptran, but not with grch38_chr12_snptran (no alignment is found). Adding --score-min L,0,-1 in addition to --bowtie2-dp 2 makes the alignment succeed with grch38_chr12_snptran as well. Upon closer inspection, setting --score-min L,0,-1 makes align() (hi_aligner.h:5551) find an extra anchor hit, after which --bowtie2-dp 2 allows hybridSearch() (spliced_aligner.h:142) to find the match.
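For context on how much --score-min L,0,-1 relaxes the threshold, here is a small Python sketch of the documented function syntax (type, constant term, coefficient, applied to the read length). It is only an illustration: the helper name min_score and the 100 bp read length are my own assumptions, and L,0,-0.2 is the default as I understand it from the manual.

```python
import math


def min_score(spec: str, read_len: int) -> float:
    """Evaluate a --score-min style function such as "L,0,-1" for one read length.

    Hypothetical helper for illustration; it mirrors the documented
    C / L / S / G function types, not HISAT2's internal code.
    """
    parts = spec.split(",")
    kind = parts[0].upper()
    const = float(parts[1])
    coef = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":  # constant
        return const
    if kind == "L":  # linear in read length
        return const + coef * read_len
    if kind == "S":  # square root of read length
        return const + coef * math.sqrt(read_len)
    if kind == "G":  # natural log of read length
        return const + coef * math.log(read_len)
    raise ValueError(f"unknown function type: {kind!r}")


if __name__ == "__main__":
    read_len = 100  # assumed read length, not taken from the examples in this report
    for spec in ("L,0,-0.2", "L,0,-1"):
        print(spec, "-> minimum allowed score:", min_score(spec, read_len))
```

Under these assumptions the allowed penalty budget grows fivefold, which would be consistent with align() accepting an extra anchor hit.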
For Example 2, none of these options work. I hope these small, detailed examples help you pinpoint potential issues, as I'm looking forward to using the snptran index.
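To make the combinations above easier to reproduce, here is a rough Python sketch that only prints the command lines for each configuration (it does not run anything). The index basenames, the file names example1.fq and out.sam, and the value 1000 for --max-seeds and --max-altstried are placeholders I chose for illustration, not the exact values from my runs.

```python
import shlex


def hisat2_cmd(index, reads, out_sam, extra=()):
    """Build a hisat2 argument list for unpaired reads (-U) writing SAM output (-S)."""
    return ["hisat2", "-x", index, "-U", reads, "-S", out_sam, *extra]


# Option sets described in this report; numeric seed/alt values are placeholders.
OPTION_SETS = {
    "default": [],
    "generous seeds": ["--max-seeds", "1000", "--max-altstried", "1000"],
    "bowtie2-dp": ["--bowtie2-dp", "2"],
    "bowtie2-dp + score-min": ["--bowtie2-dp", "2", "--score-min", "L,0,-1"],
}

# Placeholder index basenames; adjust to wherever the indexes were extracted or built.
INDEXES = ["grch38_genome", "grch38_snptran", "grch38_chr12_snptran"]

if __name__ == "__main__":
    for index in INDEXES:
        for label, extra in OPTION_SETS.items():
            cmd = hisat2_cmd(index, "example1.fq", "out.sam", extra)
            print(f"# {index} / {label}")
            print(shlex.join(cmd))
```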
Example 1 read:
Closest matches within the reference to example 1:
Example 2 read:
Closest match to example 2: