TravisWheelerLab / AvxWindowFmIndex

A fast, AVX2 and ARM Neon accelerated FM index library
BSD 3-Clause "New" or "Revised" License

Another 0-count from k-mer generated from sequence used in Index #43

Closed: EricR86 closed this issue 2 months ago

EricR86 commented 2 months ago

Hello,

I quickly found another k-mer for which awFmParallelSearchCount reports a count of 0 on an index of mm39. The k-mer is taken from chr19, starting at the very first non-ambiguous position in the sequence:

GATCATACTCCTCATGCTGGACATTCTGGTTCCTAGTATATCTGGAGAGTTAAGATGGGGAATTATGTCAACTTTCCCTCTTCCTATGCCAGTTATGCATAATGCACAAATATTTCCACGCTTTTTCACTACAGATAAAGAACTGGGACTTGCTTATTTACCTTTAGATGAACAGATTCAGGCTCTGCAAGAAAATAGAATTTTCTTCATACAGGGAAGCCTGTGCTTTGTACTAATTTCTTCATTACAAGATAAGAGTCAATGCATATCCTTGTATAATCAATCATCAGCCTGGTTAAAAAGGGCCTCTTTTGGCGGAGTGGCTTCTGATATAAAGGCTGATACCATAGTTATTAGAAGATGGTGGCCTAGCCAAAAACCTGGAAATAAGCCACCTTCACAACCAAGAATGGCACCCTTTTGCAAGGAGGGGTCCATGGAATATGTAATGTTACCTTGGACAGGTTGCCAAGCTAAAAAATATACTTGGGCAGTAGAGAAA

This may still be related to Issue #41, since it uses exactly the same data as specified in that issue.

EricR86 commented 2 months ago

A much shorter k-mer from the same position also has a count of 0: GATCATACTCCTCATGCTGGACATTCTGGTTCCTAGTATATCTGGAGAGTTAAGATGGGGAATTATGTCAACTTTCCCTCTTCCTATGCCAGTTATGCATAATGCACAAATATTTCCACGCTTTTTCACTACAGATAAAGAACTGGGACTTGCTTATTTACCTTTAGATGAACAGATTCAGGCTCTGCAAGAAAATAGAATTTTCTTCATACAGGGAAGCCTGTGCTTTGTACTAATTTCTTCATTACAAGATAAGAGTCAATGCATATCC

And another from the same position, slightly longer than the previous one but still much shorter than the original: GATCATACTCCTCATGCTGGACATTCTGGTTCCTAGTATATCTGGAGAGTTAAGATGGGGAATTATGTCAACTTTCCCTCTTCCTATGCCAGTTATGCATAATGCACAAATATTTCCACGCTTTTTCACTACAGATAAAGAACTGGGACTTGCTTATTTACCTTTAGATGAACAGATTCAGGCTCTGCAAGAAAATAGAATTTTCTTCATACAGGGAAGCCTGTGCTTTGTACTAATTTCTTCATTACAAGATAAGAGTCAATGCATATCCTTGTATAATCA

Sawwave commented 2 months ago

Okay, I'll look into these immediately. Thank you for providing multiple k-mers; they'll help me narrow down the issue.

Sawwave commented 2 months ago

This issue has been fixed with PR #44. There were a few places where 8-bit integers were still being used for positional variables. These have all been replaced with size_t variables, which will allow AwFmIndex to support all k-mer lengths on 64-bit architectures.
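
For readers wondering why an 8-bit positional variable would matter, here is a minimal sketch in C. It is not code from AwFmIndex or PR #44: the function names `chars_visited_u8` and `chars_visited_size` are hypothetical, and the loop only imitates a backward walk over a k-mer from its last character to its first. It assumes `kmer_length` is at least 1. The point is that a `uint8_t` can only hold 0 to 255, so the starting position of any longer k-mer is truncated modulo 256 and only part of the k-mer is visited.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not the library's code.
   Walks from the last character of a k-mer down to position 0 and
   returns how many characters were visited, with the position held
   in an 8-bit variable. Assumes kmer_length >= 1. */
size_t chars_visited_u8(size_t kmer_length) {
    /* The cast keeps only the low 8 bits, so 299 becomes 43. */
    uint8_t position = (uint8_t)(kmer_length - 1);
    size_t visited = 1; /* the character at the starting position */
    while (position != 0) {
        position--;
        visited++;
    }
    return visited;
}

/* Same walk with the position held in a size_t, which is wide enough
   for any in-memory k-mer length. Assumes kmer_length >= 1. */
size_t chars_visited_size(size_t kmer_length) {
    size_t position = kmer_length - 1;
    size_t visited = 1;
    while (position != 0) {
        position--;
        visited++;
    }
    return visited;
}
```

With a k-mer length of 300, the 8-bit version visits 44 characters while the size_t version visits all 300. For lengths of 256 or less the two agree, which would be consistent with a bug that only shows up for long k-mers.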