COMBINE-lab / piscem-cpp

A small sparse and fast reference index based on SShash and Tiling encoding
MIT License

Print query trace #1

Closed: jermp closed this 2 years ago

jermp commented 2 years ago

https://github.com/jermp/sshash/issues/11#issuecomment-1172977081

jermp commented 2 years ago

The problem is that, in the case with skipping, even if the minimizer is the same and we extend successfully, the kmer_id might be wrong if we do not take into account the size of the skip. This can be solved, I think.

Previously, we got the mysterious kmer_id=0 because it was the result of invalid_uint64+1 (unsigned wraparound). What was happening is that we had some kmers with the same minimizer: one was not found (so its answer was all "invalid"), but a later one was found by extension, because the m_minimizer_not_found flag was still false (hence the +1). So we incremented the kmer_id by 1 but left the rest of the answer invalid.

In summary, we could extend "by chance" if we skip some queries and miss the lookup_advanced in between. So the safest thing to do is to re-locate the bucket and do a lookup whenever we skip.