Open eric9n opened 8 months ago
// Return only if we've read in at least one k-mer's worth of chars
if (str_pos_ >= (size_t) k_) {
break;
}
If the above if statement is true, it may occur that after k characters, each time the candidatelmer obtained will only be compared with the first value in queue. If the value in queue_ is smaller, then actually changed_minimizer is false, which means that the minimizer value has not changed. At this point, the NextMinimizer function returns the old value, which is not very meaningful for library construction or classification calculations. Instead, it repeats calculations and wastes computing resources.
For example, the output result in the test file mmtest.cc
#include "kraken2_headers.h"
#include "mmscanner.h"
using namespace std;
using namespace kraken2;
int main() {
string seq = "ACGATCGACGACG";
MinimizerScanner scanner(10, 5, 0);
scanner.LoadSequence(seq);
uint64_t *mmp;
while ((mmp = scanner.NextMinimizer()) != nullptr)
printf("%016x\n", *mmp);
return 0;
}
$ ./mmtest
00000000000002d8
0000000000000218
0000000000000218
0000000000000218
The last three lines are duplicate data, which has no impact on the mapping result of taxid and only increases the computational load.
When such a scenario occurs, the same minimum sequence can result in multiple hits, leading to an increase in false positives.
https://github.com/DerrickWood/kraken2/blob/acc22481bbeb32b870c12b403ce595fe7a6db770/src/mmscanner.cc#L181C35-L181C35
When the first k characters of seq are all fuzzy characters, if "strpos >= (sizet) k)" exits the loop, will it cause there to be no multiple candidate_lmers in the queue for comparison, and directly return the first candidate_lmer instead of the smallest one?
` // Sliding window minimum calculation while (! queue.empty() && queue.back().candidate > candidatelmer) queue.pop_back(); MinimizerData data = { candidate_lmer, queuepos }; if (queue_.empty() && queuepos >= k - l) { // Empty queue means front will change // Minimizer will change iff we've processed enough l-mers changedminimizer = true; } queue.pushback(data); // expire an l-mer not in the current window if (queue.front().pos < queuepos - k + l) { queue.erase(queue.begin()); // Change in front means minimizer changed changed_minimizer = true; }