I've classified the same dataset against the same database, with the only difference being that I've used different confidence score settings. I've run with confidence 0.0 (default), 0.1, 0.2, 0.5, and 0.9.
The number of classified reads drops with increased confidence score, as expected. However, the numbers in the columns representing the total number of minimizers in the read data and the total number of distinct minimizers in the read data (columns 4 and 5, respectively) are identical between classifications, as far as I can see.
I would have expected the number of minimizers (both total and distinct) to drop when the number of classified reads drops. Have I perhaps misunderstood something about the output, or could this be a bug?
I'm attaching the report files from the classifications so you can have a look yourself.
I'm not one of the authors, but as I understand how the minimizers are reported, there are two possible explanations:
Even though the number of sequencing reads mapped decreases due to the confidence parameter, the remaining reads may still be numerous enough to span the same minimizer space (and the same distinct minimizers).
Another possibility is that the minimizers are counted before the confidence threshold is applied, which would be unintuitive, to say the least.
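If it helps narrow this down, here is a rough Python sketch for comparing two of the attached reports taxon by taxon. It lists taxa whose read count changed while both minimizer columns stayed the same. It assumes the reports are tab-separated with no header, with the clade read count in column 2, the minimizer columns in 4 and 5 as described above, and the taxonomy ID in column 7. The column positions for reads and taxonomy ID are my assumption about the layout, and the function names are made up for this example.

```python
import sys


def load_report(path):
    """Return {taxid: (clade_reads, total_minimizers, distinct_minimizers)}.

    Assumed layout (tab-separated, 1-based columns):
    2 = clade reads, 4 = minimizers, 5 = distinct minimizers, 7 = taxid.
    """
    rows = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # skip blank or unexpectedly short lines
            try:
                counts = (int(fields[1]), int(fields[3]), int(fields[4]))
            except ValueError:
                continue  # skip lines that do not hold integer counts
            rows[fields[6].strip()] = counts
    return rows


def unchanged_minimizers(baseline, other):
    """List taxa whose read count differs but whose minimizer columns match."""
    hits = []
    for taxid, (reads, total, distinct) in baseline.items():
        if taxid not in other:
            continue  # taxon is not reported in the second file
        other_reads, other_total, other_distinct = other[taxid]
        if other_reads != reads and (other_total, other_distinct) == (total, distinct):
            hits.append((taxid, reads, other_reads, total, distinct))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    first, second = load_report(sys.argv[1]), load_report(sys.argv[2])
    for taxid, reads_a, reads_b, total, distinct in unchanged_minimizers(first, second):
        print(f"{taxid}\treads {reads_a} -> {reads_b}\tminimizers {total}\tdistinct {distinct}")
```

Saved under a name of your choice and run with `confidence_0p0_report.txt` and `confidence_0p9_report.txt` as the two arguments, it prints one line per taxon where the reads dropped but columns 4 and 5 did not move. If that covers essentially every taxon, including ones that lost most of their reads, that would point toward the second explanation rather than the first.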
I'm running version 2.1.2.
Cheers, /Daniel
confidence_0p0_report.txt confidence_0p1_report.txt confidence_0p2_report.txt confidence_0p5_report.txt confidence_0p9_report.txt