Update: on consideration, I think I prefer the way it is currently done in the code (counting ambiguous k-mers in the total), so it would be more sensible to change the docs.
The documentation says that ambiguous bases are excluded when calculating the confidence score. However, I've been getting some slight discrepancies when running kraken2, which led me to add some simple print statements.
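For reference, here is my paraphrase (not a quote) of the rule as the manual describes it. C is the number of k-mers mapped to taxa in the clade rooted at the candidate label, and Q is the number of k-mers that contain no ambiguous nucleotide:

$$
\frac{C}{Q} \ge \text{confidence threshold}, \qquad Q = \text{total k-mers} - \text{ambiguous k-mers}
$$

The question below is whether the code actually uses Q, or Q plus the ambiguous k-mers, as the denominator.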
Example: I ran kraken2 with --confidence 0.1 and got a read classified with the following k-mer counts: 0: 193, 1: 1, 570: 3, 9606: 3, A: 34. So the total is 200 k-mers, or 234 with the ambiguous ones included.
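As a quick check of that arithmetic, here is a minimal Python sketch that sums counts written in kraken2's taxid:count LCA-mapping notation, where 0 means no database hit and A marks k-mers spanning ambiguous bases. The function name kmer_totals is hypothetical, and the input string is just my aggregated counts above rewritten in that notation.

```python
from collections import Counter
from typing import Tuple


def kmer_totals(lca_mapping: str) -> Tuple[int, int]:
    """Hypothetical helper: sum k-mer counts from a "taxid:count" string.

    Returns (total without ambiguous k-mers, total with ambiguous k-mers).
    """
    counts = Counter()
    for pair in lca_mapping.split():
        if pair == "|:|":  # paired-end mate separator, not a count
            continue
        taxon, n = pair.split(":")
        counts[taxon] += int(n)  # a taxon can appear in several runs; sum them
    with_ambiguous = sum(counts.values())
    without_ambiguous = with_ambiguous - counts["A"]  # "A" = ambiguous k-mers
    return without_ambiguous, with_ambiguous


print(kmer_totals("0:193 1:1 570:3 9606:3 A:34"))  # (200, 234)
```

Note that the 0 (no-hit) k-mers count toward both totals; only the A entries differ.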
I put a print statement in ResolveTree in classify.cc, at line 439.
The output is:
I'm not sure why the taxon IDs are different (I assume a different ID is used internally), but the required score is clearly computed from the total number of k-mers, which includes the A's.
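To show why the denominator matters at --confidence 0.1, here is a small sketch of the threshold arithmetic. It assumes the required score is ceil(confidence × total k-mers); the function name required_score is mine, not kraken2's, and the confidence is parsed as an exact Fraction only to keep the example free of floating-point rounding (kraken2 itself may well use floating point).

```python
import math
from fractions import Fraction


def required_score(confidence: str, total_kmers: int) -> int:
    """Hypothetical helper: minimum clade hit count needed to pass.

    Assumption: threshold = ceil(confidence * total_kmers).
    """
    return math.ceil(Fraction(confidence) * total_kmers)


documented = required_score("0.1", 200)  # ambiguous k-mers excluded, per the docs
observed = required_score("0.1", 234)    # ambiguous k-mers included, per my print output
print(documented, observed)  # 20 24
```

So a clade with, say, 22 hits (a hypothetical number) would pass the documented threshold but fail the one the code appears to apply.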
Is this intended or a bug?