Open NienkeMekkes opened 3 years ago
What is the command you are using?
To run Kraken2, I typically use:
kraken2 reads/ --db krakendb --paired --output sampleID_kraken_output.txt --report sampleID_kraken_report.txt --report-minimizer-data
The row in question for Bacteroides fragilis looks like:
20.62 458469 442261 21635904 1372051 S 817 Bacteroides fragilis
For kraken2-inspect, I use:
kraken2-inspect --db krakendb
The row in question for Bacteroides fragilis looks like:
0.03 302714 290736 S 817 Bacteroides fragilis
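To make the mismatch concrete, here is a minimal Python sketch that parses the two rows quoted above and compares the distinct-minimizer estimate from the report with the clade minimizer count from kraken2-inspect. The column order is an assumption based on the Kraken2 manual (the report row is taken to be percent, clade reads, taxon reads, minimizers, distinct minimizers, rank, taxid, name; the inspect row is taken to be percent, clade minimizers, taxon minimizers, rank, taxid, name). The function names and dictionary keys are made up for illustration and are not part of Kraken2.

```python
# Hypothetical helpers for comparing one taxon across the two outputs.
# Column order is assumed from the Kraken2 manual, not read from a header.

def parse_report_row(line):
    """Parse one row of a report written with --report-minimizer-data."""
    fields = line.split(None, 7)  # the name may contain spaces, so cap the splits
    return {
        "percent": float(fields[0]),
        "clade_reads": int(fields[1]),
        "taxon_reads": int(fields[2]),
        "minimizers": int(fields[3]),
        "distinct_minimizers": int(fields[4]),
        "rank": fields[5],
        "taxid": int(fields[6]),
        "name": fields[7].strip(),
    }


def parse_inspect_row(line):
    """Parse one row of kraken2-inspect output."""
    fields = line.split(None, 5)
    return {
        "percent": float(fields[0]),
        "clade_minimizers": int(fields[1]),
        "taxon_minimizers": int(fields[2]),
        "rank": fields[3],
        "taxid": int(fields[4]),
        "name": fields[5].strip(),
    }


# The two rows exactly as quoted in this thread.
report = parse_report_row(
    "20.62 458469 442261 21635904 1372051 S 817 Bacteroides fragilis"
)
inspect = parse_inspect_row("0.03 302714 290736 S 817 Bacteroides fragilis")

# Make sure both rows describe the same taxon before comparing them.
assert report["taxid"] == inspect["taxid"]

ratio = report["distinct_minimizers"] / inspect["clade_minimizers"]
exceeds_database = report["distinct_minimizers"] > inspect["clade_minimizers"]

print(f"{report['name']} (taxid {report['taxid']})")
print(f"distinct minimizers in report: {report['distinct_minimizers']}")
print(f"clade minimizers in database:  {inspect['clade_minimizers']}")
print(f"ratio: {ratio:.2f}, report exceeds database: {exceeds_database}")
```

For these two rows the report value is roughly four and a half times the inspect value. Real output files are tab-delimited, so splitting on tabs would be more robust there; whitespace splitting is used here only because the rows are pasted with spaces.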
Seems like a duplicate of #392
I can confirm this. Reading the source code is a bit confusing, since the option is referred to there as "report kmer data" rather than minimizer data. Maybe the number is actually the number of assigned k-mers? Or does it perhaps also count distinct minimizers that don't belong to the taxon a read was assigned to?
As a suggestion, it would also be helpful to output minimizers/unique minimizers at the node level, in addition to those for the subtree rooted at a specific node (this can obviously be calculated from the subtree values, or bottom-up for the entire tree).
It looks like this would be any minimizer found in the read, even if it doesn't match the taxon that gets assigned as the final classification?
Dear authors,
The new --report-minimizer-data option is a very promising feature! I do have a question about it. When I run kraken2-inspect on my database, I find one column which is: "amount of database minimizers that map to a taxon rooted in this clade". When I run kraken2 with --report-minimizer-data, I find that the estimate in the distinct minimizer column can be higher than this inspect value. I expected the inspect value to be the maximum number of distinct minimizers that can be found in that clade. Why is this not the case?
Thanks
For example, in my database ~300,000 minimizers are rooted at the species (S) Bacteroides fragilis. In my kraken2 output, I found ~1,370,000 distinct minimizers for Bacteroides fragilis.