Open kuldeepmore10 opened 5 months ago
This is easy: both answers are correct. Your database only has protists, so Kraken cannot recognize this read as Bordetella, which is the correct ID. Instead it gives you the species that has at least one 31-bp exact match, even though it's not a great match otherwise.
That's what I thought. But does that mean that even if I only want to analyse protists, I will have to build an all-inclusive database (say bacteria, archaea, plants, etc.) every time? Furthermore, I thought Kraken had species-specific k-mers and would assign a read to a species only if one of those k-mers was detected. But that may be my misunderstanding.
It's true that Kraken identifies species-specific k-mers when it builds the database. But it can only do that for species that you give it at the time you build the DB. So if a k-mer is shared between a protist and a bacterium, it will be classified at the lowest common ancestor of those species. However, if you only give it the protist genomes, then the k-mer could appear to be specific to one of the protists. So yes, you have to build an inclusive database. Note that our Microbial2023 database is very large and inclusive already. So you could use that first, then take all the 'unclassified' reads from the output and run those against a second, protist-only database. Microbial2023 has a few protists in it, so you'd have to merge the results afterwards.
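To make the point about database contents concrete, here is a toy sketch in Python. It is not Kraken's actual implementation (the real tool uses minimizers, canonical k-mers and a compact hash table); the taxonomy, the two short "genomes", the value of `K` and the helper names `lineage`, `lca`, `kmers` and `build_db` are all made up for illustration. It only shows how the same k-mer ends up with a different label depending on which genomes were present at build time.

```python
# Hypothetical toy taxonomy (child -> parent); not real NCBI taxonomy data.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Eukaryota": "root",
    "Bordetella": "Bacteria",
    "Nannochloropsis gaditana": "Eukaryota",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors_of_a = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    raise ValueError("taxa do not share a root")


def kmers(seq, k):
    """All distinct k-mers of a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_db(genomes, k):
    """Map each k-mer to the LCA of every taxon whose genome contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db


K = 5  # toy value, far shorter than anything used in practice
# Made-up sequences that happen to share the k-mer "ACGGT".
genomes = {
    "Bordetella": "ACGTACGGTCA",
    "Nannochloropsis gaditana": "TTACGGTCAGG",
}

inclusive_db = build_db(genomes, K)
protist_only_db = build_db(
    {"Nannochloropsis gaditana": genomes["Nannochloropsis gaditana"]}, K
)

shared = "ACGGT"
print(inclusive_db[shared])     # LCA of both genomes in the toy taxonomy
print(protist_only_db[shared])  # looks species-specific without the bacterium
```

In the inclusive toy database the shared k-mer is pushed up to the common ancestor, so it cannot drive a species-level call on its own. In the protist-only database the same k-mer looks unique to the protist, which is the situation described above.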
This explains a lot. Thank you very much :) I am now building a bacteria-plus-protozoa database. I will post here if it works out fine so that the thread can be closed :)
Hello Kraken team,
I am analysing shotgun data using Kraken2. I built a protist-only database using RefSeq genomes from NCBI. I tried different confidence levels (0 to 0.7) and increased --minimum-hit-groups to 4 for testing. But there is a large discrepancy between Kraken's classification and what I find when I BLAST the classified sequences.
For example, in 0.5 confidence level, @LH00328:56:22FFHJLT3:5:1106:14351:2442 1:N:0:GCTATCCT+AACAGGTG kraken:taxid|1093141 TTTGCCGAGTTCCTTCTCCTGAGTTCTCTCAAGCGCCTTGGAATATTCATCCCGTCCACCTGTGTCGGTTTGCGGTACGGTCTCGTACAGCTGAAGCTTAGAGGCTTTTCTTGGAACCACTTCCAATCACTTCGCGAAACAAGTTCGCTC + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
When I BLAST this sequence, I get 100% identity to Bordetella, but Kraken classifies it as Nannochloropsis gaditana. This is just one example; there are several others. Am I doing anything wrong? What's the solution for this?