DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Question with taxonomy and LCA #375

Open davidvilanova opened 3 years ago

davidvilanova commented 3 years ago

Hi, I have built a custom database for kraken2 with default parameters. When I classify 16S nanopore reads with kraken2, I get the following lines in the output file.

image

I have 34 hits mapped directly to the Bacillus genus (first line), NCBI ID 1386.

Why are these 34 hits assigned directly to the genus? Is it because the LCA step fails to assign them at the species level?

My guess is that kraken2 overrepresents the total percentage of Bacillus in my dataset.

Any thoughts?

AGalanis97 commented 3 years ago

Hello,

Try looking through the forums for parameters appropriate to nanopore/long reads. It is best to build the database with parameters better suited to classifying nanopore reads; doing so may increase species-level classification.

As an example, see here: https://github.com/DerrickWood/kraken2/issues/352

davidvilanova commented 3 years ago

Thanks @AGalanis97, I have tried to build the new database as suggested in post #352:

1/ --kmer-len 26. Failed because the minimizer length is too long.

2/ --kmer-len 26 --minimizer-len 26. This also fails, as follows:

"kraken2-build: number of minimizer spaces (7) exceeds max for minimizer len (26); max: 6"

3/ --kmer-len 26 --minimizer-len 26 and --minimizer-spaces 6 (or 7). Got an empty database.

image
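The three attempts above can be checked against the constraints that the error messages hint at. The sketch below is not kraken2-build's actual code: `check_build_params` is a hypothetical helper, the defaults of 31 for `--minimizer-len` and 7 for `--minimizer-spaces` are assumptions not stated in this thread, and the rule "max spaces = minimizer length // 4" is inferred from the quoted message (26 gives 6).

```python
# Assumed nucleotide defaults for kraken2-build (not stated in this thread).
DEFAULT_MINIMIZER_LEN = 31
DEFAULT_MINIMIZER_SPACES = 7


def check_build_params(kmer_len,
                       minimizer_len=DEFAULT_MINIMIZER_LEN,
                       minimizer_spaces=DEFAULT_MINIMIZER_SPACES):
    """Hypothetical helper: list the constraints a parameter set violates."""
    problems = []
    # Assumption: the minimizer may not be longer than the k-mer.
    if minimizer_len > kmer_len:
        problems.append(
            f"minimizer len ({minimizer_len}) exceeds k-mer len ({kmer_len})")
    # Assumption: the limit is minimizer_len // 4, which gives 6 for 26 and
    # so matches the "max: 6" in the quoted error message.
    max_spaces = minimizer_len // 4
    if minimizer_spaces > max_spaces:
        problems.append(
            f"minimizer spaces ({minimizer_spaces}) exceeds max ({max_spaces})")
    return problems


attempts = {
    "1/": dict(kmer_len=26),
    "2/": dict(kmer_len=26, minimizer_len=26),
    "3/": dict(kmer_len=26, minimizer_len=26, minimizer_spaces=6),
}
for label, params in attempts.items():
    print(label, check_build_params(**params) or "passes these checks")
```

Under these assumptions, attempts 1/ and 2/ each trip one check and attempt 3/ with 6 spaces trips none, so the sketch does not explain the empty database.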

jenniferlu717 commented 3 years ago

I'm not certain why your database is empty. Regarding your original post: yes, those 34 hits at the genus level indicate that those reads contain k-mers that are shared between two or more Bacillus species.

I do not think reducing the k-mer size will help in that case. I would actually expect longer k-mers to be more species-specific than shorter ones.
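To illustrate how a shared k-mer ends up labelled at the genus, here is a minimal sketch of LCA labelling on a toy taxonomy. It is not Kraken 2's implementation: the two species, their sequences, and the helper names are all hypothetical, and only the genus Bacillus (NCBI ID 1386) comes from the post above.

```python
# Toy taxonomy: child -> parent. Species names are hypothetical.
PARENT = {"species_A": "Bacillus", "species_B": "Bacillus", "Bacillus": "root"}

# Made-up sequences that differ at a single position (index 10).
GENOMES = {
    "species_A": "ACGTTGCAAGCTTAGGCTA",
    "species_B": "ACGTTGCAAGGTTAGGCTA",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    raise ValueError("taxa share no ancestor")


def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(genomes, k):
    """Label each k-mer with the LCA of every taxon that contains it."""
    index = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index


index = build_index(GENOMES, k=4)
read = "ACGTTGCA"  # drawn from the region both species share
labels = {index[kmer] for kmer in kmers(read, 4)}
print(labels)
```

Every k-mer of this read occurs in both toy species, so the only label available to it is the genus. A conserved marker such as 16S can plausibly produce many reads of this kind.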
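A small sketch of why longer k-mers tend to be more specific: with two made-up sequences that differ at one position, the fraction of k-mers shared between them drops as k grows, because more k-mers overlap the differing base. The sequences and the `shared_fraction` helper are hypothetical and the example ignores sequencing errors.

```python
def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(a, b, k):
    """Fraction of the k-mers of `a` that also occur in `b`."""
    kmers_a = kmers(a, k)
    return len(kmers_a & kmers(b, k)) / len(kmers_a)


# Made-up sequences that differ at a single position (index 10).
species_a = "ACGTTGCAAGCTTAGGCTA"
species_b = "ACGTTGCAAGGTTAGGCTA"

for k in (4, 6, 8):
    print(k, round(shared_fraction(species_a, species_b, k), 2))
```

In this toy case the shared fraction falls as k increases, so fewer k-mers would be labelled above the species level.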