DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Test results did not meet expectations #849

Open hedy-ella opened 3 days ago


Hi, I'm evaluating sequence classification software. To test kraken2 with its default parameters, I built a test database from more than 17,000 viral sequences (complete genomes from the NCBI Virus database that belong to RefSeq) and more than 100 Buchnera aphidicola sequences (complete genomes under NCBI taxid 9). I then randomly selected 100 of the Buchnera aphidicola genomes and randomly extracted a 50 bp sequence from each to use as query data.

The problem is that kraken2 classified only 44 of these sequences (44.00%). In the output file, 40 sequences have a k-mer mapping of 0:16, and some sequences appear to have their 16 k-mers classified, yet their final classification result is still 0 (unclassified).

To rule out incorrectly extracted sequences, I searched the combined FASTA file for the unclassified sequences with less and found exact matches. I also tried BLAST, which did slightly better and matched 80 sequences (each with multiple hits, all Buchnera aphidicola), but that is still lower than I expected.

I'm confused and not sure which step went wrong. Thank you for any help or reply!
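For reference, the query-sampling step described above (pick genomes at random, then cut one 50 bp substring from each) can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the exact procedure used in this test: the function names, the seed, and the input file name are assumptions. Because every read is an exact substring of a database genome, each one should in principle match its source perfectly.

```python
import random
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA records into {first word of header: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def sample_reads(genomes: Dict[str, str], n_genomes: int, read_len: int = 50,
                 seed: int = 0) -> List[Tuple[str, str]]:
    """Pick n_genomes genomes at random and cut one read_len-bp substring from each."""
    rng = random.Random(seed)  # fixed seed so the query set is reproducible
    chosen = rng.sample(sorted(genomes), n_genomes)
    reads = []
    for name in chosen:
        seq = genomes[name]
        if len(seq) < read_len:
            continue  # genome shorter than the requested read length; skip it
        start = rng.randint(0, len(seq) - read_len)  # inclusive upper bound
        reads.append((f"{name}_{start}", seq[start:start + read_len]))
    return reads


def to_fasta(reads: List[Tuple[str, str]]) -> str:
    """Serialize (read_id, sequence) pairs back to FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in reads)
```

Usage, with a hypothetical input file name: `with open("buchnera_genomes.fna") as fh: queries = to_fasta(sample_reads(read_fasta(fh), 100))`. Encoding the source genome and start offset in each read ID makes it easy to check later which reads kraken2 missed.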

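To tally the two cases described in the post (reads showing `0:16`, and reads with k-mer hits that are still reported as unclassified), a minimal Python sketch can read kraken2's standard per-read output. It assumes the five tab-separated columns documented in the Kraken2 manual (C/U flag, read ID, taxid, length, LCA mapping of each k-mer) and output written without `--use-names`. It also assumes the database was built with kraken2-build's default `--kmer-len` of 35, in which case a 50 bp read contains 50 - 35 + 1 = 16 k-mers, so a read with no database hits at all appears as `0:16`. The helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class KrakenRecord:
    classified: bool                  # column 1: "C" or "U"
    read_id: str                      # column 2
    taxid: int                        # column 3 (assumes no --use-names)
    length: str                       # column 4: "50", or "50|50" for paired reads
    kmer_hits: List[Tuple[str, int]]  # column 5 as (taxid or "A", count) pairs


def parse_line(line: str) -> KrakenRecord:
    status, read_id, taxid, length, mapping = line.rstrip("\n").split("\t")
    hits: List[Tuple[str, int]] = []
    for token in mapping.split():
        if token == "|:|":  # separator between the two mates of a paired read
            continue
        tax, count = token.rsplit(":", 1)
        hits.append((tax, int(count)))
    return KrakenRecord(status == "C", read_id, int(taxid), length, hits)


def expected_kmers(read_len: int, k: int = 35) -> int:
    """Number of k-mers in a read of read_len bases (assumed k = 35)."""
    return max(read_len - k + 1, 0)


def summarize(lines: Iterable[str]):
    records = [parse_line(ln) for ln in lines if ln.strip()]
    n_classified = sum(r.classified for r in records)
    # Reads whose every k-mer mapped to taxid 0, i.e. no hit in the database
    no_hits = [r.read_id for r in records
               if all(tax == "0" for tax, _ in r.kmer_hits)]
    # Reads with at least one k-mer hit to a real taxon but still reported as "U"
    hit_but_unclassified = [r.read_id for r in records
                            if not r.classified
                            and any(tax not in ("0", "A") for tax, _ in r.kmer_hits)]
    return n_classified, len(records), no_hits, hit_but_unclassified
```

Separating the two lists points to different causes: reads in `no_hits` never matched the database at the k-mer level, while reads in `hit_but_unclassified` matched something but were rejected by kraken2's classification rules, so they are worth inspecting separately.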