Unclassified reads have blast hits that match lineages in the kraken2 database

Hi,

I'm trying out kraken2 for the first time. I have short (150 bp) sequencing reads that are ostensibly SARS-CoV-2, but as part of QC I want to classify them with kraken2 to check for contaminants. I run kraken2 with the minikraken2 database like so:

kraken2 --threads 8 --gzip-compressed fastq/READS1.fastq.gz fastq/READS2.fastq.gz --db /path/to/minikraken2_v2_8GB_201904_UPDATE --output test.kraken --report test_kraken.report --paired --use-names

The output is:

436895 sequences (131.94 Mbp) processed in 7.533s (3479.9 Kseq/m, 1050.92 Mbp/m).
  82033 sequences classified (18.78%)
  354862 sequences unclassified (81.22%)

The classified sequences appear to check out. For example, when I blast reads assigned "Homo sapiens (taxid 9606)" against the nt database, they get human hits; likewise for reads assigned to viruses/SARS-CoV-2. However, the unclassified sequences (81.22% of all reads) make less sense. Whenever I randomly select unclassified reads and blast them, they have perfect matches to SARS-CoV-2 genomes in the nt database.
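
For reference, this is roughly how the unclassified reads can be sampled for blasting. It is only a minimal sketch: `sample_unclassified` is a hypothetical helper name, it assumes the standard tab-delimited kraken2 output (first column `C` or `U`, second column the read ID), and it relies on GNU `shuf` being installed.

```shell
# Hypothetical helper: print up to N random IDs of reads kraken2 left unclassified.
# Assumes the standard kraken2 output layout: column 1 is C/U, column 2 is the read ID.
sample_unclassified() {
    kraken_out=$1      # path to the file written by --output
    n=${2:-5}          # number of read IDs to sample (default 5)
    awk -F'\t' '$1 == "U" { print $2 }' "$kraken_out" | shuf -n "$n"
}
```

With the command above, `sample_unclassified test.kraken 10` would print ten read IDs to pull out of the FASTQ files and blast.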

Since I'm new to kraken2, it's possible I'm missing something here, but shouldn't these unclassified reads be classified as SARS-CoV-2, or at least to some viral lineage? Are there some settings I'm missing?
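
One check that might narrow this down is whether the database actually contains a SARS-CoV-2 entry. The sketch below is an assumption on my part rather than anything from the kraken2 docs verbatim: `db_has_taxid` is a hypothetical helper, it expects `kraken2-inspect` output on stdin in the usual report layout (taxid in the fifth tab-separated column), and 2697049 is the NCBI taxid for SARS-CoV-2.

```shell
# Hypothetical helper: succeed if the given taxid appears in kraken2-inspect output on stdin.
# Assumes the report-style layout: tab-separated, taxid in column 5, header lines start with '#'.
db_has_taxid() {
    awk -F'\t' -v id="$1" '
        !/^#/ && $5 + 0 == id + 0 { found = 1 }
        END { exit !found }
    '
}
```

It would be used as `kraken2-inspect --db /path/to/minikraken2_v2_8GB_201904_UPDATE | db_has_taxid 2697049 && echo present || echo absent`.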

Thanks, Charles
