charlesfoster opened this issue 4 years ago.
It's likely because you're using the MiniKraken database.
To fit into 8 GB, the MiniKraken database keeps only a subsample of the k-mers from the full database, so many k-mers are simply missing from it.
A full database may therefore classify more of your reads.
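To make the subsampling argument concrete, here is a toy sketch. It is not Kraken2's algorithm: Kraken2 indexes minimizers (with spaced seeds) in a compact hash table and applies its own classification rule. The sketch simply assumes a "database" that keeps a k-mer only when its CRC32 hash falls below a threshold, and measures what fraction of an exactly matching 150 bp read's k-mers are still found. `build_db`, `hit_fraction`, `random_sequence` and the synthetic genome are all hypothetical illustrations; k = 35 matches Kraken2's default k-mer length.

```python
import random
import zlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_db(reference, k, keep_fraction=1.0):
    """Toy index of the reference's k-mers.

    Keeps a k-mer only if its CRC32 hash is below keep_fraction * 2**32.
    keep_fraction=1.0 keeps everything (a toy 'full' database); smaller
    values mimic a size-capped, subsampled database (an assumption, not
    Kraken2's real data structure).
    """
    threshold = int(keep_fraction * 2**32)
    return {km for km in kmers(reference, k) if zlib.crc32(km.encode()) < threshold}


def hit_fraction(read, db, k):
    """Fraction of the read's k-mers that are present in the database."""
    read_kmers = list(kmers(read, k))
    return sum(km in db for km in read_kmers) / len(read_kmers)


def random_sequence(length, seed=0):
    """Hypothetical stand-in for a genome: a reproducible random ACGT string."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


genome = random_sequence(5000)
read = genome[1000:1150]  # a 150 bp read copied exactly from the genome
K = 35                    # Kraken2's default k-mer length
full = hit_fraction(read, build_db(genome, K, 1.0), K)
mini = hit_fraction(read, build_db(genome, K, 0.1), K)
print(f"full db: {full:.2f}  subsampled db: {mini:.2f}")
```

Even though the read matches the genome perfectly, the subsampled index recovers only a fraction of its k-mers. Fewer hits means less evidence for the classifier, and reads left with no (or too few) hits end up unclassified.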
Hi,
I'm trying out kraken2 for the first time. I have short (150 bp) sequencing reads that are ostensibly of SARS-CoV-2, but as part of QC I wish to classify them using kraken2 to see if there are any contaminants. I run kraken2 with the minikraken2 database like so:

kraken2 --threads 8 --gzip-compressed fastq/READS1.fastq.gz fastq/READS2.fastq.gz --db /path/to/minikraken2_v2_8GB_201904_UPDATE --output test.kraken --report test_kraken.report --paired --use-names
The output is:
The classified sequences appear to check out. For example, when I BLAST reads assigned to "Homo sapiens (taxid 9606)" against the nt database, they get human hits; likewise for reads assigned to viruses/SARS-CoV-2. However, the unclassified sequences (81.22% of them) don't make sense: every time I randomly select unclassified reads and BLAST them, they give perfect matches to SARS-CoV-2 genomes in the nt database.
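A spot-check like this can be scripted. The sketch below is a minimal example, assuming Kraken2's standard tab-delimited formats: the report's columns are percentage, clade read count, direct read count, rank code (`U` for the unclassified row), taxid and name, and the per-read output's first two columns are the `C`/`U` status and the read ID. The function names are hypothetical helpers, not part of kraken2. As an alternative, kraken2 can write unclassified reads to FASTQ directly with its `--unclassified-out` option.

```python
import random
from typing import Iterable, List


def unclassified_percent(report_lines: Iterable[str]) -> float:
    """Percentage of fragments on the unclassified ('U') row of a Kraken2 report."""
    for line in report_lines:
        # Report columns: percent, clade count, direct count, rank code, taxid, name
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6 and fields[3] == "U":
            return float(fields[0])
    return 0.0  # no unclassified row present


def unclassified_ids(output_lines: Iterable[str]) -> List[str]:
    """Read IDs whose status column (first field) is 'U' in the per-read output."""
    ids = []
    for line in output_lines:
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0] == "U":
            ids.append(fields[1])
    return ids


def sample_ids(ids: List[str], n: int, seed: int = 0) -> List[str]:
    """Reproducibly pick up to n IDs at random, e.g. for a BLAST spot-check."""
    return random.Random(seed).sample(ids, min(n, len(ids)))
```

Usage with the files produced by the command in the question: open `test_kraken.report` and pass the file handle to `unclassified_percent`, then open `test.kraken` and call `sample_ids(unclassified_ids(fh), 10)` to get ten read IDs to extract and BLAST. The helpers take iterables of lines rather than paths so they work equally on open files or in-memory lists.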
Since I'm new to kraken2, it's possible I'm missing something here, but shouldn't these unclassified reads be classified as SARS-CoV-2, or at least to some viral lineage? Are there some settings I'm missing?

Thanks,
Charles