Closed ZhangDengwei closed 6 months ago
Only complete genomes are included by default. If you want to download non-complete genomes, you can try using the krakenuniq-download scripts: https://github.com/fbreitwieser/krakenuniq
However, yes, we recommend using only complete genomes, because draft assemblies are more likely to be contaminated. Complete genomes cover most sample sets well enough. If your sample contains something that is not represented in the database, you will likely see a large fraction of unclassified reads.
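To check how large the unclassified fraction actually is, one option is to read it from the Kraken 2 report file. The snippet below is only a sketch: it assumes the standard tab-separated report in which the second column is the clade read count and the last column is the taxon name, with one line named `unclassified` and one named `root`. The function name `unclassified_fraction` is made up for this example.

```python
import sys


def unclassified_fraction(report_lines):
    """Return the fraction of reads reported as unclassified.

    Assumes a Kraken 2 style report: tab-separated, clade read count in
    the second column, taxon name (possibly indented) in the last column.
    """
    unclassified = 0
    classified = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[-1].strip()
        if name == "unclassified":
            unclassified = int(fields[1])
        elif name == "root":
            # The root clade count covers every classified read.
            classified = int(fields[1])
    total = unclassified + classified
    return unclassified / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python unclassified.py sample.report
    with open(sys.argv[1]) as handle:
        print(f"{unclassified_fraction(handle):.1%} of reads unclassified")
```

A consistently high value across samples would be one hint that the database is missing organisms present in the samples.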
Hi,
I have downloaded and built the kraken2 database with `kraken2-build --download-taxonomy`, as follows:

Taking bacteria as an example, the following files have been generated in the `bacteria` folder:

I have gone through some previous posts and reviewed the `kraken2` paper, and I found that only complete genomes are downloaded this way. A total of 264,821 genomes are listed in `assembly_summary.txt`, but only 29,967 are "Complete Genome". I understand that draft genomes might be contaminated, as noted in the `kraken` paper. However, I wonder whether including only complete genomes would strongly affect the taxonomic annotation, since some bacteria are still uncultured, especially in the human gut microbiome.

Besides, I checked `manifest.txt`, which contains 34,573 genomes. I wonder whether the downloaded genomes are those listed in `manifest.txt`. If so, I found that some genomes are "Chromosome" instead of "Complete Genome", for instance:

Lastly, I attempted to count the genomes in `library.fna`, but one genome might contain multiple contigs, which makes it hard to count the number of genomes directly.
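The counts discussed above can be reproduced with a short script. This is a minimal sketch under several assumptions: `assembly_summary.txt` follows the NCBI tab-separated layout with `assembly_level` as the 12th column (the header line is used instead when it names that column), each line of `manifest.txt` is the path to one assembly's FASTA file and contains a `GCF_`/`GCA_` accession, and every sequence in `library.fna` starts with a `>` header line. All function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Assembly accessions such as GCF_000005845.2 or GCA_000005845.2.
ACCESSION = re.compile(r"GC[AF]_\d+\.\d+")


def tally_assembly_levels(summary_lines):
    """Count assemblies per assembly level (Complete Genome, Chromosome, ...)."""
    level_col = 11  # assumption: 12th column unless the header says otherwise
    counts = Counter()
    for line in summary_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            header = line.lstrip("# ").split("\t")
            if "assembly_level" in header:
                level_col = header.index("assembly_level")
            continue
        fields = line.split("\t")
        if len(fields) > level_col:
            counts[fields[level_col]] += 1
    return counts


def count_assemblies(manifest_lines):
    """Count distinct assembly accessions, i.e. genomes, in the manifest."""
    accessions = set()
    for line in manifest_lines:
        match = ACCESSION.search(line)
        if match:
            accessions.add(match.group())
    return len(accessions)


def count_sequences(fasta_lines):
    """Count FASTA records; one genome may contribute several of these."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage: python count.py assembly_summary.txt manifest.txt library.fna
    with open(sys.argv[1]) as handle:
        for level, n in tally_assembly_levels(handle).most_common():
            print(f"{level}\t{n}")
    with open(sys.argv[2]) as handle:
        print(f"assemblies in manifest\t{count_assemblies(handle)}")
    with open(sys.argv[3]) as handle:
        print(f"sequences in library\t{count_sequences(handle)}")
```

Counting genomes from the manifest sidesteps the multiple-contig problem, because each assembly appears there once no matter how many sequences it contains. If the "Complete Genome" and "Chromosome" tallies together come close to the manifest count, that would suggest both levels are being downloaded.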