DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Viral Refseq download #579

Status: Open. taniagmangolini opened this issue 2 years ago

taniagmangolini commented 2 years ago

I downloaded the viral library using the following command: kraken2-build --download-library viral --db $database. However, I noticed that some accessions are missing from the downloaded library, such as NC_003977.2 (taxid 10407). To compare, I downloaded the RefSeq FASTA files directly from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral*.genomic.fna.gz), and the missing accession was present there. So my question is whether "kraken2-build --download-library" fails to work with the latest viral RefSeq release, and why this discrepancy occurs.
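A minimal Python sketch of the comparison described above: collect the accessions from the FASTA headers of the Kraken 2 library and of the NCBI RefSeq files, then list the ones the library lacks. It assumes the downloaded library is a FASTA file such as $database/library/viral/library.fna, and that each header's accession is its first token, or the last "|"-separated field of that token if kraken2-build has added a prefix like "kraken:taxid|10407|". Both the path and the header layout are assumptions, not confirmed by this thread, and the script name compare_accessions.py is hypothetical.

```python
import gzip
import sys
from typing import IO, Iterable, Set


def open_fasta(path: str) -> IO[str]:
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def header_accession(header: str) -> str:
    """Extract an accession from a FASTA header line.

    Assumption: the accession is the first whitespace-delimited token; if that
    token contains '|' (e.g. 'kraken:taxid|10407|NC_003977.2'), the last
    '|'-separated field is taken instead. Returns '' for an empty header.
    """
    tokens = header[1:].split()
    if not tokens:
        return ""
    return tokens[0].split("|")[-1]


def collect_accessions(lines: Iterable[str]) -> Set[str]:
    """Return the set of accessions found on the '>' header lines of a FASTA stream."""
    accessions: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            acc = header_accession(line)
            if acc:
                accessions.add(acc)
    return accessions


def missing_from_library(reference: Set[str], library: Set[str]) -> Set[str]:
    """Accessions present in the reference set but absent from the library."""
    return reference - library


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage (hypothetical): python compare_accessions.py LIBRARY.fna REF1.fna.gz [REF2.fna.gz ...]
    with open_fasta(sys.argv[1]) as fh:
        library = collect_accessions(fh)
    reference: Set[str] = set()
    for path in sys.argv[2:]:
        with open_fasta(path) as fh:
            reference |= collect_accessions(fh)
    # Print each accession found in the RefSeq files but not in the library.
    for acc in sorted(missing_from_library(reference, library)):
        print(acc)
```

Comparing sets of accessions rather than whole files keeps the check independent of sequence line wrapping and of how the headers' descriptions are worded.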

jenniferlu717 commented 2 years ago

The download-library command only downloads RefSeq genomes that are complete; it does not use any draft genomes. I would assume that this is why this accession was skipped.
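To illustrate what a "complete genomes only" rule keeps, here is a minimal Python sketch that filters rows of an NCBI assembly_summary.txt file on its assembly_level column (column 12, value "Complete Genome"). It assumes the filtering is done per assembly using that file; the exact rule kraken2-build applies is not confirmed in this thread and may differ. The function name complete_genome_assemblies is hypothetical.

```python
from typing import Iterable, List, Tuple

# 0-based column indices in NCBI's tab-separated assembly_summary.txt.
ACCESSION_COL = 0        # assembly_accession
TAXID_COL = 5            # taxid
ASSEMBLY_LEVEL_COL = 11  # assembly_level
FTP_PATH_COL = 19        # ftp_path


def complete_genome_assemblies(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (assembly_accession, taxid, ftp_path) for rows whose
    assembly_level is 'Complete Genome'.

    Assumption: this mirrors the 'complete genomes only' behaviour described
    above; draft levels such as 'Contig' or 'Scaffold' are dropped.
    """
    kept: List[Tuple[str, str, str]] = []
    for line in lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows that are too short to contain ftp_path.
        if len(fields) <= FTP_PATH_COL:
            continue
        if fields[ASSEMBLY_LEVEL_COL] == "Complete Genome":
            kept.append((fields[ACCESSION_COL], fields[TAXID_COL], fields[FTP_PATH_COL]))
    return kept
```

Running this over the viral assembly_summary.txt and checking whether the assembly that contains a given accession appears in the kept list would indicate whether such a filter explains a missing record.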