DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

kraken2-build: some files appear to be missing from ftp site #470

Open milobrooks opened 3 years ago

milobrooks commented 3 years ago

When I try to download the kraken2 viral database, I get an error saying that certain files are missing from the source directory. I checked the source files, for example at ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/611/645/GCF_001611645.5_ASM161164v5/, and indeed there is no file named GCF_001611645.5_ASM161164v5_genomic.fna.gz. Does the manifest.txt file need updating? Below are the command I ran and its output. Please note that the --use-ftp option does not help in this case.

kraken2-build --download-library viral -db kraken2_db
rsync: link_stat "/all/GCF/003/034/835/GCF_003034835.1_ASM303483v1/GCF_003034835.1_ASM303483v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/001/611/645/GCF_001611645.5_ASM161164v5/GCF_001611645.5_ASM161164v5_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/002/833/545/GCF_002833545.1_ASM283354v1/GCF_002833545.1_ASM283354v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/002/957/295/GCF_002957295.1_ASM295729v1/GCF_002957295.1_ASM295729v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/006/869/785/GCF_006869785.1_ASM686978v1/GCF_006869785.1_ASM686978v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/003/014/195/GCF_003014195.1_ASM301419v1/GCF_003014195.1_ASM301419v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/002/957/515/GCF_002957515.1_ASM295751v1/GCF_002957515.1_ASM295751v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
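
For anyone triaging a longer log, here is a minimal Python sketch for pulling the missing paths out of rsync output like the above. It is not part of kraken2. The function names are hypothetical, the regular expression is inferred only from the error lines quoted in this issue, and the directory layout in `genomic_fasta_path` is an assumption read off those same paths rather than taken from NCBI documentation.

```python
import re

# Pattern inferred from the rsync error lines quoted in this issue (assumption).
LINK_STAT_RE = re.compile(r'rsync: link_stat "([^"]+)" \(in genomes\) failed')

# Assembly directory names in the log look like GCF_001611645.5_ASM161164v5.
ASSEMBLY_RE = re.compile(r"^(GC[AF])_(\d{3})(\d{3})(\d{3})\.\d+_.+$")


def missing_paths(log_text):
    """Return every server-side path that rsync reported as missing."""
    return LINK_STAT_RE.findall(log_text)


def assembly_name(path):
    """Return the assembly directory name (second-to-last path component)."""
    return path.rstrip("/").split("/")[-2]


def genomic_fasta_path(assembly):
    """Rebuild the path the log shows for an assembly (layout is an assumption)."""
    match = ASSEMBLY_RE.match(assembly)
    if match is None:
        raise ValueError(f"unrecognised assembly name: {assembly!r}")
    prefix, a, b, c = match.groups()
    return f"/all/{prefix}/{a}/{b}/{c}/{assembly}/{assembly}_genomic.fna.gz"


if __name__ == "__main__":
    import sys

    # Usage: python missing_assemblies.py < kraken2_build.log
    for path in missing_paths(sys.stdin.read()):
        print(assembly_name(path))
```

The printed assembly names can then be checked by hand against the NCBI directory listing, as was done for GCF_001611645.5_ASM161164v5 above.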

ctuni commented 3 years ago

I am having the same issue, which has also been reported here. In that issue someone came up with a temporary solution, which consists of modifying the rsync_from_ncbi.pl file. Until this is fixed on the kraken2 side, it seems to be a good workaround. Hope it helps!

milobrooks commented 3 years ago

Thank you for the suggestion. As a workaround I downloaded the pre-built database from https://benlangmead.github.io/aws-indexes/k2. This seems to be doing the job. If I need a custom database, I'll try modifying the rsync_from_ncbi.pl file.