DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

problem with downloading databases #775

Open AlexandreThibodeauUdM opened 7 months ago

AlexandreThibodeauUdM commented 7 months ago

Hello all, just to mention that downloading the bacteria library on its own does not work at the moment:

"Step 1/2: Performing rsync file transfer of requested files rsync: link_stat "/all/GCF/030/866/925/GCF_030866925.1_ASM3086692v1/GCF_030866925.1_ASM3086692v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2) "

Downloading the RDP 16S database does not work either: I went to the RDP website and it is not working. It also does not show up on Google; has it shut down?

Downloading archaea works.

Downloading SILVA 16S also works.

tdfy commented 7 months ago

I'm experiencing the same problem here; it fails during the 'standard' build.

Step 1/2: Performing rsync file transfer of requested files
rsync: link_stat "/all/GCF/030/643/825/GCF_030643825.1_ASM3064382v1/GCF_030643825.1_ASM3064382v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/030/866/925/GCF_030866925.1_ASM3086692v1/GCF_030866925.1_ASM3086692v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1819) [generator=3.2.3]
rsync_from_ncbi.pl: rsync error, exiting: 5888

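When several files fail, it can help to pull the affected assembly accessions out of the build log so they can be checked on NCBI one by one. Below is a minimal sketch, assuming the log lines look exactly like the `rsync: link_stat "..." (in genomes) failed` lines quoted above; `missing_accessions` is a hypothetical helper name, not part of kraken2.

```python
import re

# Matches the quoted path in rsync "link_stat" failures, e.g.
# rsync: link_stat "/all/GCF/.../GCF_..._genomic.fna.gz" (in genomes) failed: ...
LINK_STAT_RE = re.compile(r'link_stat "([^"]+)" \(in genomes\) failed')

# Assembly accessions such as GCF_030866925.1 (GCA_ prefixes are accepted too).
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")


def missing_accessions(log_text: str) -> list[str]:
    """Return the accessions rsync could not find, de-duplicated, in log order."""
    found: list[str] = []
    for path in LINK_STAT_RE.findall(log_text):
        match = ACCESSION_RE.search(path)
        if match and match.group(0) not in found:
            found.append(match.group(0))
    return found
```

Fed the log above, it returns `["GCF_030643825.1", "GCF_030866925.1"]`; lines without a `link_stat` failure (such as the final `rsync error: ... (code 23)` summary) are ignored.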
AlexandreThibodeauUdM commented 7 months ago

The RDP website no longer exists, so it is impossible to use it to fetch the special RDP database for classifying 16S sequences.

AlexandreThibodeauUdM commented 7 months ago

I downloaded archaea and then manually added RefSeq bacteria (17,000 genomes, a 21 GB compressed file) from NCBI's new tool, NCBI Datasets (https://www.ncbi.nlm.nih.gov/datasets/).

Building the database needs 121 GB of free RAM, but I only have 59 GB free on my computer, so I am capping it at 55 GB using: kraken2-build --build --threads 8 --db ./database --max-db-size 55000000000
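`--max-db-size` takes a size in bytes, so the 55 GB cap is written as 55000000000 (decimal GB). The sketch below derives that number from the free RAM and prints the resulting command; the 4 GB safety margin and the helper names are hypothetical choices for illustration only. As far as I understand the Kraken 2 manual, capping the size makes kraken2-build downsample the reference minimizers to fit, which may cost some classification sensitivity.

```python
import shlex


def max_db_size_bytes(free_ram_gb: int, headroom_gb: int = 4) -> int:
    """Cap in bytes: free RAM minus an (assumed) safety margin, 1 GB = 10**9 bytes."""
    if headroom_gb >= free_ram_gb:
        raise ValueError("headroom must be smaller than the free RAM")
    return (free_ram_gb - headroom_gb) * 10**9


def build_command(db: str, threads: int, max_db_size: int) -> str:
    """Shell-quoted kraken2-build invocation with an explicit --max-db-size."""
    args = [
        "kraken2-build", "--build",
        "--threads", str(threads),
        "--db", db,
        "--max-db-size", str(max_db_size),
    ]
    return shlex.join(args)


# 59 GB free with a 4 GB margin reproduces the 55 GB cap from the comment.
print(build_command("./database", 8, max_db_size_bytes(59)))
# prints: kraken2-build --build --threads 8 --db ./database --max-db-size 55000000000
```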

AlexandreThibodeauUdM commented 7 months ago

The database did build, and it took 1 hour, but apparently it did not use my bacteria genomes, only the archaea. So I believe it did not find the .fna files. Maybe the folder structure, once the NCBI download is decompressed, is not what kraken2-build expects?
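One way to rule out the folder-structure question is to locate every `.fna` file in the decompressed download and register each one explicitly. The Kraken 2 manual documents `kraken2-build --add-to-library <file> --db <db>` for custom FASTA files, run before `--build`; each sequence must map to a taxon, either through a `kraken:taxid|<taxid>` tag in its header or through the accession-to-taxid maps fetched by `--download-taxonomy`. The sketch below assumes the NCBI Datasets package unpacks to something like `ncbi_dataset/data/<accession>/*.fna`, but it searches at any depth, so the exact layout does not matter. The helper names are hypothetical.

```python
import shlex
from pathlib import Path


def find_fasta_files(root: str) -> list[Path]:
    """All *.fna files below root at any depth, sorted for reproducible output."""
    return sorted(p for p in Path(root).rglob("*.fna") if p.is_file())


def add_to_library_commands(root: str, db: str) -> list[str]:
    """One shell-quoted kraken2-build --add-to-library command per FASTA file."""
    return [
        shlex.join(["kraken2-build", "--add-to-library", str(path), "--db", db])
        for path in find_fasta_files(root)
    ]
```

For example, `for cmd in add_to_library_commands("ncbi_dataset/data", "./database"): print(cmd)` lists the commands to run; an empty list means no `.fna` files were found where expected.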

tdfy commented 7 months ago

I've downloaded and unzipped the prebuilt Standard-16 and Standard-8 databases found below as a temporary solution.

https://benlangmead.github.io/aws-indexes/k2

MixalisSn commented 7 months ago

I am also experiencing the same issue; the following files fail to synchronize:

rsync: link_stat "/all/GCF/000/012/405/GCF_000012405.1_ASM1240v1/GCF_000012405.1_ASM1240v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync: link_stat "/all/GCF/033/372/575/GCF_033372575.1_ASM3337257v1/GCF_033372575.1_ASM3337257v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)

As a result, the database (probably) does not build successfully, and when I attempt to run kraken2, I get the following error: kraken2: database ("database") does not contain necessary file taxo.k2d

tdfy commented 7 months ago

Perhaps NCBI has updated their repository (?); today I was able to download the bacteria genomes without rsync errors using:

kraken2-build --download-library bacteria

maxmaronna commented 7 months ago

The plasmid DB download is not working: Kraken2 uses FTP mode even when you are not requesting that option:

kraken2-build --download-library plasmid --no-masking --threads 8 --db contaminant_kraken2

jenniferlu717 commented 6 months ago

@MixalisSn you need to run kraken2-build --download-taxonomy --db MYDB first
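A finished Kraken 2 database directory contains three files: `hash.k2d`, `opts.k2d` and `taxo.k2d`. A quick check for which ones are missing tells you whether the build completed before you run kraken2 against it. A minimal sketch; `missing_db_files` is a hypothetical helper.

```python
from pathlib import Path

# Files present in a completed Kraken 2 database directory.
REQUIRED_DB_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_db_files(db_dir: str) -> list[str]:
    """Names of the required .k2d files that are absent from db_dir."""
    db = Path(db_dir)
    return [name for name in REQUIRED_DB_FILES if not (db / name).is_file()]
```

If `missing_db_files("MYDB")` reports `taxo.k2d`, the build did not finish; per the reply above, run `kraken2-build --download-taxonomy --db MYDB` before building.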

jenniferlu717 commented 6 months ago

@AlexandreThibodeauUdM RDP is no longer being supported unfortunately.

For bacteria, this error results when NCBI is in the middle of updating their database files and the assembly_summary.txt has not been updated yet. It should work fine after a couple days.
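Since the failure is transient, a scripted build can simply retry the download after a delay instead of failing outright. The sketch below assumes that `kraken2-build` exits with a non-zero status when the rsync step fails; the helper name, the three attempts and the 24-hour wait are arbitrary, hypothetical choices. `runner` and `sleep` are injectable so the logic can be tested without a real download.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    wait_seconds: float = 24 * 60 * 60,
    runner: Callable[[list[str]], subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits with status 0; return how many attempts it took."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = runner(list(cmd))
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(wait_seconds)  # give NCBI time to finish updating its files
    raise RuntimeError(f"{cmd[0]} still failing after {attempts} attempts")
```

Usage: `run_with_retries(["kraken2-build", "--download-library", "bacteria", "--db", "MYDB"])`.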

@maxmaronna the plasmid download is different from the Refseq downloads. I'll check on the issue.

MixalisSn commented 6 months ago

@jenniferlu717 @tdfy Indeed, after some days, the database was downloaded successfully. Thank you very much for your support and replies.