fbreitwieser / krakenuniq

🐙 KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results
GNU General Public License v3.0

Error fetching genomes #48

Open JCSzamosi opened 5 years ago

JCSzamosi commented 5 years ago

When I use krakenuniq-download to download RefSeq genomes, it intermittently raises the error "Error fetching [ftp link]. Is curl installed?"

curl is definitely installed. That can't be the problem, since it is succeeding in fetching the vast majority of the genomes it tries to fetch. The ftp link it reports the error about always works when I try it in the browser, and it's not always the same number of links causing problems on subsequent attempts. Any idea how to troubleshoot this?

Thanks

asrivathsan commented 4 years ago

Hi, just checking whether there is any solution to this?

We are having the same issue when we use the command

krakenuniq-download --db DBDIR --threads 10 --dust refseq/bacteria refseq/archaea

It started failing after 7389/15418 and continues to give the abovementioned error.

Other krakenuniq-download commands went well.

thanks for looking into this

christopher047 commented 4 years ago

Same here. Has anyone solved this issue?

Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/664/025/GCF_009664025.1_ASM966402v1/GCF_009664025.1_ASM966402v1_genomic.fna.gz. Is curl installed?

joshua-theisen commented 4 years ago

I get the same error:

krakenuniq-download --threads 8 --dust --db bactarch.template refseq/bacteria refseq/archaea
Downloading assembly summary file for bacteria genomes, and filtering to assembly level Complete_Genome.
 Downloading bacteria genomes:  5272/16639 ... Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/680/025/GCF_001680025.1_ASM168002v1/GCF_001680025.1_ASM168002v1_genomic.fna.gz. Is curl installed?
 Downloading bacteria genomes:  8496/16639 ... Error fetching ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/168/635/GCF_000168635.2_ASM16863v2/GCF_000168635.2_ASM16863v2_genomic.fna.gz. Is curl installed?
 Downloading bacteria genomes:  10759/16639 ...

The first 5k genomes download, then the "Is curl installed?" error occurs intermittently.
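To collect the failing links from a run like this one for a later retry, something along the following lines may help. This is a minimal sketch and not part of KrakenUniq: it assumes the error message always has the form seen in the log lines quoted in this thread, and `failed_urls` is a hypothetical helper name.

```python
import re

# Assumption: failures are always reported as
# "Error fetching <url>. Is curl installed?", as in the logs quoted above.
ERROR_RE = re.compile(r"Error fetching (\S+?)\. Is curl installed\?")


def failed_urls(log_text):
    """Return the URLs reported as failed, in order of appearance, without duplicates."""
    urls = []
    for url in ERROR_RE.findall(log_text):
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python failed_urls.py download.log
    with open(sys.argv[1]) as handle:
        for url in failed_urls(handle.read()):
            print(url)
```

The log file has to be captured separately, for example by redirecting the output of krakenuniq-download to a file.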

CuypersBart commented 2 years ago

I have the same issue. Does anyone know the cause and/or fix?

salzberg commented 2 years ago

This is caused by NCBI changing their ftp site setup, which they do frequently and which we can't control. However, we are now putting out KrakenUniq/Kraken1 indices for download on Ben Langmead's index page (https://benlangmead.github.io/aws-indexes/k2). We just put the "standard" database there, which will include files needed for Kraken 1, KrakenUniq, and Bracken, and we're going to put a larger database there too, which will add 100s of eukaryotic pathogens from EuPathDB. The standard database includes all RefSeq bacteria, archaea, viruses, and human.

CuypersBart commented 2 years ago

Thank you. That would be great!

I solved the problem for now by manually removing the genomes that throw an error, and using the --rsync flag on krakenuniq-download. After a few iterations, all genomes were downloaded correctly.
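The "retry until everything arrives" approach can also be applied to individual files. The sketch below is only an illustration of retrying a transient failure with exponential backoff. It is not how krakenuniq-download works internally, all function names are hypothetical, and it assumes that the NCBI host serves the same paths over HTTPS as over FTP. Where the downloaded files would need to go inside the database directory is not covered here.

```python
import time
import urllib.request


def to_https(url):
    """Rewrite ftp:// to https:// (assumption: the host serves the same path over HTTPS)."""
    if url.startswith("ftp://"):
        return "https://" + url[len("ftp://"):]
    return url


def fetch_once(url, dest):
    """Download url to the local path dest; raises OSError (e.g. URLError) on failure."""
    urllib.request.urlretrieve(url, dest)


def fetch_with_retries(url, dest, fetch=fetch_once, attempts=4,
                       base_delay=2.0, sleep=time.sleep):
    """Try a download up to `attempts` times, doubling the wait after each failure.

    Returns the number of attempts used; raises RuntimeError if all attempts fail.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            fetch(url, dest)
            return attempt + 1
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

A call might look like `fetch_with_retries(to_https(url), "genome.fna.gz")`, where `url` is one of the links from the error messages. The `fetch` and `sleep` parameters exist only so that the retry logic can be exercised without network access.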

amizeranschi commented 1 year ago

@salzberg I noticed that, as of now, the Kraken2 databases at https://benlangmead.github.io/aws-indexes/k2 have been periodically updated, with the most recent one being from March 2023, while the latest version for KrakenUniq is from June 2022. I realize that the KrakenUniq databases are much larger and more difficult to create, but are there any plans for uploading an updated version of this as well?

salzberg commented 1 year ago

Yes, we do plan to update them, but they are huge, so this won't happen as often. We have several smaller ones, more specialized, so we might try to add those. Btw I don't have much funding for this, but I keep it going as best I can.