DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Unable to download the database: kraken2-build --standard --db kraken2-standard-db/ --threads 42 #661

hebamuh68 closed this issue 1 year ago

hebamuh68 commented 1 year ago

Running this command:

kraken2-build --standard --db kraken2-standard-db/ --threads 42

gives me this error:

Downloading nucleotide gb accession to taxon map...rsync: getaddrinfo: ftp.ncbi.nlm.nih.gov 873: Temporary failure in name resolution
rsync error: error in socket IO (code 10) at clientserver.c(139) [Receiver=3.2.7]
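
For later readers: `Temporary failure in name resolution` is reported by `getaddrinfo`, so the lookup of ftp.ncbi.nlm.nih.gov failed before rsync opened a connection. A rough sketch for telling a DNS failure apart from a blocked port follows. `check_host` is a hypothetical helper, not part of kraken2, and it assumes a Linux machine with `getent`, `timeout` (coreutils) and `bash` available.

```shell
# Hypothetical helper (not part of kraken2): prints "dns-fail", "connect-fail" or "ok".
check_host() {
  local host="$1" port="$2"
  # getent ahosts goes through getaddrinfo, the same lookup that failed for rsync
  if ! getent ahosts "$host" >/dev/null 2>&1; then
    echo "dns-fail"
    return 1
  fi
  # /dev/tcp is a bash feature, so the connection attempt runs in an explicit bash
  if ! timeout 10 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "connect-fail"
    return 2
  fi
  echo "ok"
}
```

Possible usage, checking the rsync port from the error above and then HTTPS:

check_host ftp.ncbi.nlm.nih.gov 873
check_host ftp.ncbi.nlm.nih.gov 443

A `dns-fail` result points at the local resolver or network rather than at kraken2 itself.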


Somebodyatthdoor commented 1 year ago

Hi hebamuh68,

I have also been getting this error for the past few days. It does not seem to be a problem with kraken2 but with the NCBI site: I get the same failure when I try to download the files outside of kraken2, and on multiple machines. Sometimes the command works for a short while and then fails; sometimes it fails straight away. It may just be a case of waiting to see whether NCBI fixes the problem.

Cheers, Laura
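
Because the failure described above is intermittent, one thing that may be worth trying is a retry loop with a growing delay. This is only a sketch: `retry` is a hypothetical helper, the attempt count and delay are arbitrary, and it is not verified here whether a rerun of kraken2-build resumes a partial download or starts over. It will not help if name resolution never succeeds.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS command [args...]
# Reruns the command until it succeeds, doubling the delay after each failure.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
    delay=$((delay * 2))
  done
}
```

Possible usage, with up to 5 attempts starting at a 30 second delay:

retry 5 30 kraken2-build --standard --db kraken2-standard-db/ --threads 42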

hebamuh68 commented 1 year ago

@Somebodyatthdoor

I asked around and was told about an alternative called MetaPhlAn. I'm trying to install it now.

dandaman commented 1 year ago

The issue is related to the use of FTP in the 2 scripts download_genomic_library.sh and rsync_from_ncbi.pl. You can patch them using these diffs:

diff --git a/scripts/download_genomic_library.sh b/scripts/download_genomic_library.sh
index ffd96d2..39bd7c7 100755
--- a/scripts/download_genomic_library.sh
+++ b/scripts/download_genomic_library.sh
@@ -14,7 +14,7 @@ set -e  # Stop on error

 LIBRARY_DIR="$KRAKEN2_DB_NAME/library"
 NCBI_SERVER="ftp.ncbi.nlm.nih.gov"
-FTP_SERVER="ftp://$NCBI_SERVER"
+FTP_SERVER="https://$NCBI_SERVER"
 RSYNC_SERVER="rsync://$NCBI_SERVER"
 THIS_DIR=$PWD
diff --git a/scripts/rsync_from_ncbi.pl b/scripts/rsync_from_ncbi.pl
index 446efc9..d92a625 100755
--- a/scripts/rsync_from_ncbi.pl
+++ b/scripts/rsync_from_ncbi.pl
@@ -43,7 +43,7 @@ while (<>) {
   my $full_path = $ftp_path . "/" . basename($ftp_path) . $suffix;
   # strip off server/leading dir name to allow --files-from= to work w/ rsync
   # also allows filenames to just start with "all/", which is nice
-  if (! ($full_path =~ s#^ftp://${qm_server}${qm_server_path}/##)) {
+  if (! ($full_path =~ s#^https://${qm_server}${qm_server_path}/##)) {
     die "$PROG: unexpected FTP path (new server?) for $ftp_path\n";
   }
   $manifest{$full_path} = $taxid;

See: #653
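
To illustrate what the second hunk changes: the Perl substitution strips the server prefix from each path and dies if the prefix does not match, so a script that expects `ftp://` rejects `https://` paths. Below is a rough shell equivalent, for illustration only. `strip_server_prefix` is a hypothetical name, the `/genomes` server path is an assumption about the value behind `$qm_server_path`, and the example path is made up.

```shell
# Hypothetical shell equivalent of the patched Perl check in rsync_from_ncbi.pl.
strip_server_prefix() {
  local full_path="$1"
  # Assumption: the server path is /genomes, so stripped names start with "all/"
  local prefix="https://ftp.ncbi.nlm.nih.gov/genomes/"
  case "$full_path" in
    "$prefix"*)
      # Remove the literal prefix, leaving a path relative to the rsync module
      printf '%s\n' "${full_path#"$prefix"}"
      ;;
    *)
      echo "unexpected path (new server?): $full_path" >&2
      return 1
      ;;
  esac
}

# Made-up example path; prints the part after the prefix
strip_server_prefix "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/example/file.fna.gz"
```

With the unpatched `ftp://` prefix, the same `https://` input would fall through to the error branch, which corresponds to the `unexpected FTP path` message in the Perl script.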

Somebodyatthdoor commented 1 year ago

Hi,

Unfortunately this isn't the solution for me, as the version of kraken2 I downloaded already includes these changes. The same problem occurs when I try to download the databases with wget, which made me think the problem may be on NCBI's end. However, when I contacted NCBI they said they had received no other complaints and that it was likely a firewall problem on my end. My IT department disagrees, as they reproduced the problem on several different machines. For the moment, like @hebamuh68, I have switched to MetaPhlAn, though I much prefer the functionality of kraken2.

Thanks for the suggestions, Laura

hebamuh68 commented 1 year ago

@Somebodyatthdoor

I found an API called tool chest that can run kraken2 in the cloud, and it works perfectly. Try it. Good luck.