DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

kraken2-build with --use-ftp stopped quickly after running #136

Status: Open. mkazanov opened this issue 5 years ago.

mkazanov commented 5 years ago

This command:

$ kraken2-build --standard --db standarddb --use-ftp
Downloading taxonomy tree data...
$

stops a few seconds after starting. Nothing is downloaded.

seahurt commented 5 years ago

+1

shubavarshini commented 5 years ago

Hi, I have the same issue. My command is:

kraken2-build --download-library bacteria --db krakenDB --threads 20 --use-ftp

When I go back to the log files, I see this:

Step 1/2: Performing ftp file transfer of requested files
Timeout at /home/anaconda3/lib/5.26.2/Net/FTP.pm line 583.

I know it's a Perl module issue. Let me know if it can be fixed from the command line.
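The `Timeout at .../Net/FTP.pm` line appears to come from Perl's Net::FTP module giving up on a slow or stalled transfer. Independently of kraken2, a common mitigation for intermittent timeouts is to retry each transfer with a growing delay. The sketch below shows that pattern in Python; `fetch` is a hypothetical zero-argument download function, and the default exception types are an assumption (a real client may raise other errors, e.g. `ftplib.error_temp`). It is not part of kraken2-build and does not change how rsync_from_ncbi.pl behaves.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, doubling the wait after each failure.

    Only exceptions listed in `retry_on` trigger a retry (an assumed set of
    "transient" errors); any other exception propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 2 s, 4 s, 8 s, ...
    raise ValueError("attempts must be >= 1")
```

Passing `sleep` as a parameter keeps the delay injectable, so the backoff schedule can be checked without actually waiting.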

Krasnopeev commented 5 years ago

I have the same issue. After running

$ kraken2-build --standard --threads 4 --db STDB --use-ftp

the console returns:

Step 1/2: Performing ftp file transfer of requested files
rsync_from_ncbi.pl: FTP connection error: Network is unreachable

jprokos1 commented 5 years ago

I am having the same issue when using the --standard flag, with or without --use-ftp. After running

kraken2-build --standard --threads 24 --db $DBNAME

Kraken outputs this:

Downloading taxonomy tree data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing rsync file transfer of requested files

It then stays stuck at this step for over 12 hours.

When run with

kraken2-build --standard --threads 24 --db $DBNAME --use-ftp

it throws:

Step 1/2: Performing ftp file transfer of requested files
Unable to close datastream at kraken/install_dir/rsync_from_ncbi.pl line 99.
rsync_from_ncbi.pl: unable to download all/GCF/005/406/325/GCF_005406325.1_ASM540632v1/GCF_005406325.1_ASM540632v1_genomic.fna.gz: Opening BINARY mode data connection for all/GCF/005/406/325/GCF_005406325.1_ASM540632v1/GCF_005406325.1_ASM540632v1_genomic.fna.gz (1142411 bytes)
rsync_from_ncbi.pl: unable to download all/GCF/001/011/115/GCF_001011115.1_ASM101111v1/GCF_001011115.1_ASM101111v1_genomic.fna.gz: Connection closed
...
gzip: all/GCF_002156705.1_ASM215670v1_genomic.fna.gz: No such file or directory
...

Is this a problem with the --standard flag, or a connection issue between my server and NCBI? I can use wget to download genomic data from NCBI on the same server, so I don't believe a proxy or firewall is blocking the connection.

nick-youngblut commented 5 years ago

I'm getting the following with kraken2 (2.0.8_beta):

$ kraken2-build --use-ftp --download-taxonomy --db $OUTDIR
Uncompressing taxonomy data...
gzip: nucl_wgs.accession2taxid.gz: unexpected end of file
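gzip's `unexpected end of file` means the archive stops before the end of its compressed stream, which is what a download cut off mid-transfer typically leaves behind. Before rerunning the build, it can help to confirm which archives are incomplete. The sketch below does roughly what `gzip -t` does, using Python's standard `gzip` module: it decompresses the whole file and reports failure instead of crashing. The function name is illustrative, not part of kraken2.

```python
import gzip
import zlib


def gzip_is_complete(path: str) -> bool:
    """Return True if `path` decompresses cleanly to the end, like `gzip -t`.

    A file truncated mid-transfer makes the reader raise EOFError
    ("Compressed file ended before the end-of-stream marker was reached");
    corrupted data raises gzip.BadGzipFile or zlib.error.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read in 1 MiB chunks so large accession2taxid files are never
            # held in memory all at once.
            while fh.read(1 << 20):
                pass
        return True
    except (EOFError, gzip.BadGzipFile, zlib.error):
        return False
```

For example, `gzip_is_complete("nucl_wgs.accession2taxid.gz")` returning `False` would indicate that the file on disk is truncated and should be deleted and downloaded again.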
lucaz88 commented 4 years ago

I tried many different things (including checking that the NCBI FTP address and port were reachable), and in the end I managed to download the standard database just by adding the --threads flag to the command. Essentially, I just ran:

kraken2-build --standard --use-ftp --db /DATABASES/kraken --threads 60

alirizaaribas-ibg commented 3 years ago

I tried several variations, but all of them fail:

kraken2-build --standard --threads 16 --db /archive/db/kraken2db/maindb
kraken2-build --standard --threads 16 --db /archive/db/kraken2db/maindb --use-ftp
kraken2-build --standard --threads 16 --db /archive/db/kraken2db/maindb --use-ftp --no-masking

Here are the beginning and end of the error output:

Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map... done.
Downloaded accession to taxon map(s)
Downloading taxonomy tree data... done.
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing ftp file transfer of requested files
rsync_from_ncbi.pl: unable to download all/GCF/000/220/645/GCF_000220645.1_ASM22064v1/GCF_000220645.1_ASM22064v1_genomic.fna.gz: 
rsync_from_ncbi.pl: unable to download all/GCF/000/091/665/GCF_000091665.1_ASM9166v1/GCF_000091665.1_ASM9166v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/000/299/365/GCF_000299365.1_ASM29936v1/GCF_000299365.1_ASM29936v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/000/016/385/GCF_000016385.1_ASM1638v1/GCF_000016385.1_ASM1638v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/002/214/525/GCF_002214525.1_ASM221452v1/GCF_002214525.1_ASM221452v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/000/006/805/GCF_000006805.1_ASM680v1/GCF_000006805.1_ASM680v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/000/328/665/GCF_000328665.1_ASM32866v1/GCF_000328665.1_ASM32866v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/009/690/625/GCF_009690625.1_ASM969062v1/GCF_009690625.1_ASM969062v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/000/196/655/GCF_000196655.1_ASM19665v1/GCF_000196655.1_ASM19665v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/010/706/455/GCF_010706455.1_ASM1070645v1/GCF_010706455.1_ASM1070645v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/000/007/185/GCF_000007185.1_ASM718v1/GCF_000007185.1_ASM718v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/000/015/825/GCF_000015825.1_ASM1582v1/GCF_000015825.1_ASM1582v1_genomic.fna.gz: Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/000/306/765/GCF_000306765.2_ASM30676v2/GCF_000306765.2_ASM30676v2_genomic.fna.gz: Connection closed
Processed 10628/10636 projects (116 sequences, 5.07 Mbp)...
gzip: all/GCF_000910615.1_ViralProj213076_genomic.fna.gz: No such file or directory
Processed 10629/10636 projects (116 sequences, 5.07 Mbp)...
gzip: all/GCF_001440935.1_ViralProj301250_genomic.fna.gz: No such file or directory
Processed 10630/10636 projects (116 sequences, 5.07 Mbp)...
gzip: all/GCF_002593825.1_ASM259382v1_genomic.fna.gz: No such file or directory
Processed 10631/10636 projects (116 sequences, 5.07 Mbp)...
gzip: all/GCF_001967175.1_ViralProj362172_genomic.fna.gz: No such file or directory
Processed 10632/10636 projects (116 sequences, 5.07 Mbp)...
gzip: all/GCF_000896675.1_ViralProj81151_genomic.fna.gz: No such file or directory
Processed 10633/10636 projects (116 sequences, 5.07 Mbp)...
gzip: all/GCF_003181275.1_ASM318127v1_genomic.fna.gz: No such file or directory
Processed 10634/10636 projects (116 sequences, 5.07 Mbp)...
gzip: all/GCF_000853525.5_ViralMultiSegProj14918_genomic.fna.gz: No such file or directory
Processed 10635/10636 projects (116 sequences, 5.07 Mbp)...
gzip: all/GCF_000853485.1_ViralProj14909_genomic.fna.gz: No such file or directory
Processed 10636 projects (116 sequences, 5.07 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Downloading plasmid files from FTP... done.
Masking low-complexity regions of downloaded library...

What should I check: file permissions, TCP ports, firewall, disk space, Perl, dustmasker, or something else? I have already checked many of these.
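A useful first step is to see exactly which files failed and why. The sketch below pulls the relative path and the reason out of each `rsync_from_ncbi.pl: unable to download ...` line in a saved log, so failures can be counted or grouped by error (most lines in the log above end in `Connection closed`, one with no reason at all). The regular expression is inferred from the message format shown in this thread, and the log file name in the usage note is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# rsync_from_ncbi.pl: unable to download all/GCF/.../..._genomic.fna.gz: Connection closed
FAILED_RE = re.compile(r"rsync_from_ncbi\.pl: unable to download (\S+?\.gz):\s*(.*)$")


def failed_downloads(lines):
    """Yield (relative_path, reason) for each failed transfer in a log."""
    for line in lines:
        m = FAILED_RE.search(line.rstrip("\n"))
        if m:
            # Some failures carry no reason text; label them "<none>".
            yield m.group(1), m.group(2).strip() or "<none>"


def summarize(lines):
    """Count failures per reason string."""
    return Counter(reason for _, reason in failed_downloads(lines))
```

Usage, assuming the build output was saved to a file: `with open("kraken2-build.log") as fh: print(summarize(fh))`. If nearly every failure is `Connection closed`, that points at the network path to NCBI rather than at permissions or disk space.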

qducarmon commented 3 years ago

I think the URL is no longer correct: 'all' should be 'all_assembly_versions' (see the example below). It is not clear how to fix this, at least not for a conda installation.

https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Acerihabitans_arboris/all_assembly_versions/GCF_010131535.1_ASM1013153v1/
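For comparison, the paths in the error messages earlier in this thread split the accession digits into three-digit directories, e.g. `all/GCF/005/406/325/GCF_005406325.1_ASM540632v1/GCF_005406325.1_ASM540632v1_genomic.fna.gz`. The sketch below rebuilds that relative path from an accession and assembly name, which can help when checking by hand whether a given file exists under `all/` or only under a per-species `all_assembly_versions/` directory. The function name is hypothetical, the layout is inferred from the log lines above rather than from kraken2's source, and it is an assumption that these paths are relative to `https://ftp.ncbi.nlm.nih.gov/genomes/`.

```python
import re

# GCA_/GCF_ prefix, nine digits split 3/3/3, then a version number.
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{3})(\d{3})(\d{3})\.(\d+)$")


def genomic_fna_path(accession: str, asm_name: str) -> str:
    """Return the relative path in the format seen in the logs above."""
    m = ACCESSION_RE.match(accession)
    if not m:
        raise ValueError(f"not a GCA_/GCF_ accession: {accession!r}")
    prefix, a, b, c, _version = m.groups()
    stem = f"{accession}_{asm_name}"
    return f"all/{prefix}/{a}/{b}/{c}/{stem}/{stem}_genomic.fna.gz"
```

For instance, `genomic_fna_path("GCF_000306765.2", "ASM30676v2")` reproduces the path from one of the `Connection closed` lines above; prefixing it with the assumed base URL gives an address to test in a browser or with wget.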