Running the krakenuniq-download command from the manual (shown below) produces mixed results. Sometimes it fails with an error, but not always: I had to relaunch the exact same command three times before it completed successfully. I believe the behavior is stochastic because of the number of HTTP connections the script makes: a small fraction of them can fail due to proxies or network congestion, and the script does not retry failed downloads. This is the error message:
(krkn) user@cluster test $ krakenuniq-download --db DBDIR refseq/viral/Any viral-neighbors
Environment contains multiple differing definitions for 'cluster'.
Using value from 'CLUSTER' (xxxx) and ignoring 'cluster' (xxxx) at ~/miniconda3/envs/krkn/lib/perl5/site_perl/LWP/UserAgent.pm line 1134.
Environment contains multiple differing definitions for 'site'.
Using value from 'SITE' (xxxx) and ignoring 'site' (xxxx) at ~/miniconda3/envs/krkn/lib/perl5/site_perl/LWP/UserAgent.pm line 1134.
Downloading assembly summary file for viral genomes, and filtering to assembly level Any.
Downloading viral genomes: 12254/14992 ... Error fetching https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/856/685/GCF_000856685.1_ViralProj15059/GCF_000856685.1_ViralProj15059_genomic.fna.gz. Is curl installed?
Downloading viral genomes: 14992/14992 ... Found 14992 files.
Downloading viral neighbors.
Downloading DBDIR/taxonomy/nucl_gb.accession2taxid.gz [curl -g 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz' -o 'DBDIR/taxonomy/nucl_gb.accession2taxid.gz'] ...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2301M 100 2301M 0 0 48.5M 0 0:00:47 0:00:47 --:--:-- 49.0M
done (48s)
DBDIR/taxonomy/nucl_gb.accession2taxid.gz check [2.25 GB]
SUCCESS
Sorting maping file (will take some time) [gunzip -c DBDIR/taxonomy/nucl_gb.accession2taxid.gz | cut -f 1,3 > DBDIR/taxonomy/nucl_gb.accession2taxid.sorted.tmp && sort --parallel 5 -T DBDIR/taxonomy DBDIR/taxonomy/nucl_gb.accession2taxid.sorted.tmp > DBDIR/taxonomy/nucl_gb.accession2taxid.sorted && rm DBDIR/taxonomy/nucl_gb.accession2taxid.sorted.tmp] ... done (4m54s)
DBDIR/taxonomy/nucl_gb.accession2taxid.sorted check [4.81 GB]
Reading names file ...
Downloading DBDIR/taxonomy/taxdump.tar.gz [curl -g 'https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz' -o 'DBDIR/taxonomy/taxdump.tar.gz'] ...
Download taxdump.tar.gz % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 62.2M 100 62.2M 0 0 10.6M 0 0:00:05 0:00:05 --:--:-- 13.4M
done (6s)
DBDIR/taxonomy/taxdump.tar.gz check [62.24 MB]
SUCCESS
Storing taxonomy timestamp [date > DBDIR/taxonomy/timestamp] ... done (0s)
Extracting nodes file [tar -C DBDIR/taxonomy -zxvf DBDIR/taxonomy/taxdump.tar.gz nodes.dmp > /dev/null] ... done (2s)
DBDIR/taxonomy/nodes.dmp check [186.48 MB]
Extracting names file [tar -C DBDIR/taxonomy -zxvf DBDIR/taxonomy/taxdump.tar.gz names.dmp > /dev/null] ... done (3s)
DBDIR/taxonomy/names.dmp check [234.57 MB]
DBDIR/library/viral/Neighbors/esearch_res.json
Downloading 188670 sequences into DBDIR/library/viral/Neighbors.
query_key=1&webenv=MCID_665f1c6a8d232052172de20c
Downloading sequences 1 to 10000 of 188670 ... done
Downloading sequences 10001 to 20000 of 188670 ... done
Downloading sequences 20001 to 30000 of 188670 ...https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nuccore&db=taxonomy&id=AC_000192
Error fetching https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nuccore&db=taxonomy&id=AC_000192. Is curl installed?
(krkn) user@cluster test $
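For what it's worth, the kind of retry wrapper I mean would look roughly like this. This is a minimal sketch, assuming a POSIX shell; the retry function name, the placeholder URL, and the output filename are mine, not part of krakenuniq:

```shell
#!/bin/sh
# Hypothetical helper, not from krakenuniq: rerun a command up to a given
# number of times, pausing between attempts. Wrapping each fetch in
# something like this would ride out transient proxy/congestion failures.
retry() {
    max=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep 1   # fixed pause; real code might back off exponentially
    done
}

# Usage sketch (URL and output name are placeholders). Note -f, which
# makes curl exit nonzero on HTTP errors so the wrapper can see failures:
# retry 5 curl -g -f 'https://ftp.ncbi.nlm.nih.gov/...' -o genome.fna.gz
```

curl itself also has a built-in `--retry N` option that would achieve much the same thing for the plain-curl downloads, though it would not cover the LWP::UserAgent requests the script makes from Perl.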