fbreitwieser / krakenuniq

🐙 KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results
GNU General Public License v3.0

krakenuniq-download content-length mismatch #113

Open AJTDaedalus opened 2 years ago

AJTDaedalus commented 2 years ago

When running krakenuniq-download I get the error below. I've also included my arguments for context. It seems like I'm getting more from NCBI than the program expects.

krakenuniq-download --db . --taxa "fungi" --dust microbial-nt

Storing taxonomy timestamp [date > ./taxonomy/timestamp] ... done (took 0s)
Extracting nodes file [tar -C ./taxonomy -zxvf ./taxonomy/taxdump.tar.gz nodes.dmp 1>&2] ...nodes.dmp
 done (took 4s)
./taxonomy/nodes.dmp                               check [159.89 MB]
Extracting names file [tar -C ./taxonomy -zxvf ./taxonomy/taxdump.tar.gz names.dmp 1>&2] ...names.dmp
 done (took 4s)
./taxonomy/names.dmp                               check [211.34 MB]
: downloading ...Content-length mismatch: expected 216968193080 bytes, got 216997113648

wdlingit commented 1 year ago

I got a similar error message

$ krakenuniq-download --db SOMEWHERE/db/ --dust microbial-nt
Storing taxonomy timestamp [date > SOMEWHERE ] ... done (took 0s)
Extracting nodes file [tar -C SOMEWHERE/db//taxonomy -zxvf SOMEWHERE/db//taxonomy/taxdump.tar.gz nodes.dmp 1>&2] ...nodes.dmp
 done (took 6s)
SOMEWHERE/db//taxonomy/nodes.dmp check [163.80 MB]
Extracting names file [tar -C SOMEWHERE/db//taxonomy -zxvf SOMEWHERE/db//taxonomy/taxdump.tar.gz names.dmp 1>&2] ...names.dmp
 done (took 4s)
SOMEWHERE/db//taxonomy/names.dmp check [219.76 MB]
: downloading ...Content-length mismatch: expected 279998465077 bytes, got 280000205317

I reran with the --verbose option: the download URL is ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz, and the file size there is indeed 279998465077 bytes. My colleagues and I tried this a few times and got the same message each time (though the "got" sizes differed slightly).
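
To put a number on the discrepancy, here is a minimal shell sketch that pulls the two byte counts out of the error line and prints the difference. The function name `mismatch_surplus` is hypothetical and not part of krakenuniq-download; it only assumes the message format shown in the logs above.

```shell
# Hypothetical helper, not part of krakenuniq-download.
# Reads log lines on stdin and, for each "Content-length mismatch" line,
# prints (got - expected) in bytes. Other lines are ignored.
mismatch_surplus() {
  sed -n 's/.*expected \([0-9][0-9]*\) bytes, got \([0-9][0-9]*\).*/\1 \2/p' |
    while read -r expected got; do
      echo $((got - expected))
    done
}

# The error line from this comment:
echo "Content-length mismatch: expected 279998465077 bytes, got 280000205317" | mismatch_surplus
```

For the two reports in this thread the surplus is 28920568 bytes and 1740240 bytes respectively, so the downloads overshoot by a very small fraction of the file rather than by a whole extra copy.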

wdlingit commented 1 year ago

In my case, I think the problem came from downloading the ftp URL through the Perl module. After the following change, which sends ftp URLs down the same branch as http URLs, the download had the correct size and the run reached the GUNZIPPING step.

$ diff krakenuniq-download krakenuniq-download.bck
344c344
<   if ($url =~ /^http/ || $url =~ /^ftp/) {
---
>   if ($url =~ /^http/) {
$ krakenuniq-download --verbose --db SOMEWHERE/db --dust microbial-nt
Fetching ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz to SOMEWHERE/db/taxonomy/taxdump.tar.gz ...  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 57.9M  100 57.9M    0     0  6476k      0  0:00:09  0:00:09 --:--:-- 11.2M
 SUCCESS
Storing taxonomy timestamp [date > SOMEWHERE/db/taxonomy/timestamp] ... done (took 0s)
Extracting nodes file [tar -C SOMEWHERE/db/taxonomy -zxvf SOMEWHERE/db/taxonomy/taxdump.tar.gz nodes.dmp 1>&2] ...nodes.dmp
 done (took 3s)
SOMEWHERE/db/taxonomy/nodes.dmp check [163.81 MB]
Extracting names file [tar -C SOMEWHERE/db/taxonomy -zxvf SOMEWHERE/db/taxonomy/taxdump.tar.gz names.dmp 1>&2] ...names.dmp
 done (took 4s)
SOMEWHERE/db/taxonomy/names.dmp check [219.79 MB]
Fetching ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz to SOMEWHERE/db/nt.fna.gz ...  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  260G  100  260G    0     0  13.7M      0  5:24:08  5:24:08 --:--:-- 13.6M
 GUNZIPPING
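
For anyone who fetches the file by hand instead, the same sanity check can be sketched in shell: compare the size on disk with the size the server reports. `check_size` is a hypothetical helper, not something krakenuniq-download provides, and the path and byte count in the usage comment are placeholders for whatever your own run shows.

```shell
# Hypothetical helper, not part of krakenuniq-download.
# Usage: check_size FILE EXPECTED_BYTES
# Prints "OK: N bytes" and returns 0 if FILE is exactly EXPECTED_BYTES long;
# otherwise prints a mismatch message to stderr and returns 1.
check_size() {
  file=$1
  expected=$2
  # wc -c pads its output with spaces on some systems, so strip them.
  actual=$(wc -c < "$file" | tr -d ' ')
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual bytes"
  else
    echo "MISMATCH: expected $expected bytes, got $actual" >&2
    return 1
  fi
}

# Example call (placeholder path and size):
#   check_size SOMEWHERE/db/nt.fna.gz 279998465077
```

The check has to run on the compressed file, before the gunzip step, because the server's reported size refers to nt.gz and not to the decompressed FASTA.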