DaehwanKimLab / centrifuge

Classifier for metagenomic sequences
GNU General Public License v3.0

centrifuge-download fails for vertebrates #177

Open theo-allnutt-bioinformatics opened 4 years ago

theo-allnutt-bioinformatics commented 4 years ago

centrifuge-download -o library -P 24 -d "vertebrate_mammalian","vertebrate_other" refseq >> refseq_seqid2taxid.map

Directory library/vertebrate_mammalian exists. Continuing
Downloading ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/assembly_summary.txt ...
Domain vertebrate_mammalian has no genomes with specified filter.

Same result with 'genbank'

mourisl commented 4 years ago

Can you check whether the assembly_summary.txt file has been fully downloaded?
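A rough way to check this is sketched below. It is only a heuristic: it assumes that a complete assembly_summary.txt ends with a newline and that every non-comment row has the same number of tab-separated fields. The function name is hypothetical and not part of centrifuge.

```python
def summary_looks_complete(path):
    """Heuristic check that an assembly_summary.txt was not cut off mid-transfer."""
    field_counts = set()
    last_line = ""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            last_line = line
            # Skip the '#' header lines and any blank lines.
            if line.startswith("#") or not line.strip():
                continue
            field_counts.add(line.rstrip("\n").count("\t") + 1)
    # Assumption: a complete file has at least one data row, a consistent
    # column count, and a final line terminated by a newline.
    return bool(field_counts) and len(field_counts) == 1 and last_line.endswith("\n")
```

For example, summary_looks_complete("library/vertebrate_mammalian/assembly_summary.txt") returning False would suggest the download was interrupted.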

theo-allnutt-bioinformatics commented 4 years ago

Yes it has. It is not very long for mammals.

theo-allnutt-bioinformatics commented 4 years ago

I cannot get this command to work with any vertebrate genomes, and I can't see why: the summary files look the same. I've had a look at the code. Is there a way to just remove this 'filter'?

Thanks.

theo-allnutt-bioinformatics commented 4 years ago

Also, if you use 'plant' it only returns three genomes. There seems to be a problem with the filter.
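One way to see why the filter leaves so few genomes is to tally the assembly levels in the downloaded summary file. The sketch below assumes the usual NCBI assembly_summary.txt layout, where the twelfth tab-separated column is assembly_level. It also assumes that the 'filter' in the message is the assembly-level filter (the -a option used in a later comment). Neither assumption is confirmed in this thread, and the function name is hypothetical.

```python
from collections import Counter

# Assumption: assembly_level is the 12th column (0-based index 11)
# of NCBI's assembly_summary.txt.
ASSEMBLY_LEVEL_COLUMN = 11


def count_assembly_levels(path):
    """Count how many genomes are listed at each assembly level."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) > ASSEMBLY_LEVEL_COLUMN:
                counts[fields[ASSEMBLY_LEVEL_COLUMN]] += 1
    return counts
```

If count_assembly_levels("library/vertebrate_mammalian/assembly_summary.txt") shows mostly "Chromosome" or "Scaffold" and few or no "Complete Genome" entries, that would be consistent with the empty result reported above.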

theo-allnutt-bioinformatics commented 4 years ago

Any solution to this??

sklasek commented 3 years ago

Hi, I believe I have the same problem when trying to download bacterial, archaeal, and viral genomes. Running centrifuge v.1.0.4-beta:

centrifuge-download -o library -P 12 -m -d 'archaea,bacterial,viral' refseq > seqid2taxid.map

Downloading ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/archaea/assembly_summary.txt ...
Download failed! Have a look at valid domains at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq .

These are indeed the names of three valid domains, and I noted the manual specifies that they should be comma-separated. I noticed that library/archaea/assembly_summary.txt downloaded, but no such file for bacteria or viral. Any suggestions? Thanks, Scott

Christoph-Ammer commented 3 years ago

Hi,

I faced the same problem when downloading some rodent genomes. I solved it by setting -a to "Any" and searching in genbank. Here is my command:

centrifuge-download -o library -d "vertebrate_mammalian" -a "Any" -t 54292,447135,47230,29092,39030 genbank >> seqid2taxid.map

Hope that helps.

Best Christoph

GastonViarengo commented 3 years ago

Hello,

I'm dealing with a problem similar to Scott's. I'm running centrifuge version 1.0.4 on Ubuntu 20.04.2 LTS. I have already downloaded the archaea, viral and bacteria RefSeq genomes (the latter with several interruptions, errors and restarts; luckily, previously downloaded bacteria genomes were recognized and the download resumed from there) and successfully created the "seqid2taxid.map" file (but with repeated entries due to the restarts mentioned).

Now I want a separate "seqid2taxid_bact.map", but when running

centrifuge-download -o library_2021 -m -d "bacteria" refseq > seqid2taxid_bact.map

I get the following error:

Directory library_2021/bacteria exists. Continuing
Downloading ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt ...
Download failed! Have a look at valid domains at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq .

The "assembly_summary.txt" file is also left incomplete. I have a complete copy of this file, but every time centrifuge-download starts it overwrites it.

Is there a way to skip this step, so that if I provide "assembly_summary.txt" and all the downloaded genomes, centrifuge-download goes directly to creating the "seqid2taxid_bact.map" file?

Thanks for any help! Best,

Gaston
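The repeated entries in "seqid2taxid.map" mentioned above can be removed after the fact. A minimal sketch follows. It assumes each line of the map is one sequence ID and one taxonomy ID separated by a tab, and that exact duplicate lines are safe to drop. The function names and the output file name in the usage note are hypothetical.

```python
def dedupe_map_lines(lines):
    """Return the map entries with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        entry = line.rstrip("\r\n")
        if not entry or entry in seen:
            continue  # skip blank lines and entries already emitted
        seen.add(entry)
        unique.append(entry)
    return unique


def dedupe_map_file(src, dst):
    """Write a de-duplicated copy of src to dst and return the number of entries kept."""
    with open(src) as handle:
        unique = dedupe_map_lines(handle)
    with open(dst, "w") as out:
        out.writelines(entry + "\n" for entry in unique)
    return len(unique)
```

For example, dedupe_map_file("seqid2taxid.map", "seqid2taxid.dedup.map") writes a cleaned copy and leaves the original file untouched.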

Guliba commented 1 year ago

I guess the problem may lie with "dustmasker" from the NCBI BLAST toolkit. After I removed the NCBI BLAST toolkit from my PATH environment variable (or uninstalled the software), the download process continued, although it reported "line28: dustmasker: command not found". The downloaded genomes look fine.

centrifuge-download -P 24 -o library -m -d "bacteria,viral,archaea,fungi,protozoa,human" refseq > seqid2taxid.map

centrifuge-class version 1.0.4
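To see which dustmasker, if any, the shell would pick up before running the command with -m, a small check like the one below may help. It only reports what is on the search path; it does not confirm that dustmasker is the cause of the failure. The function name is hypothetical.

```python
import shutil


def find_tool(name, search_path=None):
    """Return the full path of an executable, or None if it is not found.

    search_path defaults to the PATH environment variable.
    """
    return shutil.which(name, path=search_path)
```

Calling find_tool("dustmasker") returns the path of the executable that would be run, or None when it is not on PATH.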