kwnamhang closed this issue 4 years ago
Hi there! Thanks :)
And sorry you’re having trouble! It won’t matter what your extension is (csv) so long as the file is still a single column of accessions. And the way you’re trying to run it is spot on 👍
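A quick way to confirm that a list really is a single column of accessions, whatever its extension, is sketched below. The function name `check_single_column` is made up for illustration and is not part of bit; it only assumes that a second column would be separated by a comma, semicolon, tab, or space.

```shell
#!/bin/sh
# Hypothetical helper (not part of bit): succeeds only if every non-empty
# line of the file holds exactly one field.
check_single_column() {
    awk '
        { sub(/\r$/, "") }      # tolerate Windows line endings
        /^[ \t]*$/ { next }     # skip blank lines
        /[,;\t ]/ {             # any separator means more than one field
            printf "line %d has more than one field: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
            exit
        }
        END { exit bad }        # non-zero exit status if a bad line was seen
    ' "$1"
}
```

Running `check_single_column list.csv && echo "looks fine"` prints the message only when no line contains a separator.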
It seems the problem you’re having right now is that the initial download of the reference tables isn’t working (we need those to build the links to get the genomes). It might be that your system doesn’t allow ftp transfer, which is currently how it’s trying to do it (and then would also be using that for each genome). I’ve added an option for using http instead for cases like this in another one of my programs, so I’d be happy to add that here too if it will help :)
Can you try this command and see if it works:
curl ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt -o ncbi_RS_assembly_info.tmp
If it hangs for a while (like 20-30 seconds) and isn’t doing anything, cancel it with ctrl + c.
And then try this one and see if it works?
curl https://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt -o ncbi_RS_assembly_info.tmp
And let me know if that one successfully downloads the file :)
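The two tests can also be chained into one step. This is only a sketch of the idea of trying ftp first and falling back to https; the function names `to_https_url` and `fetch_with_fallback` are made up here, and bit's own implementation may look different. It assumes the same path is served under both schemes, as in the two commands above.

```shell
#!/bin/sh
# Hypothetical helpers illustrating an ftp-then-https fallback.

# Rewrite an ftp:// URL to https://, leaving any other URL untouched.
to_https_url() {
    case "$1" in
        ftp://*) printf 'https://%s\n' "${1#ftp://}" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Try the URL as given; if curl fails or cannot connect within 30 seconds,
# retry the same path over https.
fetch_with_fallback() {
    url=$1
    out=$2
    curl --fail --silent --show-error --connect-timeout 30 -o "$out" "$url" && return 0
    echo "first attempt failed, retrying over https..." >&2
    curl --fail --silent --show-error --connect-timeout 30 -o "$out" "$(to_https_url "$url")"
}
```

For example: `fetch_with_fallback ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt ncbi_RS_assembly_info.tmp`.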
Thanks for your prompt reply!
I can confirm that the first command you gave works - downloading via ftp.
Also, I've now re-run my original command "bit-dl-ncbi-assemblies -w list.csv -f fasta -j 10" and it's now downloading fine via ftp!
As you say, it may be that my work network doesn't allow ftp transfer (based within a hospital laboratory). Seems to work fine over my home network.
I'll try the http-based command when I'm next at work. Ideally, would be great to be able to use your tool at work also.
Thanks again for your help and making this tool :+1:
Oh great :)
I will add the ability to do it through http tomorrow and let you know when it’s in. Either way, the option will be helpful to have, and hopefully it will work on your work network 🤞
Hmmm... perhaps I spoke too soon... I thought it began downloading, but when it finished, it said none of the accessions were found? I've manually checked the accessions, and they are definitely searchable on NCBI. Not sure what the issue is. Sorry to bug you!
bit-dl-ncbi-assemblies -w list.txt -f fasta -j 10
Targeting 157 genomes in fasta format.
Downloading ncbi assembly summaries to be able to construct ftp links...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  238M  100  238M    0     0   216k      0  0:18:47  0:18:47 --:--:--  297k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 60.2M  100 60.2M    0     0   200k      0  0:05:07  0:05:07 --:--:--  154k
******************************* NOTICE *******************************
157 input accessions were not found at NCBI.
Written to "NCBI-accessions-not-found.txt".
**********************************************************************
Remaining total targets: 0
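One way to see for yourself why accessions come back as not found is to compare the list against a downloaded summary table. The sketch below is not bit's code; the function name `list_missing_accessions` is hypothetical, and it assumes the table is tab-separated with the assembly accession in the first column and comment lines starting with `#`.

```shell
#!/bin/sh
# Hypothetical helper: print every accession from the list ($1) that does
# not appear in the first tab-separated column of the summary table ($2).
# Assumes the summary table is not empty.
list_missing_accessions() {
    awk -F '\t' '
        NR == FNR {                       # first file read: the summary table
            if ($0 !~ /^#/) seen[$1] = 1  # remember each accession, skip comments
            next
        }
        { sub(/\r$/, "") }                # tolerate Windows line endings in the list
        $0 != "" && !($0 in seen) { print }
    ' "$2" "$1"
}
```

Something like `list_missing_accessions list.txt ncbi_RS_assembly_info.tmp` would then list the entries that the table does not contain.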
Sorry, I'm an idiot, please disregard my post above. I just realized that your tool searches only the Assembly database of NCBI, and my accessions are for the Nucleotide database. Do you think it would be easy for me to modify your code to search accessions against the Nucleotide db?
Thanks so much for your help!
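A quick check of which database an accession belongs to can save a long download. Assembly accessions have a recognizable shape: `GCA_` or `GCF_`, nine digits, a dot, and a version number. The sketch below uses a made-up helper name, `accession_kind`, and simply labels everything else as "other".

```shell
#!/bin/sh
# Hypothetical helper: label an accession by its shape.
# Assembly accessions look like GCA_/GCF_ + nine digits + "." + version.
accession_kind() {
    if printf '%s\n' "$1" | grep -Eq '^GC[AF]_[0-9]{9}\.[0-9]+$'; then
        echo assembly
    else
        echo other
    fi
}
```

Looping over a list, for instance `while IFS= read -r acc; do echo "$acc $(accession_kind "$acc")"; done < list.txt`, shows at a glance whether the entries are assembly accessions.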
Oops, just saw your follow up.
No unfortunately this won’t be the tool for that :/
But NCBI’s e-direct tool can likely get the job done after some figuring. It’s super powerful, but not very user-friendly. I have a page on my site of some examples from things I’ve figured out before here: https://astrobiomike.github.io/unix/ncbi_eutils
That can hopefully help get you started :)
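As a rough starting point for nucleotide accessions, the sketch below calls NCBI's E-utilities `efetch` endpoint directly with curl instead of going through the e-direct commands. The function names are made up for illustration, and it assumes the accessions live in the `nuccore` database and that one FASTA file per accession is wanted.

```shell
#!/bin/sh
# Hypothetical helpers for fetching nucleotide records as FASTA.

# Build an efetch URL that returns one nuccore record as FASTA text.
efetch_fasta_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&retmode=text\n' "$1"
}

# Download every accession in a single-column file ($1), one file each.
fetch_nucleotide_list() {
    while IFS= read -r acc || [ -n "$acc" ]; do
        acc=$(printf '%s' "$acc" | tr -d '\r')   # drop Windows line endings
        [ -n "$acc" ] || continue                # skip blank lines
        curl --fail --silent --show-error -o "${acc}.fasta" "$(efetch_fasta_url "$acc")"
        sleep 1                                  # keep the request rate modest
    done < "$1"
}
```

`fetch_nucleotide_list list.txt` would then write files such as `MN908947.3.fasta` into the current directory. The one-second pause is a conservative choice, since NCBI rate-limits requests made without an API key.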
Oh and I forgot about this tool that fortunately I noted at the top of that page I just linked to, maybe this can grab what you’re looking for :)
Hi Mike,
Thanks for making this tool - exactly what I'm looking for.
I'm trying to batch download fasta files from a list of Genbank Accessions that I've made as .csv file.
However, when I try to run the command, I get the following:
~/Downloads$ bit-dl-ncbi-assemblies -w list.csv -f fasta -j 10
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
curl: (28) Connection timed out after 30000 milliseconds
Warning: Transient problem: timeout Will retry in 1 seconds. 10 retries left.
  0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
curl: (28) Connection timed out after 30000 milliseconds
Warning: Transient problem: timeout Will retry in 2 seconds. 9 retries left.
  0     0    0     0    0     0      0      0 --:--:--  0:00:29 --:--:--     0
curl: (28) Connection timed out after 30000 milliseconds
Warning: Transient problem: timeout Will retry in 4 seconds. 8 retries left.
  0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
curl: (28) Connection timed out after 30000 milliseconds
Warning: Transient problem: timeout Will retry in 8 seconds. 7 retries left.
  0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
curl: (28) Connection timed out after 30000 milliseconds
Warning: Transient problem: timeout Will retry in 16 seconds. 6 retries left.
  0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
curl: (28) Connection timed out after 30000 milliseconds
Warning: Transient problem: timeout Will retry in 32 seconds. 5 retries left.
  0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
curl: (28) Connection timed out after 30000 milliseconds
Warning: Transient problem: timeout Will retry in 64 seconds. 4 retries left.
  0     0    0     0    0     0      0      0 --:--:--  0:00:29 --:--:--     0
curl: (28) Connection timed out after 30000 milliseconds
Warning: Transient problem: timeout Will retry in 128 seconds. 3 retries left.
  0     0    0     0    0     0      0      0 --:--:--  0:00:30 --:--:--     0
curl: (28) Connection timed out after 30001 milliseconds
Warning: Transient problem: timeout Will retry in 256 seconds. 2 retries left.
Is there an easy fix to this? Is it because I'm requesting from .csv rather than .txt file? Any other parameters I need to set?
Many thanks for your help!