DiltheyLab / MetaMaps

Long-read metagenomic analysis

downloadRefseq.pl timeout #18

Open: schorlton opened this issue 4 years ago

schorlton commented 4 years ago

I'm running the latest commit (881373c) and tried downloading the database, but unfortunately the download times out, as shown below.


```
perl ~/Programs/MetaMaps/downloadRefSeq.pl --seqencesOutDirectory download/refseq --taxonomyOutDirectory download/taxonomy --targetBranches archaea --skipIncompleteGenomes 1

citations.dmp
delnodes.dmp
division.dmp
gencode.dmp
merged.dmp
names.dmp
nodes.dmp
gc.prt
readme.txt

Taxonomy downloaded and extracted into download/taxonomy

Now download genomes for 299 archaea species (328 genomes - refseq - skip incomplete genomes: 1).
         Genome 4 / 328 ; species 4 / 299 archaea (Candidatus_Nitrosocaldus_cavascurensis) -- version 1 / 1: GET GCF_900248165.1_Nitrosocaldus_cavascurensis_SCU2_chromosome_assembly_report.txt
         Genome 50 / 328 ; species 49 / 299 archaea (Methanosarcina_vacuolata_Z_761) -- version 1 / 1: GET GCF_000969905.1_ASM96990v1_protein.faa.gz

Timeout at /home/sam/.conda/envs/metamaps/lib/5.26.2/Net/FTP.pm line 583.
```
froggleston commented 4 years ago

I also see this issue when downloading large amounts of data. Because the script can't resume, it then restarts the download from the very beginning.

AlexanderDilthey commented 4 years ago

I have added a --timeout parameter. Increasing this will hopefully fix the observed issue.

In the medium term, it would be good to also add a --resume parameter, or an automatic check for whether an assembly has already been downloaded. This should not be too complicated: it amounts to checking whether the files in @genomic_fna_files and @assembly_report_files are already present.
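A minimal sketch of that automatic check, written in Python purely for illustration (it is not code from downloadRefSeq.pl, which is Perl). It assumes the expected local paths for each assembly are known up front, standing in for the entries of @genomic_fna_files and @assembly_report_files; the function names `is_complete`, `already_downloaded`, `download_missing` and the `fetch` callback are hypothetical. One design choice goes slightly beyond "the files are present": each file is written under a temporary `.part` name and renamed only once its transfer finishes, so a file cut off by a timeout is not mistaken for a complete download on the next run.

```python
import os
import tempfile
from typing import Callable, Iterable, List


def is_complete(path: str) -> bool:
    """A file counts as downloaded if it exists and is non-empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def already_downloaded(paths: Iterable[str]) -> bool:
    """True if every expected file for one assembly is already present."""
    return all(is_complete(p) for p in paths)


def download_missing(paths: List[str], fetch: Callable[[str, str], None]) -> List[str]:
    """Fetch only the files that are not yet present; return the ones fetched.

    `fetch(final_path, tmp_path)` is a hypothetical callback that writes the
    remote counterpart of final_path into tmp_path (e.g. via FTP).
    """
    fetched = []
    for final_path in paths:
        if is_complete(final_path):
            continue  # left over from an earlier run: skip it
        tmp_path = final_path + ".part"
        fetch(final_path, tmp_path)
        # Rename only after the transfer finished, so an interrupted run
        # never leaves a truncated file under the final name.
        os.replace(tmp_path, final_path)
        fetched.append(final_path)
    return fetched


# Offline demonstration: a fake fetcher stands in for the real FTP download.
with tempfile.TemporaryDirectory() as scratch:
    # Illustrative file names for one assembly (placeholders, not taken from the script).
    expected = [
        os.path.join(scratch, "GCF_000969905.1_ASM96990v1_genomic.fna.gz"),
        os.path.join(scratch, "GCF_000969905.1_ASM96990v1_assembly_report.txt"),
    ]

    def fake_fetch(final_path: str, tmp_path: str) -> None:
        with open(tmp_path, "w") as fh:
            fh.write("placeholder contents")

    first_run = download_missing(expected, fake_fetch)
    second_run = download_missing(expected, fake_fetch)  # everything is present now
    complete = already_downloaded(expected)

print(len(first_run), len(second_run), complete)  # 2 0 True
```

Rerunning the same loop after a failure then only fetches what is still missing, which is the resume behaviour asked for earlier in the thread.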

froggleston commented 4 years ago

Hi Alexander,

Good timing: I've just been working on some code to support resuming. I'll open a pull request once it's tested.

Cheers!

Rob


From: Alexander Dilthey notifications@github.com
Sent: Thursday, November 14, 2019 3:09:05 PM
To: DiltheyLab/MetaMaps MetaMaps@noreply.github.com
Cc: robert davey (EI) robert.davey@earlham.ac.uk; Comment comment@noreply.github.com
Subject: Re: [DiltheyLab/MetaMaps] downloadRefseq.pl timeout (#18)


AlexanderDilthey commented 4 years ago

Hi Rob,

Wonderful! Thank you! Looking forward to the pull request! :-)

Cheers,

Alex

tanushrin commented 3 years ago

Hi,

I am having the timeout issue too. I have tried increasing --timeout several times, but it still fails. @AlexanderDilthey, I would appreciate any suggestions on how to resolve this.

```
perl $METAMAPS_SRC_DIR/downloadRefSeq.pl --targetBranches archaea,bacteria,fungi,plant,protozoa --skipIncompleteGenomes 1 --seqencesOutDirectory $METAMAPS_DB_PATH/download/refseq --taxonomyOutDirectory $METAMAPS_DB_PATH/download/taxonomy --timeout 720
```

I encountered a few errors:

1. `Timeout at /util/common/bioconda/20200706/anaconda-py37/lib/5.26.2/Net/FTP.pm line 583`
2. `Net::FTP=GLOB(0x5598d583cf60)<<< 421 Idle timeout (60 seconds): closing control connection`

Thank you. -Tanushri
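On the second error: an FTP reply code of 421 means the server is closing the control connection, and the message says this happened after 60 seconds of idleness. The control connection typically sits idle while a large file streams over the separate data connection, so a server-side limit like this can fire mid-download regardless of the client setting. That would likely explain why raising --timeout, which presumably sets how long the client waits, did not help. One common workaround is to open a fresh connection per file and retry failed files individually. The sketch below illustrates that pattern with Python's standard `ftplib`; it is not how downloadRefSeq.pl or Net::FTP behave, and the function names `retry` and `fetch_file`, the retry count, the back-off, and the 720-second default (mirroring the command above) are all hypothetical choices.

```python
import ftplib
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 5, delay_s: float = 10.0) -> T:
    """Call op() until it succeeds; re-raise the last FTP error after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except ftplib.all_errors:
            if attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # wait a little longer after each failure
    raise AssertionError("unreachable")


def fetch_file(host: str, remote_path: str, local_path: str, timeout_s: float = 720.0) -> None:
    """Download one file over its own anonymous FTP session.

    A fresh connection per file means a control connection that the server
    dropped during an earlier transfer is never reused.
    """
    tmp_path = local_path + ".part"
    with ftplib.FTP(host, timeout=timeout_s) as ftp:
        ftp.login()  # anonymous login
        with open(tmp_path, "wb") as fh:
            ftp.retrbinary(f"RETR {remote_path}", fh.write)
    os.replace(tmp_path, local_path)  # only complete files get the final name


# Real use would look like this (needs network access; arguments are placeholders):
#   retry(lambda: fetch_file(HOST, REMOTE_PATH, LOCAL_PATH))

# Offline demonstration: an operation that fails twice with the server reply
# seen in the thread, then succeeds.
calls = {"n": 0}


def flaky_download() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ftplib.error_temp("421 Idle timeout (60 seconds): closing control connection")
    return "ok"


result = retry(flaky_download, attempts=5, delay_s=0)
print(result, calls["n"])  # ok 3
```

Retrying per file means a single dropped connection costs one file rather than the whole run.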