Building database and adding new genomes.

parul-sharma commented 5 years ago

Hi

I want to add a new genome to the existing database. So far I have been using the default miniSeq database provided with the software. I ran step 3 of the database construction steps, but I think the database has to be built from scratch for the following step to work. I therefore started over from step 1 and got the following output. [screenshot included] It says that 0 genomes were downloaded for the bacteria branch, and I don't understand why.

Thanks in advance for the help. Best regards, Parul

AlexanderDilthey commented 5 years ago

Hi @parul-sharma,

Thank you for submitting this!

Just to clarify: are you using a recent pull from GitHub? I'm not sure I can reproduce the error:

dilthey@hilbert227:dilthey/software/MetaMaps_fresh] perl downloadRefSeq.pl --seqencesOutDirectory download/refseq_parul --taxonomyOutDirectory download/taxonomy_parul --targetBranches bacteria

citations.dmp
delnodes.dmp
division.dmp
gencode.dmp
merged.dmp
names.dmp
nodes.dmp
gc.prt
readme.txt

Taxonomy downloaded and extracted into download/taxonomy_parul

Now download genomes for 24762 bacteria species (refseq).
...

parul-sharma commented 5 years ago

Hi

Thanks for the reply. I updated the software and tried again, but ran into this error:

perl /home/parulsharma/bin/MetaMaps/downloadRefSeq.pl --seqencesOutDirectory Metamaps_dafaultDB+1/refseq --taxonomyOutDirectory Metamaps_dafaultDB+1/taxonomy --targetBranches bacteria 
citations.dmp
delnodes.dmp
division.dmp
gencode.dmp
merged.dmp
names.dmp
nodes.dmp
gc.prt
readme.txt

Taxonomy downloaded and extracted into Metamaps_dafaultDB+1/taxonomy

Now download genomes for 24762 bacteria species (refseq).
     1 / 24762 bacteria (Acaricomes_phytoseiuli) -- version 1 / 1: GET GCF_000376245.1_ASM37624v1_genomic.fna.gz
Cannot transfer file Idle timeout (60 seconds): closing control connection
Cannot transfer file GCF_000376245.1_ASM37624v1_genomic.fna.gz: No such file or directory
Cannot transfer file GCF_000376245.1_ASM37624v1_genomic.fna.gz: No such file or directory
Net::FTP>>> Net::FTP(3.10)
Net::FTP>>>   Exporter(5.72)
Net::FTP>>>   Net::Cmd(3.10)
Net::FTP>>>   IO::Socket::SSL(2.066)
Net::FTP>>>     IO::Socket::IP(0.38)
Net::FTP>>>       IO::Socket(1.38)
Net::FTP>>>         IO::Handle(1.36)
Net::FTP=GLOB(0x55fc938c9160)<<< 220-
Net::FTP=GLOB(0x55fc938c9160)<<<  This warning banner provides privacy and security notices consistent with 
Net::FTP=GLOB(0x55fc938c9160)<<<  applicable federal laws, directives, and other federal guidance for accessing 
Net::FTP=GLOB(0x55fc938c9160)<<<  this Government system, which includes all devices/storage media attached to 
Net::FTP=GLOB(0x55fc938c9160)<<<  this system. This system is provided for Government-authorized use only. 
Net::FTP=GLOB(0x55fc938c9160)<<<  Unauthorized or improper use of this system is prohibited and may result in 
Net::FTP=GLOB(0x55fc938c9160)<<<  disciplinary action and/or civil and criminal penalties. At any time, and for 
Net::FTP=GLOB(0x55fc938c9160)<<<  any lawful Government purpose, the government may monitor, record, and audit 
Net::FTP=GLOB(0x55fc938c9160)<<<  your system usage and/or intercept, search and seize any communication or data 
Net::FTP=GLOB(0x55fc938c9160)<<<  transiting or stored on this system. Therefore, you have no reasonable 
Net::FTP=GLOB(0x55fc938c9160)<<<  expectation of privacy. Any communication or data transiting or stored on this 
Net::FTP=GLOB(0x55fc938c9160)<<<  system may be disclosed or used for any lawful Government purpose.
Net::FTP=GLOB(0x55fc938c9160)<<< 220 FTP Server ready.
Net::FTP=GLOB(0x55fc938c9160)>>> USER anonymous
Net::FTP=GLOB(0x55fc938c9160)<<< 331 Anonymous login ok, send your complete email address as your password
Net::FTP=GLOB(0x55fc938c9160)>>> PASS ....
Net::FTP=GLOB(0x55fc938c9160)<<< 230 Anonymous access granted, restrictions apply
Net::FTP=GLOB(0x55fc938c9160)>>> TYPE I
Net::FTP=GLOB(0x55fc938c9160)<<< 200 Type set to I
Attempt 2 to get GCF_000376245.1_ASM37624v1_genomic.fna.gz failed at /home/parulsharma/bin/MetaMaps/downloadRefSeq.pl line 178.

Sorry, I'm new to this, and I really appreciate your help. Thanks.
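
An idle timeout like the one in the log above may be transient, in which case simply rerunning the command can be enough. A minimal sketch of a generic retry wrapper follows. The `retry` function is a hypothetical helper, not part of MetaMaps, and this thread does not confirm whether `downloadRefSeq.pl` skips genomes it has already downloaded when it is rerun.

```shell
#!/bin/sh
# retry MAX DELAY CMD...
# Hypothetical helper (not part of MetaMaps): run CMD up to MAX times,
# sleeping DELAY seconds between failed attempts.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example usage (commented out; paths are placeholders):
# retry 5 60 perl downloadRefSeq.pl \
#     --seqencesOutDirectory download/refseq \
#     --taxonomyOutDirectory download/taxonomy \
#     --targetBranches bacteria
```

A wrapper like this only helps with intermittent failures. An error that recurs at the same genome on every run points to a problem that retrying will not fix.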

parul-sharma commented 5 years ago

I loaded the dependencies (zlib, boost, python) again and it worked this time, but the download now fails after genome 67 of 24762, as shown below. I have rerun it three more times and get the same error each time.

I suspect this is related to NCBI server issues. Is there a way to resolve it?

This is the error:

65 / 24762 bacteria (Acidovorax_sp._KKS102) -- version 1 / 1: GET GCF_000302535.1_ASM30253v1_protein.faa.gz                 
65 / 24762 bacteria (Acidovorax_sp._KKS102) -- version 1 / 1: GET GCF_000302535.1_ASM30253v1_assembly_report.txt            
66 / 24762 bacteria (Acidovorax_sp._MR-S7) -- version 1 / 1: GET GCF_000400995.2_ASM40099v2_genomic.fna.gz                  
66 / 24762 bacteria (Acidovorax_sp._MR-S7) -- version 1 / 1: GET GCF_000400995.2_ASM40099v2_genomic.gff.gz                  
66 / 24762 bacteria (Acidovorax_sp._MR-S7) -- version 1 / 1: GET GCF_000400995.2_ASM40099v2_protein.faa.gz                 
66 / 24762 bacteria (Acidovorax_sp._MR-S7) -- version 1 / 1: GET GCF_000400995.2_ASM40099v2_assembly_report.txt             
67 / 24762 bacteria (Acidovorax_sp._NO-1) -- version 1 / 1: GET GCF_000238595.1_ASM23859v2_genomic.fna.gz                   
67 / 24762 bacteria (Acidovorax_sp._NO-1) -- version 1 / 1: GET GCF_000238595.1_ASM23859v2_genomic.gff.gz                   
67 / 24762 bacteria (Acidovorax_sp._NO-1) -- version 1 / 1: GET GCF_000238595.1_ASM23859v2_protein.faa.gz                   
67 / 24762 bacteria (Acidovorax_sp._NO-1) -- version 1 / 1: GET GCF_000238595.1_ASM23859v2_assembly_report.txt                       
Cannot change working directory (assembly_version) README.txt: No such file or directory
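
When reporting a failure like this, it helps to state exactly how far the download got. The sketch below pulls the last progress entry out of saved output, assuming lines of the form `N / TOTAL bacteria (Name) ...` as shown above. The `last_progress` function and the log file name `download.log` are hypothetical. The `tr` step is only a precaution in case the script rewrites its progress line with carriage returns.

```shell
#!/bin/sh
# last_progress [FILE...]
# Hypothetical helper: print the last "N / TOTAL ... (Name)" progress entry
# found in the given files, or in stdin when no file is given.
last_progress() {
    cat "$@" | tr '\r' '\n' | awk '
        # Progress lines look like: 67 / 24762 bacteria (Name) -- version ...
        $2 == "/" && $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            idx = $1; total = $3; name = $5
        }
        END {
            if (idx != "") printf "%s of %s %s\n", idx, total, name
        }'
}

# Example usage (commented out; download.log is a placeholder name):
# perl downloadRefSeq.pl ... 2>&1 | tee download.log
# last_progress download.log
```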

AlexanderDilthey commented 5 years ago

@parul-sharma OK, thank you! I think I can reproduce this error. I'll fix this ASAP, but it might take until next week.

AlexanderDilthey commented 5 years ago

This should be fixed now. You can also call the script with --skipIncompleteGenomes 1 if you want to speed up the process by downloading only complete genomes.
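
Putting the flags from this thread together, a full invocation might look like the sketch below. The output directories are placeholders, and the flag spellings (including `--seqencesOutDirectory`) are copied verbatim from the commands above. The snippet only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Placeholder output directories (assumptions; adjust to your setup).
SEQ_DIR="download/refseq"
TAX_DIR="download/taxonomy"

# Flags as they appear in this thread; --skipIncompleteGenomes 1 restricts
# the download to complete genomes.
CMD="perl downloadRefSeq.pl --seqencesOutDirectory $SEQ_DIR --taxonomyOutDirectory $TAX_DIR --targetBranches bacteria --skipIncompleteGenomes 1"

# Print the command for inspection; run it from the MetaMaps directory
# once it looks right.
echo "$CMD"
```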

DiltheyLab / MetaMaps

Building database and adding new genomes. #14