DiltheyLab / MetaMaps

Long-read metagenomic analysis

no file taxdump.tar.gz with downloadRefSeq.pl #10

Closed aroelo closed 5 years ago

aroelo commented 5 years ago

When trying to download a reference database with the downloadRefSeq.pl script, I get the following error:

No file taxdump.tar.gz in /pub/taxonomy/ on ftp.ncbi.nlm.nih.gov? at /opt/MetaMaps-master/downloadRefSeq.pl line 70.

Parameters used:

perl /opt/MetaMaps-master/downloadRefSeq.pl --seqencesOutDirectory /datadrivehdd/MetaMaps_DB/refseq --taxonomyOutDirectory /datadrivehdd/MetaMaps_DB/taxonomy --targetBranches viral

The taxdump.tar.gz file is still available at ftp.ncbi.nlm.nih.gov/pub/taxonomy/. Any idea what is going wrong here?

AlexanderDilthey commented 5 years ago

I can't reproduce the error - maybe a temporary NCBI server hiccup? Have you tried re-running the command?

aroelo commented 5 years ago

I tried re-running the same command today and still get the same error. When I manually download taxdump.tar.gz into the taxonomy folder and comment out line 70, the script continues but downloads 0 genomes. It produces the following output:

citations.dmp
delnodes.dmp
division.dmp
gencode.dmp
merged.dmp
names.dmp
nodes.dmp
gc.prt
readme.txt

Taxonomy downloaded and extracted into /datadrivehdd1/taxonomy

Now download genomes for 0 viral species (refseq).

Summary for viral:
    Downloaded species: 0 
    Skipped species (most likely because there is no 'latest_assembly_versions' link in species directory): 0 
    Downloaded assemblies: 0

Download for refseq complete. Have 0 assemblies.

Download successful - output directories:
- (sequences)  /datadrivehdd1/refseq
- (taxonomy)   /datadrivehdd1/taxonomy

Suggested command for next step:

perl /opt/MetaMaps-master/annotateRefSeqSequencesWithUniqueTaxonIDs.pl --refSeqDirectory /datadrivehdd1/refseq --taxonomyInDirectory /datadrivehdd1/taxonomy --taxonomyOutDirectory DIR

I suspect this is caused by the same FTP connection problem as above, but I can't figure out what exactly is going wrong.

AlexanderDilthey commented 5 years ago

Very weird indeed!

I agree that this is probably the same problem - i.e. if we manage to get the taxdump.tar.gz downloaded correctly, the other parts of the script will probably work as well.

Could you do two things:

For debugging, I'd re-enable line 70, and add an exit; after line 76.

aroelo commented 5 years ago

Adding $ftp->passive(1); did the trick!

For your info, without $ftp->passive(1); the output is:

Net::FTP>>> Net::FTP(3.11)
Net::FTP>>>   Exporter(5.72)
Net::FTP>>>   Net::Cmd(3.11)
Net::FTP>>>   IO::Socket::SSL(2.024)
Net::FTP>>>     IO::Socket::IP(0.37)
Net::FTP>>>       IO::Socket(1.38)
Net::FTP>>>         IO::Handle(1.35)
Net::FTP=GLOB(0x26e13e0)<<< 220-
Net::FTP=GLOB(0x26e13e0)<<<  This warning banner provides privacy and security notices consistent with 
Net::FTP=GLOB(0x26e13e0)<<<  applicable federal laws, directives, and other federal guidance for accessing 
Net::FTP=GLOB(0x26e13e0)<<<  this Government system, which includes all devices/storage media attached to 
Net::FTP=GLOB(0x26e13e0)<<<  this system. This system is provided for Government-authorized use only. 
Net::FTP=GLOB(0x26e13e0)<<<  Unauthorized or improper use of this system is prohibited and may result in 
Net::FTP=GLOB(0x26e13e0)<<<  disciplinary action and/or civil and criminal penalties. At any time, and for 
Net::FTP=GLOB(0x26e13e0)<<<  any lawful Government purpose, the government may monitor, record, and audit 
Net::FTP=GLOB(0x26e13e0)<<<  your system usage and/or intercept, search and seize any communication or data 
Net::FTP=GLOB(0x26e13e0)<<<  transiting or stored on this system. Therefore, you have no reasonable 
Net::FTP=GLOB(0x26e13e0)<<<  expectation of privacy. Any communication or data transiting or stored on this 
Net::FTP=GLOB(0x26e13e0)<<<  system may be disclosed or used for any lawful Government purpose.
Net::FTP=GLOB(0x26e13e0)<<< 220 FTP Server ready.
Net::FTP=GLOB(0x26e13e0)>>> USER anonymous
Net::FTP=GLOB(0x26e13e0)<<< 331 Anonymous login ok, send your complete email address as your password
Net::FTP=GLOB(0x26e13e0)>>> PASS ....
Net::FTP=GLOB(0x26e13e0)<<< 230 Anonymous access granted, restrictions apply
Net::FTP=GLOB(0x26e13e0)>>> TYPE I
Net::FTP=GLOB(0x26e13e0)<<< 200 Type set to I
Net::FTP=GLOB(0x26e13e0)>>> CWD /pub/taxonomy/
Net::FTP=GLOB(0x26e13e0)<<< 250 CWD command successful
Net::FTP=GLOB(0x26e13e0)>>> PORT 172,27,26,5,136,203
Net::FTP=GLOB(0x26e13e0)<<< 500 Illegal PORT command
No file taxdump.tar.gz in /pub/taxonomy/ on ftp.ncbi.nlm.nih.gov? at /opt/MetaMaps-master/downloadRefSeq.pl line 70.

After adding $ftp->passive(1); the output is:

Net::FTP>>> Net::FTP(3.11)
Net::FTP>>>   Exporter(5.72)
Net::FTP>>>   Net::Cmd(3.11)
Net::FTP>>>   IO::Socket::SSL(2.024)
Net::FTP>>>     IO::Socket::IP(0.37)
Net::FTP>>>       IO::Socket(1.38)
Net::FTP>>>         IO::Handle(1.35)
Net::FTP=GLOB(0x33e43a8)<<< 220-
Net::FTP=GLOB(0x33e43a8)<<<  This warning banner provides privacy and security notices consistent with 
Net::FTP=GLOB(0x33e43a8)<<<  applicable federal laws, directives, and other federal guidance for accessing 
Net::FTP=GLOB(0x33e43a8)<<<  this Government system, which includes all devices/storage media attached to 
Net::FTP=GLOB(0x33e43a8)<<<  this system. This system is provided for Government-authorized use only. 
Net::FTP=GLOB(0x33e43a8)<<<  Unauthorized or improper use of this system is prohibited and may result in 
Net::FTP=GLOB(0x33e43a8)<<<  disciplinary action and/or civil and criminal penalties. At any time, and for 
Net::FTP=GLOB(0x33e43a8)<<<  any lawful Government purpose, the government may monitor, record, and audit 
Net::FTP=GLOB(0x33e43a8)<<<  your system usage and/or intercept, search and seize any communication or data 
Net::FTP=GLOB(0x33e43a8)<<<  transiting or stored on this system. Therefore, you have no reasonable 
Net::FTP=GLOB(0x33e43a8)<<<  expectation of privacy. Any communication or data transiting or stored on this 
Net::FTP=GLOB(0x33e43a8)<<<  system may be disclosed or used for any lawful Government purpose.
Net::FTP=GLOB(0x33e43a8)<<< 220 FTP Server ready.
Net::FTP=GLOB(0x33e43a8)>>> USER anonymous
Net::FTP=GLOB(0x33e43a8)<<< 331 Anonymous login ok, send your complete email address as your password
Net::FTP=GLOB(0x33e43a8)>>> PASS ....
Net::FTP=GLOB(0x33e43a8)<<< 230 Anonymous access granted, restrictions apply
Net::FTP=GLOB(0x33e43a8)>>> TYPE I
Net::FTP=GLOB(0x33e43a8)<<< 200 Type set to I
Net::FTP=GLOB(0x33e43a8)>>> CWD /pub/taxonomy/
Net::FTP=GLOB(0x33e43a8)<<< 250 CWD command successful
Net::FTP=GLOB(0x33e43a8)>>> PASV
Net::FTP=GLOB(0x33e43a8)<<< 227 Entering Passive Mode (130,14,250,12,196,170).
Net::FTP=GLOB(0x33e43a8)>>> NLST
Net::FTP=GLOB(0x33e43a8)<<< 150 Opening BINARY mode data connection for file list
Net::FTP=GLOB(0x33e43a8)<<< 226 Transfer complete
Net::FTP=GLOB(0x33e43a8)>>> PASV
Net::FTP=GLOB(0x33e43a8)<<< 227 Entering Passive Mode (130,14,250,12,195,142).
Net::FTP=GLOB(0x33e43a8)>>> RETR taxdump.tar.gz
Net::FTP=GLOB(0x33e43a8)<<< 150 Opening BINARY mode data connection for taxdump.tar.gz (46798299 bytes)
Net::FTP=GLOB(0x33e43a8)<<< 226 Transfer complete

And it continues with downloading the data. Thank you for your help!
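
A note on reading the two debug logs: they differ only in the data-connection step. In active mode the client sends PORT with its own address and asks the server to connect back. In passive mode the server answers PASV with an address for the client to connect to. Both carry the endpoint as six comma-separated bytes (four for the IPv4 address, two for the port). The sketch below decodes the values copied from the logs. The function name `decode_ftp_endpoint` is made up for illustration and is not part of MetaMaps or Net::FTP. The address in the failing PORT command turns out to be in a private range, which suggests (but the thread does not confirm) that the client sat behind NAT or a firewall and the server could not accept that address.

```python
import ipaddress
from typing import Tuple


def decode_ftp_endpoint(fields: str) -> Tuple[str, int]:
    """Decode the 'h1,h2,h3,h4,p1,p2' field used by PORT and by the 227 PASV reply."""
    parts = [int(p) for p in fields.strip().split(",")]
    if len(parts) != 6 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"expected six comma-separated bytes, got {fields!r}")
    host = ".".join(str(p) for p in parts[:4])
    # The port is sent as two bytes, high byte first.
    port = parts[4] * 256 + parts[5]
    return host, port


# Values copied verbatim from the two logs in this thread.
endpoints = {
    "PORT (active, rejected)": "172,27,26,5,136,203",
    "PASV (passive, worked)": "130,14,250,12,196,170",
}

for label, fields in endpoints.items():
    host, port = decode_ftp_endpoint(fields)
    private = ipaddress.ip_address(host).is_private
    print(f"{label}: {host}:{port} private={private}")
```

If the decoded client address is private, as it is here, switching the client to passive mode is the usual workaround, which matches what fixed the download above.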