Open LankyCyril opened 4 years ago
I fixed the error by changing ftp to https in one line of downloadRefSeq.pl.
Original: (my $assembly_path_FTP = $assembly_path_fullURL) =~ s/ftp:\/\/ftp.ncbi.nlm.nih.gov//g;
New: (my $assembly_path_FTP = $assembly_path_fullURL) =~ s/https:\/\/ftp.ncbi.nlm.nih.gov//g;
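For anyone who wants to see what that substitution does in isolation, here is a minimal shell sketch, not the code from downloadRefSeq.pl. The function name `strip_ncbi_host` and the sample URL are hypothetical, and accepting both schemes is my own assumption rather than anything MetaMaps does.

```shell
# Hypothetical helper (not part of MetaMaps): strip the NCBI host prefix
# from an assembly URL, accepting either the ftp:// or https:// scheme.
strip_ncbi_host() {
    # -E enables extended regular expressions so (ftp|https) works as alternation.
    printf '%s\n' "$1" | sed -E 's#^(ftp|https)://ftp\.ncbi\.nlm\.nih\.gov##'
}

# Illustrative input only; this is not a real assembly path.
strip_ncbi_host 'https://ftp.ncbi.nlm.nih.gov/genomes/all/EXAMPLE'

# A value with no URL in it, such as "na", passes through unchanged.
strip_ncbi_host 'na'
```

The second call shows that a non-URL value such as "na" is not touched by the substitution, so it has to be handled separately.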
I also added a conditional that skips to the next species when $assembly_path_fullURL eq "na"; that value is why the error was being thrown. I used the following sed command to insert the logic:
sed -i 's|# last SPECIES if($downloaded_assemblies > 100);|if($assembly_path_fullURL eq "na"){\n\t\t\t\tnext SPECIES; \n\t\t\t}\n|g' ./downloadRefSeq.pl
This will replace this comment line # last SPECIES if($downloaded_assemblies > 100);
with the following if statement:
if($assembly_path_fullURL eq "na"){ next SPECIES; }
Keep in mind that if MetaMaps is updated and the # last SPECIES if($downloaded_assemblies > 100); comment is removed, this sed command will silently do nothing.
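To guard against that, the patch can be wrapped so it refuses to run when the anchor comment is missing. This is only a sketch: the function name `patch_download_script` and the `.bak` backup are my own additions, and it assumes GNU sed, since `-i` and `\n`/`\t` in the replacement behave differently elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of MetaMaps): apply the sed patch only if
# the anchor comment is still present in the script. Assumes GNU sed.
patch_download_script() {
    local script="$1"
    local anchor='# last SPECIES if($downloaded_assemblies > 100);'

    # -F treats the anchor as a fixed string, so $ and ( ) need no escaping.
    if ! grep -qF -- "$anchor" "$script"; then
        echo "anchor comment not found in $script; not patching" >&2
        return 1
    fi

    # Keep an untouched copy before editing in place.
    cp -- "$script" "$script.bak"

    # Same substitution as the sed command above.
    sed -i 's|# last SPECIES if($downloaded_assemblies > 100);|if($assembly_path_fullURL eq "na"){\n\t\t\t\tnext SPECIES; \n\t\t\t}\n|g' "$script"
}
```

Usage would be `patch_download_script ./downloadRefSeq.pl`. A non-zero exit status means the script was left untouched.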
Hi. I ran the downloadRefSeq.pl command
downloadRefSeq.pl --seqencesOutDirectory data/metamaps-db/refseq --taxonomyOutDirectory data/metamaps-db/taxonomy
and after about two days of churning data and printing progress output, it failed with "Cannot change working directory into assembly path na na: No such file or directory" and no other explanation. It had successfully processed all bacterial genomes but only got through 5 out of 323 fungal genomes. Looking into the data/metamaps-db/refseq/fungi directory, I see only six subdirectories for six species, while assembly_summary.txt lists a lot more. I have about 20TB of free disk space left, so it can't be that.
Does this mean that some previous data retrieval steps failed? Is there a way to safeguard against this? Or to fix it and resume from where it left off?
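One way to check whether "na" entries are involved is to list the rows of assembly_summary.txt whose download path is "na". This is a sketch with assumptions: the function name `list_na_assemblies` is hypothetical, and it assumes a tab-separated file with a `#`-prefixed header line naming the `ftp_path` and `organism_name` columns. The columns are looked up by name instead of hard-coding positions.

```shell
# Hypothetical helper (not part of MetaMaps): print accession and organism
# name for every row of an assembly_summary.txt whose ftp_path is "na".
list_na_assemblies() {
    awk -F '\t' '
        # Comment/header lines: remember which columns hold the fields we need.
        /^#/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "ftp_path")      path = i
                if ($i == "organism_name") org  = i
            }
            next
        }
        # Data lines: report the row when the download path is "na".
        path && $path == "na" { print $1 "\t" (org ? $org : "") }
    ' "$1"
}
```

Running `list_na_assemblies path/to/assembly_summary.txt` prints nothing if no rows have an "na" path; the path here is a placeholder for wherever the file lives in your setup.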