Want to download all the required databases manually

@bloomarun A quick/short-term solution for me was running microbeannotator_db_builder until Step 6 (Download TrEMBL Proteins) and then cancelling during the download with a keyboard interrupt. I then manually downloaded the TrEMBL proteins (Step 6) and TrEMBL annotations (Step 7) using aria2c, but any multi-threaded download tool should work.

Example code:

#Manual Step 6 download for TrEMBL proteins:
aria2c -x 16 -s 16 https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz

#Manual Step 7 download for TrEMBL annotations:
aria2c -x 16 -s 16 https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz
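
Since these are very large files fetched over many parallel connections, it may be worth checking that each archive is complete before moving on. The following is a minimal sketch, not part of MicrobeAnnotator: `check_gz` is a hypothetical helper name, and it assumes both files were downloaded into the current directory. It relies only on `gzip -t`, which tests archive integrity without extracting anything.

```shell
# check_gz is a hypothetical helper (not part of MicrobeAnnotator or aria2c).
# It returns 0 only if the file exists and passes gzip's integrity test.
check_gz() {
    if [ ! -f "$1" ]; then
        echo "$1: not found" >&2
        return 1
    fi
    if gzip -t "$1" 2>/dev/null; then
        echo "$1: OK"
    else
        echo "$1: corrupt or incomplete" >&2
        return 1
    fi
}

# Assumption: both downloads are in the current working directory.
for f in uniprot_trembl.fasta.gz uniprot_trembl.dat.gz; do
    if ! check_gz "$f"; then
        echo "Re-download $f before continuing" >&2
    fi
done
```

Testing files this large takes a while, but likely far less time than discovering a truncated download partway through the later steps.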

The TrEMBL proteins file needs to be saved in the "protein_db" directory as "uniprot_trembl.fasta" (create the "protein_db" directory if it's not already present in your main MicrobeAnnotator database directory). Example:

~/MicrobeAnnotator_DB/protein_db/uniprot_trembl.fasta

The TrEMBL annotations file needs to be saved in the "temp_trembl_dat_files" directory as "uniprot_trembl.dat.gz" (create the "temp_trembl_dat_files" directory if it's not already present in your main MicrobeAnnotator database directory). Example:

~/MicrobeAnnotator_DB/temp_trembl_dat_files/uniprot_trembl.dat.gz
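
Putting the two files in place could look roughly like the sketch below. A few things here are assumptions rather than anything confirmed above: `DB_DIR` is just an illustrative variable for the database directory, both downloads are assumed to be in the current directory, and the proteins file is decompressed because the expected name ends in ".fasta" rather than ".fasta.gz". The annotations file is left compressed, matching the ".dat.gz" name given above.

```shell
# Assumption: DB_DIR is the database directory you pass to
# microbeannotator_db_builder with -d; adjust it to your setup.
DB_DIR="${DB_DIR:-$HOME/MicrobeAnnotator_DB}"

# Create both target directories if they do not exist yet.
mkdir -p "$DB_DIR/protein_db" "$DB_DIR/temp_trembl_dat_files"

# Assumption: the builder expects an uncompressed FASTA, so decompress
# the download into place (gunzip -c keeps the original .gz untouched).
if [ -f uniprot_trembl.fasta.gz ]; then
    gunzip -c uniprot_trembl.fasta.gz > "$DB_DIR/protein_db/uniprot_trembl.fasta"
fi

# The annotations file keeps its .dat.gz name, so it is only moved.
if [ -f uniprot_trembl.dat.gz ]; then
    mv uniprot_trembl.dat.gz "$DB_DIR/temp_trembl_dat_files/uniprot_trembl.dat.gz"
fi
```

Keeping the compressed FASTA around until the builder finishes means a failed step doesn't force another download; it can be deleted afterwards to free disk space.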

After doing this, resume the microbeannotator_db_builder script starting at Step 8 (Parse TrEMBL Annotations) as follows:

microbeannotator_db_builder -d MicrobeAnnotator_DB -m diamond -t 22 --step 8

This allowed me to cut my download time from ~72 hours to ~3 hours. The rest of the download steps are negligible in comparison, so I didn't bother with multi-threading for them, but I imagine the same could be done.

Hope this helps speed things up for you!

cruizperez / MicrobeAnnotator
