Open bloomarun opened 11 months ago
@bloomarun A quick, short-term solution for me was running the microbeannotator_db_builder until Step 6 (Download TrEMBL Proteins) and then cancelling during the download with a keyboard interrupt. I then manually downloaded the TrEMBL proteins (Step 6) and TrEMBL annotations (Step 7) using aria2c, but any multi-threaded download tool should work.
Example code:
#Manual Step 6 download for TrEMBL proteins:
aria2c -x 16 -s 16 https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz
#Manual Step 7 download for TrEMBL annotations:
aria2c -x 16 -s 16 https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz
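The TrEMBL proteins file needs to be saved in the "protein_db" directory as "uniprot_trembl.fasta" (make the "protein_db" directory if it's not already present in your main MicrobeAnnotator database directory). Note that the download is gzip-compressed ("uniprot_trembl.fasta.gz"), so it presumably needs to be decompressed first, e.g. with gunzip, to end up with that file name. Example: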
~/MicrobeAnnotator_DB/protein_db/uniprot_trembl.fasta
The TrEMBL annotations file needs to be saved, still compressed, in the "temp_trembl_dat_files" directory as "uniprot_trembl.dat.gz" (make the "temp_trembl_dat_files" directory if it's not already present in your main MicrobeAnnotator database directory). Example:
~/MicrobeAnnotator_DB/temp_trembl_dat_files/uniprot_trembl.dat.gz
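The download and placement steps above can be combined into one small script. This is only a sketch under a few assumptions: the database directory is ~/MicrobeAnnotator_DB (override it with DB_DIR), the `run` helper is a hypothetical dry-run wrapper that is not part of MicrobeAnnotator, and the gunzip step is inferred from the expected file name "uniprot_trembl.fasta" rather than confirmed against the builder. By default it only prints the commands; set RUN=1 to execute them.

```shell
# Assumed location of the database directory; adjust to match your setup.
DB_DIR="${DB_DIR:-$HOME/MicrobeAnnotator_DB}"
BASE_URL="https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete"

# Hypothetical helper: print each command unless RUN=1, then execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Create the two directories the builder expects.
run mkdir -p "$DB_DIR/protein_db" "$DB_DIR/temp_trembl_dat_files"

# Manual Step 6: download the TrEMBL proteins straight into protein_db.
run aria2c -x 16 -s 16 -d "$DB_DIR/protein_db" -o uniprot_trembl.fasta.gz \
    "$BASE_URL/uniprot_trembl.fasta.gz"

# Assumption: decompress so the file is named uniprot_trembl.fasta.
run gunzip "$DB_DIR/protein_db/uniprot_trembl.fasta.gz"

# Manual Step 7: download the TrEMBL annotations and keep them compressed.
run aria2c -x 16 -s 16 -d "$DB_DIR/temp_trembl_dat_files" -o uniprot_trembl.dat.gz \
    "$BASE_URL/uniprot_trembl.dat.gz"
```

Using aria2c's -d (target directory) and -o (output file name) options saves moving the files afterwards.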
After doing this, resume the microbeannotator_db_builder script starting at Step 8 (Parse TrEMBL Annotations) as follows:
microbeannotator_db_builder -d MicrobeAnnotator_DB -m diamond -t 22 --step 8
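Before resuming, it may be worth confirming that both files are where the builder will look for them. The check below is a minimal sketch: `check_file` is a hypothetical helper, and DB_DIR is assumed to point at the same directory passed to -d.

```shell
# Assumed location of the database directory; adjust to match your setup.
DB_DIR="${DB_DIR:-$HOME/MicrobeAnnotator_DB}"

# Hypothetical helper: succeed only if the file exists and is non-empty.
check_file() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}

check_file "$DB_DIR/protein_db/uniprot_trembl.fasta" \
    || echo "Step 6 output is not in place yet" >&2
check_file "$DB_DIR/temp_trembl_dat_files/uniprot_trembl.dat.gz" \
    || echo "Step 7 output is not in place yet" >&2
```

This only checks that the files exist and are non-empty; it does not verify that the downloads completed without corruption.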
This allowed me to cut my download time from ~72 hours to ~3 hours. The rest of the download steps are negligible in comparison, so I didn't bother with multi-threading, but I imagine the same could be done for them.
Hope this helps speed things up for you!
Hello, the db_builder provided is great, but it is pretty slow. I want to download all the databases manually (using multi-threaded download tools like axel) and then build them using the builder script. How can I go about that? I am missing out on the best annotator out there due to small glitches like these...