WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

Thread usage and computational efficiency #300

Open ganiatgithub opened 9 months ago

ganiatgithub commented 9 months ago

Hi,

I'd like to check whether my usage of DRAM is appropriate in terms of maximizing computational efficiency.

I specified 32 threads for annotate on an HPC node and used htop to inspect thread usage. Over 15 minutes of inspection, significantly fewer than 32 threads were in use for the majority of the time.

[Screenshot: htop during hmmsearch]

All 32 threads are used only while mmseqs is running.

[Screenshot: htop during mmseqs]
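To put numbers on what htop shows, something like the following could log thread usage over the whole run. This is only a minimal sketch: the function names `summarize_tool` and `sample_once` and the log file name are hypothetical, it assumes a Linux `ps` (procps) that supports the `nlwp`, `pcpu` and `comm` columns, and it assumes the processes show up under the command names `hmmsearch` and `mmseqs`.

```shell
#!/usr/bin/env bash
# Sketch: record how many threads and how much CPU each helper tool uses.

# Read `ps`-style lines ("NLWP %CPU COMMAND") on stdin and print the summed
# thread count and %CPU for processes whose command name equals $1.
summarize_tool () {
    local tool="$1"
    awk -v tool="$tool" '
        $3 == tool { n += $1; cpu += $2 }
        END { printf "%s threads=%d cpu=%.1f\n", tool, n, cpu }'
}

# Take one timestamped sample for each tool of interest.
sample_once () {
    local snapshot tool
    snapshot="$(ps -eo nlwp=,pcpu=,comm=)"
    for tool in hmmsearch mmseqs; do
        printf '%s ' "$(date +%T)"
        printf '%s\n' "$snapshot" | summarize_tool "$tool"
    done
}

# Possible use (left commented out): sample every 30 s in the background
# while the annotation runs, then stop the sampler.
# while sleep 30; do sample_once; done >> dram_threads.log &
# sampler_pid=$!
# ... run DRAM.py annotate here ...
# kill "$sampler_pid"
```

The resulting log would show, per sample, whether the thread count of each tool comes anywhere near the 32 requested.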

Here is my script. Is there a way to optimize it at this stage?

source /fs04/rp24/gaofeng/tools/mambaforge/etc/profile.d/mamba.sh
source /fs04/rp24/gaofeng/tools/mambaforge/etc/profile.d/conda.sh
mamba activate DRAM

MAG_DIR="/home/gnii0001/07_dsm/result/drep_merged_from_10/dereplicated_genomes"
DRAM_OUT_DIR="/home/gnii0001/07_dsm/result/DRAM_dereplicated"

function create_directories () {
    mkdir -p "$DRAM_OUT_DIR"
}

function annotate () {
    time DRAM.py annotate -i "$MAG_DIR/*" -o "$DRAM_OUT_DIR"/annotation --threads 32
}

function distill () {
    time DRAM.py distill -i "$DRAM_OUT_DIR"/annotation/annotations.tsv -o "$DRAM_OUT_DIR"/genome_summaries --trna_path "$DRAM_OUT_DIR"/annotation/trnas.tsv --rrna_path "$DRAM_OUT_DIR"/annotation/rrnas.tsv
}
create_directories
annotate
distill
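One variation on the script above would be to take the thread count from the job allocation instead of hardcoding 32, so the request and the `--threads` value cannot drift apart. A minimal sketch, assuming the cluster uses Slurm (the scheduler is not stated here) and that the job was submitted with `--cpus-per-task`; `pick_threads` is a hypothetical helper name:

```shell
# Hypothetical helper: choose a thread count from the scheduler allocation.
# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested;
# otherwise fall back to the number of CPUs reported by nproc.
pick_threads () {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        echo "$SLURM_CPUS_PER_TASK"
    else
        nproc
    fi
}

# Possible use in the annotate step (same DRAM.py options as in the script):
# DRAM.py annotate -i "$MAG_DIR/*" -o "$DRAM_OUT_DIR"/annotation --threads "$(pick_threads)"
```

This does not change how many threads each annotation step actually uses; it only keeps the requested value consistent with what the job was given.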

Many thanks!