Extremely slow on cluster

Hi there,

I'm having an issue where dRep takes ~4 days to dereplicate 1500 bacterial assemblies. I have many batches of ~1500 assemblies each, so overall this is going to take far too long. Given your knowledge of the programs dRep runs and how efficient each one is, could you offer any advice for optimizing the cluster jobs I'm submitting? Here are the SLURM parameters I am currently working with:

# Number of nodes and MPI tasks per node:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
# Enable Hyperthreading:
#SBATCH --ntasks-per-core=2
# for OpenMP:
#SBATCH --cpus-per-task=20

Do you have any suggestions for adjustments that would suit dRep specifically?

Here's my dRep command:

dRep dereplicate \
        --processors 40 \
        --genomes "${input_dir}/${genome}.txt" \
        --genomeInfo "${genome_info_dir}/${genome}.csv" \
        --completeness 50 \
        --contamination 10 \
        --S_algorithm ANImf \
        --S_ani 0.95 \
        --run_tertiary_clustering \
        --SkipMash \
        --cov_thresh 0.4 \
        "${output_dir}/${genome}"

Many thanks
