bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

No queries aligned when running diamond in distributed mode #681

Open davidecarlson opened 1 year ago

davidecarlson commented 1 year ago

I'm trying to do an all-against-all blastp search of several proteomes. When running this with diamond on a single node, there are many hits as would be expected. However, when I run diamond in distributed mode across multiple nodes, the log reports that zero queries are aligned and the results file is empty.

I'm not sure if this is a bug or I am doing something incorrectly, so I would love any feedback.

Below are the steps I'm running:

  1. First, I run the mp-init step on the login node:
export PATH=/gpfs/software/diamond/gcc12/2.1.4/bin:$PATH

PREFIX=gcc_fungi_12node_test
DIAMOND_TEMP=`pwd`/diamond_temp_${PREFIX}
TEMP=/tmp

QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd

diamond blastp --query ${QUERY} --db ${DB} --multiprocessing --mp-init --tmpdir ${TEMP} --parallel-tmpdir ${DIAMOND_TEMP}
  2. Next, I submit a batch job to the scheduler (from the same working directory as step 1):
#!/usr/bin/env bash

#SBATCH --job-name=diamond_gcc
#SBATCH --output=diamond_gcc_refseq_fungi_12node.log
#SBATCH -N 12
#SBATCH --time=08:00:00
#SBATCH --ntasks-per-node=1
#SBATCH -p medium-24core

module load gcc/12.1.0

export PATH=/gpfs/software/diamond/gcc12/2.1.4/bin:$PATH

OUTPUT=diamond_gcc_fungi_${SLURM_NNODES}_nodes

PREFIX=gcc_fungi_12node_test
DIAMOND_TEMP=`pwd`/diamond_temp_${PREFIX}
TEMP=/tmp

QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd

# run the search step

srun diamond blastp --db ${DB} --query ${QUERY} -o ${OUTPUT} --multiprocessing --tmpdir ${TEMP} --parallel-tmpdir ${DIAMOND_TEMP}

Note that this is Diamond v2.1.4 compiled with the GCC 12.1.0 compiler. I've tried doing this with various numbers of nodes, and placing the TEMP directory both within and outside of the parallel file system, and even compiled it on multiple different clusters with different compilers, but I still consistently get zero queries aligned from the all-by-all blastp when running in distributed mode (but not when running on a single node).

I've also attached my log file.

Do you see anything that I'm doing wrong or otherwise have any advice for getting diamond to work in distributed mode?

Thanks! Dave

diamond_gcc_refseq_fungi_12node.log

bbuchfink commented 1 year ago

I can't reproduce the problem using v2.1.4. Could you run this again using the --log option and show me the output?

davidecarlson commented 1 year ago

Thanks for looking into this. I've rerun with --log and am attaching the output log file. diamond.log

bbuchfink commented 1 year ago

The only thing I noticed about these logs is that your calls with --mp-init don't have a block size parameter while the others do. That mismatch could be the problem.
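
A minimal sketch of that suggestion, assuming the fix is simply to pass the same block size to both the --mp-init call and the search call. The value 2.0 and the COMMON_ARGS array are illustrative assumptions, not taken from the logs; the file names and directories are the ones from the original post. The script only prints the two commands so they can be compared before anything is run.

```shell
#!/usr/bin/env bash
# Sketch: define the block size once and reuse it for both steps,
# so --mp-init and the search step cannot disagree.

BLOCK_SIZE=2.0   # assumption: illustrative value, use whatever the search step uses

PREFIX=gcc_fungi_12node_test
DIAMOND_TEMP="$(pwd)/diamond_temp_${PREFIX}"
TEMP=/tmp
QUERY=fungi_combined.protein.faa
DB=refseq_fungi.dmnd
OUTPUT="diamond_gcc_fungi_${SLURM_NNODES:-12}_nodes"

# Arguments shared by both invocations (hypothetical helper array).
COMMON_ARGS=(blastp --query "${QUERY}" --db "${DB}"
             --block-size "${BLOCK_SIZE}"
             --multiprocessing
             --tmpdir "${TEMP}" --parallel-tmpdir "${DIAMOND_TEMP}")

# Step 1 (login node) and step 2 (inside the batch job), built from the same arguments.
INIT_CMD="diamond ${COMMON_ARGS[*]} --mp-init"
RUN_CMD="srun diamond ${COMMON_ARGS[*]} -o ${OUTPUT}"

# Dry run: print the commands instead of executing them.
echo "${INIT_CMD}"
echo "${RUN_CMD}"
```

If the printed commands look right, they can be pasted into the login-node session and the batch script respectively. It is probably safest to start from a fresh --parallel-tmpdir when re-running --mp-init with a different block size, though that is an assumption rather than something confirmed in this thread.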