BenoitMorel / ParGenes

A massively parallel tool for model selection and tree inference on thousands of genes
GNU General Public License v3.0

ParGenes run failed #71

Closed leke-lyu closed 2 years ago

leke-lyu commented 2 years ago

I launched the run on a cluster, and I got this:

ParGenes report file for run pargenes_output


[REPORT] MainLogs


########################

PARGENES v1.2.0

########################

ParGenes was called as follow: /apps/eb/ParGenes/20220329-foss-2020b-Python-3.8.6-Java-1.8/pargenes/pargenes-hpc.py -a output -o pargenes_output -r raxml_options.txt --seed 3000 -s 0 -p 10 -b 0 -d nt -c 10 --scheduler split

[0:00:00] end of MSAs initializations
Calling mpi-scheduler: mpiexec -n 10 /apps/eb/ParGenes/20220329-foss-2020b-Python-3.8.6-Java-1.8/pargenes/pargenes_src/../pargenes_binaries/mpi-scheduler --split-scheduler 10 /apps/eb/ParGenes/20220329-foss-2020b-Python-3.8.6-Java-1.8/pargenes/pargenes_src/../pargenes_binaries/raxml-ng-mpi.so pargenes_output/parse_run/parse_command.txt pargenes_output/parse_run
Logs will be redirected to pargenes_output/parse_run/logs.txt
[Error] [0:00:00] mpi-scheduler execution failed with error code 1
[Error] [0:00:00] Will now exit...
[Error] <class 'RuntimeError'> mpi-scheduler execution failed with error code 1
Writing report file in /home/ll22780/tipTraitAssociation/covid19_cme_analysis-master/myData/pargenes_output/report.txt
When reporting the issue, please always send us this file.

report.txt

leke-lyu commented 2 years ago

Here I also attach the bash script I use to submit the job on the Slurm system:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=60
#SBATCH --mem=60gb
#SBATCH --time=72:00:00
#SBATCH --output=%j.out
#SBATCH --error=%j.err

cd $SLURM_SUBMIT_DIR
module load ParGenes/20220329-foss-2020b-Python-3.8.6-Java-1.8
python /apps/eb/ParGenes/20220329-foss-2020b-Python-3.8.6-Java-1.8/pargenes/pargenes-hpc.py -a output -o pargenes_output -r raxml_options.txt --seed 3000 -s 0 -p 10 -b 0 -d nt -c 10

Hope you can help!

BenoitMorel commented 2 years ago

Dear leke-lyu,

Apparently there is a problem with the number of cores/slots you request with the -c option. See line 28 in the report file; this error message comes from MPI:

There are not enough slots available in the system to satisfy the 10
slots that were requested by the application:

Could it be that your submission script does not allocate enough slots? I am not very experienced with slurm, so I can't tell what could be wrong in yours. But the submission scripts I use look like this:

#SBATCH -B 2:8:1
#SBATCH -N 32   # because our cluster has 16 cores per node, and 512/16=32
#SBATCH -n 512
#SBATCH --threads-per-core=1
#SBATCH --cpus-per-task=1
#SBATCH --hint=compute_bound
#SBATCH -t 24:00:00

I would not copy-paste it, because clusters have different configurations, but maybe this helps a bit.
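[Editor's note] One way to adapt the original submission script so that it allocates the 10 MPI slots the run requests (`-c 10` makes ParGenes call `mpiexec -n 10`) is sketched below. This is only a guess at a working configuration, not a tested fix; the module name and paths are copied from the original script, and the resource numbers may need adjusting for your cluster:

```shell
#!/bin/bash
# Sketch: request one Slurm task per MPI rank that ParGenes will launch.
# With -c 10, ParGenes runs "mpiexec -n 10", so we need at least 10 slots.
#SBATCH --ntasks=10          # 10 MPI slots, matching -c 10
#SBATCH --cpus-per-task=1    # one core per rank
#SBATCH --mem=60gb
#SBATCH --time=72:00:00
#SBATCH --output=%j.out
#SBATCH --error=%j.err

cd "$SLURM_SUBMIT_DIR"
module load ParGenes/20220329-foss-2020b-Python-3.8.6-Java-1.8
python /apps/eb/ParGenes/20220329-foss-2020b-Python-3.8.6-Java-1.8/pargenes/pargenes-hpc.py \
    -a output -o pargenes_output -r raxml_options.txt \
    --seed 3000 -s 0 -p 10 -b 0 -d nt -c 10
```

The key change from the original script is `--ntasks=10` instead of `--ntasks=1`: with a single task, Open MPI sees only one slot and refuses to start the 10 requested ranks.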

If you want to make sure that the problem is the script (and not ParGenes), you can replace your ParGenes call with:

 mpiexec -np 10 echo "hello"

This should print hello 10 times if the script is correct. But I would expect the same error message as the one you got in the report.

Let me know if this helps ;-) Benoit

leke-lyu commented 2 years ago

Thank you Benoit, The issue has been solved!

BenoitMorel commented 2 years ago

Great, thanks for the feedback ;)