TACC / launcher

A simple utility for executing multiple sequential or multi-threaded applications in a single multi-node batch job
MIT License

WARNING: No response from dynamic task server. Retrying AND the SLURM resource allocator expects to find the following environment variables #50

Closed luisgithub269 closed 5 years ago

luisgithub269 commented 5 years ago

Hi, everyone

I have the following problem.


While trying to determine what resources are available, the SLURM resource allocator expects to find the following environment variables:


However, it was unable to find the following environment variable:


[file orca_main/gtoint.cpp, line 137]: ORCA finished by error termination in ORCA_GTOInt
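One way to narrow down which variable is missing is to print the SLURM variables that are actually visible from inside the task, for example at the top of ejecutar1.sh. This is a minimal sketch, assuming a hypothetical helper named `list_slurm_vars` (it is not part of Launcher, ORCA, or Open MPI):

```shell
#!/bin/bash
# Hypothetical diagnostic helper: print every SLURM_* variable that the
# current process can see, sorted by name. Prints nothing if none are set.
list_slurm_vars() {
    env | grep '^SLURM_' | sort
}

# Example use inside a job script (commented out here):
# list_slurm_vars > "slurm_env.$(hostname).txt"
```

Comparing the output when called from the batch script with the output when called from a Launcher task may show whether any variables are lost along the way.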

Launcher: Setup complete.

------------- SUMMARY ---------------
Number of hosts: 2
Working directory: /scratch/luis_sc3orca/ligandos1/ligandos
Processes per host: 2
Total processes: 4
Total jobs: 3
Scheduling method: dynamic

Launcher: Starting parallel tasks...
Launcher: Task 0 running job 1 on guane14 (srun -N 1 -n 12 ./ejecutar1.sh)
Launcher: Task 1 running job 2 on guane14 (# ./ejecutar2.sh)
Launcher: Job 2 completed in 0 seconds.
Launcher: Task 1 running job 3 on guane14 (# ./ejecutar3.sh)
srun: error: Unable to create step for job 114446: More processors requested than permitted
Launcher: Job 3 completed in 0 seconds.
Launcher: Job 1 completed in 0 seconds.
Launcher: Task 1 done. Exiting.
Launcher: Task 0 done. Exiting.
localhost [] 9471 (?) : Connection refused
/home/luis_sc3/plantatrabajo/launcher/launcher: line 82: [: -gt: unary operator expected
localhost [] 9471 (?) : Connection refused
/home/luis_sc3/plantatrabajo/launcher/launcher: line 82: [: -gt: unary operator expected
localhost [] 9471 (?)localhost [] 9471 (?) : Connection refused : Connection refused

The master script:


#--------SCHEDULER OPTIONS-------
#SBATCH -J parametric

#SBATCH -p manycores24
#SBATCH -w=guane[09-11]
#SBATCH -o parametric.o%j

#--------GENERAL OPTIONS---------
export LAUNCHER_DIR=/home/luis_sc3/plantatrabajo/launcher  # path to the launcher directory
export PATH=$LAUNCHER_DIR:$PATH
export PATH=/usr/bin/python2.7:$PATH
export PATH=/usr/lib/python2.7:$PATH

export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins  # job manager

export LAUNCHER_PPN=2  # number of jobs per node
export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins  # job manager
export LAUNCHER_RMI=SLURM
export EXECUTABLE=$LAUNCHER_DIR/init_laucher
export LAUNCHER_WORKDIR=$PWD  # directory of the job file
export LAUNCHER_JOB_FILE=ejecucion  # job file

export LAUNCHER_SCHED=dynamic



The job file contains:

./ejecutar1.sh
./ejecutar2.sh
./ejecutar3.sh
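The launcher output above shows jobs 2 and 3 being run as `# ./ejecutar2.sh` and `# ./ejecutar3.sh` and finishing in 0 seconds, which suggests those lines may have been commented out in the job file for that run while still being counted as jobs. A small sketch for checking how many lines of a job file would actually execute something, assuming a hypothetical helper named `count_runnable_jobs` (not part of Launcher):

```shell
#!/bin/bash
# Hypothetical helper: count the lines of a job file that are neither
# blank nor shell comments, i.e. the lines that would run a command.
count_runnable_jobs() {
    local jobfile="$1"
    # -v inverts the match, -c counts the remaining lines.
    grep -Ecv '^[[:space:]]*(#|$)' "$jobfile"
}

# Example use (commented out here):
# count_runnable_jobs ejecucion
```

If the number printed is lower than the "Total jobs" value in the launcher summary, some tasks are being spent on comment or empty lines.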

The script ejecutar1.sh:


# Node group (partition) to use
#SBATCH --partition=manycores24

# Job name

# Output file name
#SBATCH -o l1.%j.out

# Number of nodes
#SBATCH --nodes=1

# Number of tasks
#SBATCH --ntasks=12

# Number of tasks per node
#SBATCH --tasks-per-node=12

# Number of CPUs per task
#SBATCH --cpus-per-task=1

# Time allowed for the run
#SBATCH --time=120:00:00

# Memory assigned to each CPU
#SBATCH --mem-per-cpu=8G

# Request unlimited memory use
ulimit -l unlimited

# openmpi-2.0.2 variables
export PATH=/usr/local/openmpi-2.0.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi-2.0.2/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/openmpi-2.0.2/openmpi:$LD_LIBRARY_PATH

# Variable for calling ORCA
export ORCA_PATH=/home/luis_sc3/plantatrabajo/orca4012
export PATH=$ORCA_PATH:$PATH

# Variable for the directory of the input file
export FILE_PATH=/scratch/luis_sc3orca/ligandos1/ligandos/l1

$ORCA_PATH/orca $FILE_PATH/l1.inp > $FILE_PATH/l1.out

Any ideas how to fix the problem?


luisgithub269 commented 5 years ago

Hi, everyone

I need your help, please.

The main reason I am writing this message is that I need to run a simulation on 500 files, each taking 5 hours. The resources assigned to me are 4 nodes with 24 cores each, which allows me to run 8 tasks simultaneously (two jobs per node, each job with ntasks=12), reducing the execution time to 6 days.

The second reason is that next week I have to submit my thesis in order to graduate in chemical engineering from the Universidad Industrial de Santander, Colombia.

Please excuse my writing; I know little English.

The programs used are: orca_4_0_1_2_linux_x86-64_openmpi202.tar.xz, openmpi-2.0.2.tar.gz, and the GitHub version of TACC Launcher.

I have the following problem with Launcher.

The main problem is how to assign the resources that the launcher script (SCRIPTLAUNCHER.SH) passes on to the script (EJECUTAR1.SH) that runs the ORCA input file, so as to resolve this error: the SLURM resource allocator expects to find the following environment variables:


However, it was unable to find the following environment variable:


When I run the program, it shows the following message:


The output of the ORCA (quantum chemistry) simulation:

While trying to determine what resources are available, the SLURM resource allocator expects to find the following environment variables:


However, it was unable to find the following environment variable:


[file orca_main/gtoint.cpp, line 137]: ORCA finished by error termination in OR$



The contents of ejecutar1.sh:


# Node group (partition) to use
#SBATCH --partition=manycores24

# Job name

# Output file name
#SBATCH -o l1.%j.out

# Number of nodes
#SBATCH --nodes=1

# Number of tasks
#SBATCH --ntasks=12

# Number of tasks per node
#SBATCH --tasks-per-node=12

# Number of CPUs per task
#SBATCH --cpus-per-task=1

# Time allowed for the run
#SBATCH --time=120:00:00

# Memory assigned to each CPU
#SBATCH --mem-per-cpu=8G

# Request unlimited memory use
ulimit -l unlimited

# openmpi-2.0.2 variables
export PATH=/usr/local/openmpi-2.0.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi-2.0.2/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/openmpi-2.0.2/openmpi:$LD_LIBRARY_PATH

# Variable for calling ORCA
export ORCA_PATH=/home/luis_sc3/plantatrabajo/orca4012
export PATH=$ORCA_PATH:$PATH

# Variable for the directory of the input file
export FILE_PATH=/scratch/luis_sc3orca/ligandos1/ligandos/l1

$ORCA_PATH/orca $FILE_PATH/l1.inp > $FILE_PATH/l1.out



The contents of the job file (ejecucion):

./ejecutar1.sh
./ejecutar2.sh
./ejecutar3.sh



The input for the ORCA (quantum chemistry) simulation:


l1-Zn Opt



%pal nprocs 12 end

%MaxCore 8000


The launcher output:

Launcher: Setup complete.

------------- SUMMARY ---------------
Number of hosts: 2
Working directory: /scratch/luis_sc3orca/ligandos1/ligandos
Processes per host: 2
Total processes: 4
Total jobs: 3
Scheduling method: dynamic

Launcher: Starting parallel tasks...
Launcher: Task 0 running job 1 on guane14 (./ejecutar1.sh)
Launcher: Task 1 running job 2 on guane14 (#./ejecutar2.sh)
Launcher: Job 2 completed in 0 seconds.
Launcher: Task 1 running job 3 on guane14 (#./ejecutar3.sh)
Launcher: Job 3 completed in 0 seconds.
Launcher: Task 1 done. Exiting.
[guane14:21063] [[16989,0],0] ORTE_ERROR_LOG: Not found in file base/ras_base_a$
[file orca_main/gtoint.cpp, line 137]: ORCA finished by error termination in OR$

Launcher: Job 1 completed in 0 seconds.
Launcher: Task 0 done. Exiting.
localhost [] 9471 (?) : Connection refused
localhost [] 9471 (?) : Connection refused
WARNING: No response from dynamic task server. Retrying...
localhost [] 9471 (?) : Connection refused
localhost [] 9471 (?) : Connection refused
WARNING: No response from dynamic task server. Retrying...
...



The launcher script:


echo "Press CTRL+C to proceed."

trap "pkill -f 'sleep 1h'" INT

trap "set +x ; sleep 1h ; set -x" DEBUG

#--------SCHEDULER OPTIONS-------
#SBATCH -J parametric

#SBATCH -p manycores24
#SBATCH -w "guane03"
#SBATCH -o parametric.o%j

#--------GENERAL OPTIONS---------
export LAUNCHER_DIR=/home/luis_sc3/plantatrabajo/launcher #pwd direcccion de la$
export PATH=$LAUNCHER_DIR:$PATH
export PATH=/usr/bin/python2.7:$PATH
export PATH=/usr/lib/python2.7:$PATH

export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins  # job manager

export LAUNCHER_PPN=2  # number of jobs per node
export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins  # job manager
export LAUNCHER_RMI=SLURM

export EXECUTABLE=$LAUNCHER_DIR/init_laucher

export LAUNCHER_WORKDIR=$PWD  # directory of the job file
export LAUNCHER_JOB_FILE=ejecucion  # job file

export CONTROL_FILE = #jobfile

export LAUNCHER_SCHED=dynamic




lwilson commented 5 years ago

I'm going to close this one since it's a duplicate of #51. @luisgithub269 if I'm mistaken please feel free to let me know and I'll re-open it.