rafwiewiora opened 8 years ago
OK, I did a trial MPI run and it's not quite right :/
I used 2 CPU threads and 2 GPUs (shared mode). From the log you can see that the two processes appear to each run everything in parallel, rather than dividing the work in half - at one point one of them deletes a file and the other complains when it tries to do the same thing next. Log: https://gist.github.com/rafwiewiora/b7f663ee76059ea8aca22b42347cafce (I killed it at the start of explicit refinement - it hung).
I did:
Ensembler script:
# add anaconda to PATH
export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
ensembler init
cp ../manual-overrides.yaml .
ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN
# no loopmodel
ensembler align
ensembler build_models
ensembler cluster --cutoff 0
ensembler refine_implicit --gpupn 4
ensembler solvate
ensembler refine_explicit --gpupn 4
MPI script:
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=03:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=2:gpus=2:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# add anaconda to PATH
# export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
# Set CUDA_VISIBLE_DEVICES for this process
python build-mpirun-configfile.py bash ensembler_script.sh
# Launch MPI job.
mpirun -configfile configfile
This uses build-mpirun-configfile.py from https://github.com/choderalab/clusterutils/blob/master/clusterutils/build_mpirun_configfile.py. Any pointers?
Actually, I had --gpupn 4's in the Ensembler script, but only 2 GPUs on the job. Let me see what happens when I get that right.
Yep, same thing.
You can't use mpirun to launch a shell script. You have to launch the executable directly. Change your qsub script to this:
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=03:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=2:gpus=2:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# add anaconda to PATH
export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
ensembler init
cp ../manual-overrides.yaml .
ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN
# no loopmodel
ensembler align
ensembler build_models
ensembler cluster --cutoff 0
# Parallelize refinement
python build-mpirun-configfile.py ensembler refine_implicit --gpupn 4
mpirun -configfile configfile
ensembler solvate
# Parallelize refinement
python build-mpirun-configfile.py ensembler refine_explicit --gpupn 4
mpirun -configfile configfile
This parallelizes the two refinement steps.
Hopefully @danielparton can jump in if I've made a mistake.
Also, you might as well grab 4 GPUs by changing
#PBS -l nodes=1:ppn=2:gpus=2:shared
to
#PBS -l nodes=1:ppn=4:gpus=4:shared
Oh I see, thanks @jchodera! There was an hour wait time for 4 GPUs last time I checked, so I'm testing with 2 for now. (I still need to work out the manual PDB before this is all ready to run properly.)
Same thing - it's still running duplicates at refine_implicit; this is what happens:
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_C in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_C in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_D in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_D in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
etc. And it seems like they're both on the same GPU?
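(Side note on checking that - a rough sketch, assuming nvidia-smi is available on the compute node and you can open a shell there: the process table at the bottom of its output shows which PID is attached to which GPU, so two ensembler processes listed under GPU 0 would confirm they're sharing one device.)
# on the compute node, while the job is running
nvidia-smi
# or keep refreshing it
watch -n 1 nvidia-smi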
Can you post the complete queue submission script from this last attempt?
Sure thing:
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=03:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=2:gpus=2:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# add anaconda to PATH
export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
ensembler init
cp ../manual-overrides.yaml .
ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN
# no loopmodel
ensembler align
ensembler build_models
ensembler cluster --cutoff 0
# Parallelize refinement
python build-mpirun-configfile.py ensembler refine_implicit --gpupn 2
mpirun -configfile configfile
ensembler solvate
# Parallelize refinement
python build-mpirun-configfile.py ensembler refine_explicit --gpupn 2
mpirun -configfile configfile
and I'm doing qsub mpi_script. I should also mention that I've created a .dontsetcudavisibledevices file in my home dir, as the hal wiki suggests.
conda install --yes clusterutils should install a script called build_mpirun_configfile. ensembler already installs this as a dependency, I think. I believe you should be using this script instead of python build-mpirun-configfile.py.
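Under the hood, build_mpirun_configfile writes an MPICH-style launch spec with one -np 1 entry per MPI rank, each given its own CUDA_VISIBLE_DEVICES, so every ensembler process lands on a different GPU. A rough sketch of what that spec looks like for a 2-GPU job (hostname and device order are illustrative; the real one is echoed near the top of the log further down):
-hosts gpu-2-7:1,gpu-2-7:1 -np 1 -env CUDA_VISIBLE_DEVICES 1 ensembler refine_implicit --gpupn 2 : -np 1 -env CUDA_VISIBLE_DEVICES 0 ensembler refine_implicit --gpupn 2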
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=03:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=2:gpus=2:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# add anaconda to PATH
export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
ensembler init
cp ../manual-overrides.yaml .
ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN
# no loopmodel
ensembler align
ensembler build_models
ensembler cluster --cutoff 0
# Parallelize refinement
build_mpirun_configfile ensembler refine_implicit --gpupn 2
mpirun -configfile configfile
ensembler solvate
# Parallelize refinement
build_mpirun_configfile ensembler refine_explicit --gpupn 2
mpirun -configfile configfile
That worked for me:
[chodera@mskcc-ln1 ~/debug-ensembler]$ cat ~/ensembler.o7068007
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
gpu-2-7-gpu1
gpu-2-7-gpu0
-hosts gpu-2-7:1,gpu-2-7:1 -np 1 -env CUDA_VISIBLE_DEVICES 1 ensembler refine_implicit --gpupn 2 : -np 1 -env CUDA_VISIBLE_DEVICES 0 ensembler refine_implicit --gpupn 2
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_C in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_D in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_2BQZ_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_2BQZ_E in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9W_C in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9W_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9W_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9W_D in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9X_C in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9X_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9X_D in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9X_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Y_B in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Y_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Z_C in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Z_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Z_D in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Z_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_4IJ8_B in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_4IJ8_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
Problem resolved - mpi4py wasn't installed! So the how-to will be:
conda install --yes clusterutils mpi4py
Really neat once you know how to do it!
TODO: put this in docs.
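Something like this minimal recipe, distilled from the above (a sketch - queue settings, GPU counts and the serial ensembler steps are as in the full qsub scripts earlier in the thread):
# one-time setup in the conda environment
conda install --yes clusterutils mpi4py
# optional sanity check that mpi4py works under MPI (each rank should print a different number)
mpirun -np 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"
# in the qsub script, after the serial ensembler steps (init / gather / align / build_models / cluster):
build_mpirun_configfile ensembler refine_implicit --gpupn 2
mpirun -configfile configfile
ensembler solvate
build_mpirun_configfile ensembler refine_explicit --gpupn 2
mpirun -configfile configfile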
Ensembler has been running on 4 GPUs since last night - I wasn't able to make it work on more than 1 node: either Ensembler was throwing some exception (and I don't know which one, because it just throws the list of your commands back at you) or CUDA was throwing initialization errors. That was on a nodes=6:ppn=4,gpus=4:shared,mem=24GB
job - disappointing, because it could have been done already instead of taking 48 hours. Well, it's working, so we'll let it run, but I'd like to work out how to do higher-capacity jobs in the future.
I know this would take me ages to work out on my own, so hopefully someone could spare a few minutes for a tutorial, please.
How does one set up Ensembler to run with multiple GPUs? I have 23 models and need to equilibrate them all for 5 ns each.
More detailed questions: do I just pass --gpupn to refine_explicit, and everything else will happen automagically as far as Ensembler is concerned? Thanks!