rafwiewiora opened 8 years ago
OK, I did a trial MPI run and it's not quite right :/
I used 2 CPU threads and 2 GPUs (shared mode). From the log you can see that the two processes appear to each run everything in parallel, rather than dividing the work in half - at one point one of them deletes a file and the other complains when it tries to do the same thing next. Log: https://gist.github.com/rafwiewiora/b7f663ee76059ea8aca22b42347cafce (I killed it at the start of explicit refinement - it hung).
I did:
Ensembler script:
# add anaconda to PATH
export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
ensembler init
cp ../manual-overrides.yaml .
ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN
# no loopmodel
ensembler align
ensembler build_models
ensembler cluster --cutoff 0
ensembler refine_implicit --gpupn 4
ensembler solvate
ensembler refine_explicit --gpupn 4
MPI script:
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=03:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=2:gpus=2:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# add anaconda to PATH
# export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
# Set CUDA_VISIBLE_DEVICES for this process
python build-mpirun-configfile.py bash ensembler_script.sh
# Launch MPI job.
mpirun -configfile configfile
This uses build-mpirun-configfile.py from https://github.com/choderalab/clusterutils/blob/master/clusterutils/build_mpirun_configfile.py. Any pointers?
Actually, I had --gpupn 4's in the Ensembler script, but only 2 GPUs on the job. Let me see what happens when I get that right.
Yep, same thing.
You can't use mpirun to launch a shell script. You have to launch the executable directly. Change your qsub script to this:
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=03:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=2:gpus=2:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# add anaconda to PATH
export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
ensembler init
cp ../manual-overrides.yaml .
ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN
# no loopmodel
ensembler align
ensembler build_models
ensembler cluster --cutoff 0
# Parallelize refinement
python build-mpirun-configfile.py ensembler refine_implicit --gpupn 4
mpirun -configfile configfile
ensembler solvate
# Parallelize refinement
python build-mpirun-configfile.py ensembler refine_explicit --gpupn 4
mpirun -configfile configfile
This parallelizes the two refinement steps.
Hopefully @danielparton can jump in if I've made a mistake.
Also, you might as well grab 4 GPUs by changing
#PBS -l nodes=1:ppn=2:gpus=2:shared
to
#PBS -l nodes=1:ppn=4:gpus=4:shared
Oh I see, thanks @jchodera! There was an hour wait time for 4 GPUs last time I checked, so I'm testing with 2 for now. (I still need to work out the manual PDB before this is all ready to run properly.)
Same thing - it's still running duplicates at refine_implicit; this is what happens:
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_C in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_C in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_D in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_D in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
etc. And it seems like they're both on the same GPU?
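(Side note on checking that - a rough sketch, assuming nvidia-smi is available on the compute node and you can open a shell there: the process table at the bottom of its output shows which PID is attached to which GPU, so two ensembler processes listed under GPU 0 would confirm they're sharing one device.)
# on the compute node, while the job is running
nvidia-smi
# or keep refreshing it
watch -n 1 nvidia-smi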
Can you post the complete queue submission script from this last attempt?
Sure thing:
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=03:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=2:gpus=2:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# add anaconda to PATH
export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
ensembler init
cp ../manual-overrides.yaml .
ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN
# no loopmodel
ensembler align
ensembler build_models
ensembler cluster --cutoff 0
# Parallelize refinement
python build-mpirun-configfile.py ensembler refine_implicit --gpupn 2
mpirun -configfile configfile
ensembler solvate
# Parallelize refinement
python build-mpirun-configfile.py ensembler refine_explicit --gpupn 2
mpirun -configfile configfile
and I'm doing qsub mpi_script. I should also mention that I've created a .dontsetcudavisibledevices file in my home dir, as the hal wiki suggests.
conda install --yes clusterutils should install a script called build_mpirun_configfile. ensembler already installs this as a dependency, I think. I believe you should be using this script instead of python build-mpirun-configfile.py.
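Under the hood, build_mpirun_configfile writes an MPICH-style launch spec with one -np 1 entry per MPI rank, each given its own CUDA_VISIBLE_DEVICES, so every ensembler process lands on a different GPU. A rough sketch of what that spec looks like for a 2-GPU job (hostname and device order are illustrative; the real one is echoed near the top of the log further down):
-hosts gpu-2-7:1,gpu-2-7:1 -np 1 -env CUDA_VISIBLE_DEVICES 1 ensembler refine_implicit --gpupn 2 : -np 1 -env CUDA_VISIBLE_DEVICES 0 ensembler refine_implicit --gpupn 2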
# walltime : maximum wall clock time (hh:mm:ss)
#PBS -l walltime=03:00:00
#
# join stdout and stderr
#PBS -j oe
#
# spool output immediately
#PBS -k oe
#
# specify GPU queue
#PBS -q gpu
#
# nodes: number of nodes
# ppn: number of processes per node
# gpus: number of gpus per node
# GPUs are in 'exclusive' mode by default, but 'shared' keyword sets them to shared mode.
#PBS -l nodes=1:ppn=2:gpus=2:shared
#
# export all my environment variables to the job
#PBS -V
#
# job name (default = name of script file)
#PBS -N myjob
#
# mail settings (one or more characters)
# email is sent to local user, unless another email address is specified with PBS -M option
# n: do not send mail
# a: send mail if job is aborted
# b: send mail when job begins execution
# e: send mail when job terminates
#PBS -m n
#
# filename for standard output (default = <job_name>.o<job_id>)
# at end of job, it is in directory from which qsub was executed
# remove extra ## from the line below if you want to name your own file
##PBS -o myoutput
# Change to working directory used for job submission
cd $PBS_O_WORKDIR
# add anaconda to PATH
export PATH=/cbio/jclab/home/rafal.wiewiora/anaconda/bin:$PATH
ensembler init
cp ../manual-overrides.yaml .
ensembler gather_targets --gather_from uniprot --query SETD8_HUMAN --uniprot_domain_regex SET
ensembler gather_templates --gather_from uniprot --query SETD8_HUMAN
# no loopmodel
ensembler align
ensembler build_models
ensembler cluster --cutoff 0
# Parallelize refinement
build_mpirun_configfile ensembler refine_implicit --gpupn 2
mpirun -configfile configfile
ensembler solvate
# Parallelize refinement
build_mpirun_configfile ensembler refine_explicit --gpupn 2
mpirun -configfile configfile
That worked for me:
[chodera@mskcc-ln1 ~/debug-ensembler]$ cat ~/ensembler.o7068007
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
gpu-2-7-gpu1
gpu-2-7-gpu0
-hosts gpu-2-7:1,gpu-2-7:1 -np 1 -env CUDA_VISIBLE_DEVICES 1 ensembler refine_implicit --gpupn 2 : -np 1 -env CUDA_VISIBLE_DEVICES 0 ensembler refine_implicit --gpupn 2
Auto-selected OpenMM platform: CUDA
Auto-selected OpenMM platform: CUDA
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_C in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_1ZKK_D in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_2BQZ_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_2BQZ_E in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9W_C in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9W_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9W_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9W_D in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9X_C in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9X_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9X_D in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9X_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Y_B in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Y_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Z_C in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Z_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Z_D in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_3F9Z_B in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_4IJ8_B in implicit solvent for 100.0 ps (MPI rank: 1, GPU ID: 1)
-------------------------------------------------------------------------
-------------------------------------------------------------------------
Simulating SETD8_HUMAN_D0 => SETD8_HUMAN_4IJ8_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
Problem resolved - mpi4py wasn't installed! So the how-to will be:
conda install --yes clusterutils mpi4py
Really neat once you know how to do it!
TODO: put this in docs.
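Something like this minimal recipe, distilled from the above (a sketch - queue settings, GPU counts and the serial ensembler steps are as in the full qsub scripts earlier in the thread):
# one-time setup in the conda environment
conda install --yes clusterutils mpi4py
# optional sanity check that mpi4py works under MPI (each rank should print a different number)
mpirun -np 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())"
# in the qsub script, after the serial ensembler steps (init / gather / align / build_models / cluster):
build_mpirun_configfile ensembler refine_implicit --gpupn 2
mpirun -configfile configfile
ensembler solvate
build_mpirun_configfile ensembler refine_explicit --gpupn 2
mpirun -configfile configfile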
Ensembler has been running on 4 GPUs since last night - I wasn't able to make it work on more than 1 node: either Ensembler was throwing some exception (and I don't know which one, because it just throws the list of your commands back at you) or CUDA was throwing initialization errors. That was on a nodes=6:ppn=4,gpus=4:shared,mem=24GB
job - disappointing, because it could have been done already instead of taking 48 hours. Well, it's working, so we'll let it run, but I'd like to work out how to do higher-capacity jobs in the future.
I know this would take me ages to work out on my own, so hopefully someone could spare a few minutes for a tutorial, please.
How does one set up Ensembler to run with multiple GPUs? I have 23 models and need to equilibrate them all for 5 ns each.
More detailed questions: do I just pass --gpupn to refine_explicit, and everything else will happen automagically as far as Ensembler is concerned? Thanks!