run yank on specific gpu?

sawstory commented 4 years ago

Hi, I would like to run multiple YANK jobs at the same time, for different protein/ligand pairs, each on a different GPU. I have 6 GTX 1080s on my machine, but I don't know how to modify the YAML file to do that, i.e. to specify which GPU a single run should use. Thanks, Omar

Lnaden commented 4 years ago

YANK itself has no way to actively assign GPUs. However, you can mask GPUs from the CUDA side by setting the CUDA_VISIBLE_DEVICES environment variable.

This link has some helpful info about it. You could also invoke YANK's CLI with the environment variable set as a prefix, like

CUDA_VISIBLE_DEVICES=1 yank ...

where ... is the rest of the command.
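One detail worth knowing when masking: inside the process, CUDA renumbers the visible devices starting from 0, so with CUDA_VISIBLE_DEVICES=1 the physical GPU 1 shows up as device 0. A minimal Python sketch of that mapping follows. The helper name visible_device_map is made up for illustration, and it assumes the variable holds a comma-separated list of integer indices (UUID entries are not handled).

```python
import os


def visible_device_map(env=None):
    """Map in-process CUDA device index -> physical GPU index.

    Hypothetical helper for illustration only. Assumes CUDA_VISIBLE_DEVICES
    is a comma-separated list of integer indices (UUIDs are not handled).
    Returns None when the variable is unset, meaning all GPUs are visible.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Keep the listed order: CUDA enumerates devices in the order given.
    physical = [int(tok) for tok in value.split(",") if tok.strip()]
    return dict(enumerate(physical))


print(visible_device_map({"CUDA_VISIBLE_DEVICES": "1"}))    # {0: 1}
print(visible_device_map({"CUDA_VISIBLE_DEVICES": "2,4"}))  # {0: 2, 1: 4}
```

So a run started with CUDA_VISIBLE_DEVICES=1 only ever sees one card, and any tool that reports "device 0" inside that run is referring to physical GPU 1.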

Let me know if this works!

sawstory commented 4 years ago

So there is no option like the one I use in my OpenMM Python script, where I just change the CudaDeviceIndex number for each simultaneous run (from 0 to 5 in my case, if I want 6 runs at the same time)? For example, below I assign GPU number 2:

platform = Platform.getPlatformByName('CUDA')
properties = {'CudaPrecision': 'single', 'CudaDeviceIndex': '2'}

Lnaden commented 4 years ago

OpenMM does expose that option, but YANK and the main sampler suite, MultiStateSampler in OpenMMTools, do not, because it's possible to have multiple OpenMM contexts in YANK which may be distributed over different GPUs. As far as I know, the only way to have YANK choose the GPU is through Nvidia's environment variable CUDA_VISIBLE_DEVICES.

With some code modifications, it might be possible to add an option to lock the GPU, but it would take some time.

sawstory commented 4 years ago

Thanks for the fast reply :) Can you please let me know if I can distribute the load over the 6 GPUs? I cannot find a direct answer in the documentation. I hope there is a line I can include in the YAML file that does that. All the best, Omar

Lnaden commented 4 years ago

can you please let me know if i can distribute the load on the 6 gpus?

Can you elaborate on this a bit? Based on your initial post...

i would like run multiple yank at the same time for different protein/ligand each on different gpu

I understand this to mean "You have multiple YANK simulations you want to run, you would like to run each one on a different GPU." If that is not correct, please let me know.

Secondly, there is the matter of what type of YANK sampler you are using. Are you using a ReplicaExchangeSampler, where there are multiple OpenMM contexts per run, or a SAMSSampler, where there is a single OpenMM context per run? The latter is somewhat easier to distribute by hand; the former is a bit harder.

Thanks

sawstory commented 4 years ago

''can you please let me know if i can distribute the load on the 6 gpus?'' I mean that when I run YANK it only uses one GPU, so I wonder if the load can be spread over more than one GPU to make it faster.

''I understand this to mean "You have multiple YANK simulations you want to run, you would like to run each one on a different GPU." If that is not correct, please let me know.'' That's right, but in my last comment I was asking about something new compared to my first comment.

''Secondly, there is the matter of what type of YANK sampler you are using. Are you using a ReplicaExchangeSampler where there are multiple OpenMM contexts per run, or a SAMSSampler where there is a single OpenMM context per run? The latter is somewhat easier to distribute by hand, the former is a bit harder.'' Actually I don't know. I am just trying to use YANK to get a good estimate of the affinity, so I will use whichever does that. I use a YAML file similar to the one in the T4 example; it's attached here: yamlt-test.txt

Lnaden commented 4 years ago

i mean when i run yank it only use one gpu, so i wonder if the load can be on more than one gpu to make it faster.

That does not have a simple answer. If you do nothing (and just run with defaults), then YANK will choose the Hamiltonian Replica Exchange (HREX) sampler (I'm pretty sure at least), so N contexts will be created, where N is the number of thermodynamic states. If you just run with yank ... as your command, then YANK operates in serial mode, so it does not take advantage of a multi-GPU system, and the states are propagated one after another, which will be slow.

If you want YANK to run multiple contexts in parallel with the HREX sampling scheme, you have to configure MPI to run the contexts on the same or different GPUs. We have information about this here:

http://getyank.org/latest/running.html#parallelization

However, the tools we provide for auto-creating the MPI config files for multi-GPU systems (clusterutils) assume you are running on a queue-managed cluster (PBS, SLURM, LSF). If this is a personal machine, you will likely have to write the config file yourself (there are examples of the config file and the hostfile on that page as well).
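As a rough illustration of what a hand-written config file for a single workstation could look like, here is a small Python sketch that generates one line per MPI process and pins each process to a GPU through CUDA_VISIBLE_DEVICES. Several things here are assumptions rather than anything confirmed in this thread: that the launcher is MPICH's Hydra (which accepts -np and -env NAME VALUE), that the command is yank script --yaml=..., and the YAML file name. The function name build_configfile is made up. Whether each line needs a trailing separator depends on your MPI launcher, so compare the result against the examples on the page linked above before using it.

```python
def build_configfile(n_processes, gpu_ids, yank_command):
    """Return config-file text with one MPI process per line.

    Hypothetical helper: processes are assigned to GPUs round-robin, so
    more processes than GPUs simply share cards.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU index is required")
    lines = []
    for rank in range(n_processes):
        gpu = gpu_ids[rank % len(gpu_ids)]
        # -env NAME VALUE sets an environment variable for this process only
        # (MPICH Hydra syntax; other launchers may differ).
        lines.append(f"-np 1 -env CUDA_VISIBLE_DEVICES {gpu} {yank_command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed command and file name, for illustration only.
    text = build_configfile(6, range(6), "yank script --yaml=yank.yaml")
    print(text, end="")
```

With the output saved as configfile, the launch would be something along the lines of mpiexec.hydra -f hostfile -configfile configfile, where the hostfile lists the local machine once per process.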

As for actual load distribution, if you are running in MPI mode, I personally have found there is not too much of a slowdown whether all the contexts wind up on one card or are evenly distributed, so long as you don't exceed the memory limit of the GPU (about 10% overhead for all on one card versus distributed). The slow speed you're seeing is, I suspect, due to running in serial mode, which I discussed above, so I'm not sure distribution is the main problem here (based on what you have said).

Now that I've gone over the complex part, there is an easier solution, but you will have to run the simulations for longer simulated time in most cases. There is a different sampling scheme called SAMS, which uses a single-replica sampler rather than the multi-replica sampler from HREX. This only requires a single context to yield the same estimate, with caveats of course. On one hand, you have to simulate long enough for the weights of the sampled states to converge, then you have to keep sampling to actually generate enough data for a good free energy estimate. On the other hand, the wall clock time it takes to do this is usually equal to or shorter than that of the HREX sampler, because you only have to propagate a single replica, not N of them.

In this sampler, there is no need to have distribution of contexts to multiple GPUs in a single experiment because there is only the one context to distribute, which I think addresses your second question. For the multi-experiment case where you wanted to have concurrent YANK runs, you can start each YANK simulation with different CUDA_VISIBLE_DEVICES set to use a different card for each run, which I think addresses your first question.
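For the multi-experiment case, a minimal Python sketch of starting one YANK process per GPU might look like the following. The YAML file names are hypothetical, the helper names build_jobs and launch are made up, and the command form yank script --yaml=... is assumed to match how you already start a single run. The actual launch is left commented out so the script only prints the plan.

```python
import os
import subprocess


def build_jobs(yaml_files, gpu_ids):
    """Pair each YAML file with one GPU and return (command, env) tuples."""
    yaml_files, gpu_ids = list(yaml_files), list(gpu_ids)
    if len(yaml_files) > len(gpu_ids):
        raise ValueError("more runs than GPUs")
    jobs = []
    for yaml_file, gpu in zip(yaml_files, gpu_ids):
        # Copy the current environment and mask every GPU except one.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        jobs.append((["yank", "script", f"--yaml={yaml_file}"], env))
    return jobs


def launch(jobs):
    """Start all runs concurrently and wait for them; returns exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical input files, one per protein/ligand system.
    yaml_files = [f"system_{i}.yaml" for i in range(6)]
    jobs = build_jobs(yaml_files, gpu_ids=range(6))
    for cmd, env in jobs:
        print("GPU", env["CUDA_VISIBLE_DEVICES"], ":", " ".join(cmd))
    # launch(jobs)  # uncomment to actually start the six runs
```

Each process only sees its own card, so the runs cannot collide on a GPU even though none of them knows about the others.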

You'll need to look at the algorithm and theory a bit to see if this is what you want to use for your simulations, but it's something I at least would recommend. @andrrizzi might have other thoughts on this.

If you choose to use SAMS, here is how you would modify your yaml file to include it:

http://getyank.org/latest/yamlpages/samplers.html#samssampler-options

sawstory commented 4 years ago

Thanks for the detailed answer. I need some time to digest that heavy meal of info :) and will get back to you when I decide what I will do.

Lnaden commented 4 years ago

Sure thing. I know that was a bit of an info dump, but I wanted to give you everything at once instead of in pieces, where I might have missed something along the way.

Good luck and feel free to ask other questions, even if I can't answer them, others can!

choderalab / yank

run yank on specific gpu? #1204