choderalab / openmmtools

A batteries-included toolkit for the GPU-accelerated OpenMM molecular simulation engine.
http://openmmtools.readthedocs.io
MIT License

Multiple GPUs #726

Open gitkol opened 5 months ago

gitkol commented 5 months ago

Hi,

Can rest.py run on multiple GPUs?

Thanks,

Istvan

ijpulidos commented 5 months ago

Hello! What do you mean by rest.py? Can you be more specific as to what your issue is?

gitkol commented 5 months ago

I am sorry, I mean the openmmtools script for replica exchange solute tempering (REST). It works very well, but since every replica runs on the same GPU, it is slow. I was wondering whether workstations with more than one GPU could be used to distribute the replicas. It is easy to run plain OpenMM jobs that way by specifying multiple device indices for the simulation platform, but I couldn't find a similar option within openmmtools. I hope this makes my question clearer. Thank you very much, Istvan
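
For reference, this is what I mean in plain OpenMM (a minimal sketch; topology, system, and integrator are assumed to be already built):

```python
# Plain OpenMM: split one simulation's work across two GPUs
# by listing multiple device indices for the CUDA platform.
from openmm import Platform
from openmm.app import Simulation

platform = Platform.getPlatformByName("CUDA")
properties = {"DeviceIndex": "0,1"}  # use GPUs 0 and 1
simulation = Simulation(topology, system, integrator, platform, properties)
```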

gitkol commented 5 months ago

I posted this question first on the OpenMM cookbook GitHub, but Peter Eastman said it belonged here.

xiaowei-xie2 commented 3 months ago

I have the same question. I am able to run multiple solute tempering REMD simulations in parallel with mpirun, following this issue (https://github.com/choderalab/openmmtools/issues/648), but I don't know how to distribute the replicas of a single REMD simulation across multiple GPUs.

gitkol commented 3 months ago

Yes, that was exactly my point. When I run such a simulation with openmmtools, the only option seems to be that all of the, say, 8 replicas run on the same GPU, which makes it prohibitively slow. I have not received any reply from the developers. (Plain OpenMM jobs can readily run on multiple GPUs.)

xiaowei-xie2 commented 3 months ago

Hi @gitkol, I think I figured it out. Do you have mpi4py installed correctly? What I found was that without mpi4py, mpirun launches multiple independent copies of the same REMD simulation (each GPU runs a whole REMD, just simultaneously), whereas with mpi4py installed, the GPUs all contribute to a single REMD simulation. Here is an example of job files that worked for me, in case it's helpful: test_rest_14.tar.gz. On my system, using 4 GPUs resulted in a 2x speedup compared to 1 GPU (not 4x).
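
The essence of the job files is roughly the following (a minimal sketch assuming Open MPI and 4 GPUs on one node; run_rest.py stands in for your own REST script):

```python
# Launch with:  mpirun -np 4 python run_rest.py
# One MPI rank per replica group; pin each rank to its own GPU
# before any OpenMM context is created.
import os
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
n_gpus = 4  # GPUs available on the node (assumption)
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank % n_gpus)

# ... build the REST sampler as usual; with mpi4py importable,
# openmmtools should detect the MPI environment and distribute
# the replicas across the ranks.
# sampler = ReplicaExchangeSampler(...)
# sampler.run()
```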

ijpulidos commented 3 months ago

@xiaowei-xie2 is correct; having mpi4py installed is important in this case. Thank you for providing a test script that we can use to reproduce your results.

There is always some part of the code that cannot be fully parallelized, for example the communication between the different GPUs. It would be interesting to do some profiling to see where the overhead is. Thanks!
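
Something as simple as a per-rank cProfile dump could be a first pass (a sketch; sampler.run stands in for the actual REMD loop):

```python
# Profile each MPI rank separately and dump one stats file per rank.
import cProfile
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
profiler = cProfile.Profile()
profiler.enable()
# sampler.run(n_iterations=...)   # the actual REMD iterations
profiler.disable()
profiler.dump_stats(f"rest_rank{rank}.prof")  # inspect with pstats or snakeviz
```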

xiaowei-xie2 commented 3 months ago

Hi @ijpulidos, thank you for the insight. Yes, I totally understand that using n GPUs won't necessarily give an n-fold speedup (sometimes no speedup at all), so I am actually satisfied with the current performance. But yes, it would be nice to see where the overhead is!

I am also curious: does the current repo support parallelizing across multiple GPUs on multiple nodes?

ijpulidos commented 3 months ago

@xiaowei-xie2 It does support that, since everything is handled by the MPI environment. That also means it is highly dependent on the MPI setup of the system. Depending on the interconnect of your HPC system and the system being simulated, it may or may not make sense to do this.

We should try to come up with an example of how to accomplish this that people can use, and add it to the docs.
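
In the meantime, a quick way to sanity-check a multi-node launch is a throwaway script like this (hypothetical check_mpi.py; the exact mpirun flags depend on your scheduler and MPI implementation):

```python
# Launch with e.g.:  mpirun -np 8 --hostfile hosts python check_mpi.py
# Prints which host and GPU each rank landed on.
import os
import socket
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()}/{comm.Get_size()} on {socket.gethostname()}, "
      f"CUDA_VISIBLE_DEVICES={os.environ.get('CUDA_VISIBLE_DEVICES')}")
```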