Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html

[Question][Feature] Using multiple GPU in the LocalGPUQueue for single simulation #1009

Closed DanielWicz closed 2 years ago

DanielWicz commented 2 years ago

Hello, is it possible to use more than one GPU for a single simulation in the LocalGPUQueue in the AdaptiveMD/AdaptiveBandit protocol?

stefdoerr commented 2 years ago

Ah sorry I misread the issue. No, a single simulation will (nearly) always run on a single GPU. ACEMD3 does not perform much better with multiple GPUs. It is mostly used if your simulation does not fit into the memory of a single GPU.

DanielWicz commented 2 years ago

> Ah sorry I misread the issue. No, a single simulation will (nearly) always run on a single GPU. ACEMD3 does not perform much better with multiple GPUs. It is mostly used if your simulation does not fit into the memory of a single GPU.

I forgot to specify the MD engine: at the moment I use OpenMM, and on its own it works perfectly fine with more than one GPU. But if I understand correctly, HTMD creates a new process in which the number of visible GPUs (devices) is limited to 1? Following that assumption, is there a possibility to set the number of GPUs per spawned process to something above 1?

stefdoerr commented 2 years ago

Yes, the jobqueues library writes a job.sh file which at the top exports CUDA_VISIBLE_DEVICES to the specific single GPU. So OpenMM will not be able to see any GPUs other than the assigned one. I'm curious, can you show me the speedup you get by using two GPUs instead of one with OpenMM? We have not seen any significant speedup, at least not to the point where it's more worthwhile than running two parallel simulations instead.
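To illustrate the mechanism: a minimal sketch of how exporting CUDA_VISIBLE_DEVICES before launching the engine process hides all other GPUs from it. The device id "0" here is just an illustrative assignment, not a fixed jobqueues value, and the child process stands in for the MD engine.

```python
import os
import subprocess
import sys

# Mimic what the generated job.sh does: export CUDA_VISIBLE_DEVICES
# so the launched process can only enumerate the assigned GPU.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")

# The child process (standing in for OpenMM/ACEMD) reports which
# device ids the CUDA runtime would be allowed to see.
out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())
```

Because the restriction happens through the environment of the spawned process, the engine itself has no way to "opt back in" to the other GPUs without editing the generated job script.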

DanielWicz commented 2 years ago

> Yes, the jobqueues library writes a job.sh file which at the top exports CUDA_VISIBLE_DEVICES to the specific single GPU. So OpenMM will not be able to see any GPUs other than the assigned one. I'm curious, can you show me the speedup you get by using two GPUs instead of one with OpenMM? We have not seen any significant speedup, at least not to the point where it's more worthwhile than running two parallel simulations instead.

Generally it depends heavily on the system: the bigger the system, the bigger the speedup. Of course the combined speed is lower than the sum of N independent single-GPU simulations, but that approach is not always applicable (e.g. when you have only one starting structure or you want just one long MD simulation).

Here is a quick comparison that I made for a small-to-medium sized system:

[image: benchmark comparison of one vs. two GPUs]