DragonHPC / dragon

Dragon distributed runtime for HPC and AI applications and workflows
http://dragonhpc.org
MIT License

Usage of multiple GPUs #16

Open RINO-GAELICO opened 5 days ago

RINO-GAELICO commented 5 days ago

Hi, I tried to run Python code that uses multiple GPUs. It is a PyTorch inference model that loads onto a GPU to classify an image. I used mp.Pool(NUM_FUNCTIONS), but each process uses the same GPU, even though 4 are available on the node.

Is there a way to specify the number of available resources?

```python
with mp.Pool(NUMBER_OF_FUNCTIONS) as pool:
    results = pool.starmap(infer_image, tasks)
```

colinpwahl commented 5 days ago

You're going to want to either define the device explicitly, e.g. `device = torch.device("cuda", <some visible device int>)`, or set the default device for each pool worker. The most straightforward way to do either is to use multiprocessing pool's `initializer` and `initargs`, so it happens once at start-up of each process. This example shows how you could use a dictionary of queues to make sure that each process on a node gets its own visible device. One note about the initializer: if you define the device as `device = torch.device("cuda", <some visible device int>)`, you'll want to make `device` a global so you can reference it in the main work function. This isn't a concern if you set the default device, since I believe that sets the context for the remaining duration of the process.
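A minimal sketch of that idea on a single node, assuming Dragon's multiprocessing backend is selected via `mp.set_start_method("dragon")` and using one shared queue of device ids rather than a dictionary of queues; `init_worker` and `infer_image` are hypothetical names, not part of the thread's actual code:

```python
import dragon                      # assumed: registers Dragon's multiprocessing start method
import multiprocessing as mp
import torch

device = None                      # set once per worker by the initializer


def init_worker(device_queue):
    """Runs once in each pool worker; claims a unique GPU id from the shared queue."""
    global device
    gpu_id = device_queue.get()
    device = torch.device("cuda", gpu_id)
    torch.cuda.set_device(device)  # alternatively, set the default device here


def infer_image(task):
    """Hypothetical work function; real inference code would run on `device`."""
    return task, str(device)


if __name__ == "__main__":
    mp.set_start_method("dragon")  # assumed: Dragon's start-method name
    num_gpus = 4

    # Pre-fill a queue with one device id per pool worker.
    dev_queue = mp.Queue()
    for gpu_id in range(num_gpus):
        dev_queue.put(gpu_id)

    tasks = list(range(8))
    with mp.Pool(num_gpus, initializer=init_worker, initargs=(dev_queue,)) as pool:
        print(pool.map(infer_image, tasks))
```

Each worker pulls a different id during start-up, so the four workers end up on `cuda:0` through `cuda:3` for the rest of their lives.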

RINO-GAELICO commented 4 days ago

What exactly do you mean by setting the default device for each pool worker? If my understanding is correct, all processes are defaulting to the same GPU, which leads to the behavior I observed: every process uses the same GPU instead of spreading across the available ones.

I'd rather not use a global variable or specify a "GPU_ID". I am working on a single node at the moment, and if I don't specify anything, all the processes use the same GPU. I'd like the processes to utilize all 4 GPUs available instead of defaulting to the same one, without having to specify which GPU each process has to use. Is there a solution that allows for this?

colinpwahl commented 4 days ago

Yes, you are correct, and your observation matches what was stated in this old post on the PyTorch Forum. What you're observing is the default behavior of PyTorch. If you want PyTorch to utilize more than the one GPU that is the default for all of the pool workers, you need to specify which device each process should execute its calls on. How you accomplish this depends on your specific needs, but what I suggested before were a couple of ways to give each process a unique device.

RINO-GAELICO commented 4 days ago

Thank you for the reference. I'm investigating how NERSC supports FaaS-like use cases on HPC platforms. The actual use case for NERSC HPC users might not be PyTorch specifically, but rather loading a whole bunch of Python libraries, performing some computations, and obtaining results. This is what we are really trying to test.

In this regard, I was wondering what the real advantage of using mp.Pool is instead of just executing a series of functions in a loop.

colinpwahl commented 3 days ago

Ahh, okay. Sorry, I was constraining my answers to just your specific use of pool and wasn’t giving more general information.

I think that ProcessGroup might be a better fit for the use case you described. You can use the Policy class to control the placement of processes as well as set CPU and GPU affinity. The best set of examples of using policies with process groups can be found here. Setting the GPU affinity sets CUDA_VISIBLE_DEVICES if you're using NVIDIA GPUs, and I believe most applications, including PyTorch, pick their default device from that, so it should solve the more general problem you're trying to solve.
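A rough sketch of what that could look like with the native API, modeled on the policy examples; the exact module paths and constructor signatures are assumptions to check against the current Dragon release, and `work` is a hypothetical target function:

```python
import os

import dragon                                        # Dragon runtime
from dragon.infrastructure.policy import Policy      # placement / affinity control (assumed path)
from dragon.native.process import ProcessTemplate    # assumed path
from dragon.native.process_group import ProcessGroup # assumed path


def work(task_id):
    # The GPU affinity in the policy should surface as CUDA_VISIBLE_DEVICES,
    # so frameworks like PyTorch pick up a different default device per process.
    print(task_id, os.environ.get("CUDA_VISIBLE_DEVICES"))


if __name__ == "__main__":
    num_gpus = 4
    grp = ProcessGroup(restart=False)

    for gpu_id in range(num_gpus):
        # One process per GPU, pinned via gpu_affinity in its policy.
        policy = Policy(gpu_affinity=[gpu_id])
        grp.add_process(nproc=1,
                        template=ProcessTemplate(target=work,
                                                 args=(gpu_id,),
                                                 policy=policy))

    grp.init()
    grp.start()
    grp.join()
    grp.close()
```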

The Dragon implementation of multiprocessing pool uses ProcessGroup to manage all the processes that are part of the pool. We haven't fully patched through the use of policies yet, but we do plan to for the native Pool implementation. Does the pool API meet all of your requirements for FaaS? If it's a perfect fit, then we can talk about making those changes so pool can use policies. If it isn't, I'd consider writing a FaaS API using ProcessGroup directly, while looking at the implementation of pool for inspiration on how to manage the processes.

On the last point, in my opinion, pool excels at solving embarrassingly parallel tasks that may take vastly different amounts of time. Compared to running them in a for loop, you get a lot more concurrency.
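For a rough sense of the difference, reusing the names from your snippet above (`infer_image`, `tasks`, `NUMBER_OF_FUNCTIONS`):

```python
# Serial: each call waits for the previous one to finish.
results = [infer_image(*task) for task in tasks]

# Pool: up to NUMBER_OF_FUNCTIONS calls run concurrently across worker
# processes, so one slow task doesn't hold up the rest of the batch.
with mp.Pool(NUMBER_OF_FUNCTIONS) as pool:
    results = pool.starmap(infer_image, tasks)
```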