NVIDIA / cuda-python

CUDA Python Low-level Bindings
https://nvidia.github.io/cuda-python/

limit the amount of memory a process can allocate on a single CUDA device #49

Closed DiTo97 closed 3 months ago

DiTo97 commented 1 year ago

Hi all,

As the title suggests, is there a way to limit the total amount of memory that a process can allocate on a single CUDA device?

Perhaps even by using pyNVML?

This issue is related to the following discussions:

- What are the cons of sharing the resources of a single CUDA device among different processes competing for access?

leofang commented 3 months ago

Sorry for the late response. AFAIK there is no generic software tool for limiting the amount of GPU memory allocatable per process. The closest thing is MPS or MIG (link) for partitioning a GPU in certain ways. Using the stream-ordered memory allocator is another possibility, but every software framework and library you use must honor the driver memory pool. I assume none of these is what you are asking for.
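
For reference, here is a minimal sketch of routing allocations through an explicit driver memory pool with the stream-ordered allocator, using the low-level bindings from this repo. Note that the release threshold is a caching hint, not a hard allocation ceiling; the device index, sizes, and the `check` helper are placeholders, and exact enum/struct names may vary between cuda-python versions:

```python
from cuda import cuda


def check(err):
    # cuda-python driver calls return a (CUresult, ...) tuple; raise on error
    if err != cuda.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"CUDA error: {err}")


check(cuda.cuInit(0)[0])
err, device = cuda.cuDeviceGet(0)  # device 0 is an arbitrary choice here
check(err)
err, context = cuda.cuCtxCreate(0, device)
check(err)

# Describe a device-local memory pool on device 0
props = cuda.CUmemPoolProps()
props.allocType = cuda.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED
props.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
props.location.id = 0

err, pool = cuda.cuMemPoolCreate(props)
check(err)

# Keep up to 1 GiB cached in the pool instead of returning freed memory to
# the driver. This is a caching hint, not a per-process allocation limit.
check(cuda.cuMemPoolSetAttribute(
    pool,
    cuda.CUmemPool_attribute.CU_MEMPOOL_ATTR_RELEASE_THRESHOLD,
    cuda.cuuint64_t(1 << 30))[0])

err, stream = cuda.cuStreamCreate(0)
check(err)

# Stream-ordered allocation and free from the explicit pool
err, dptr = cuda.cuMemAllocFromPoolAsync(256 << 20, pool, stream)
check(err)
check(cuda.cuMemFreeAsync(dptr, stream)[0])
check(cuda.cuStreamSynchronize(stream)[0])
```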

The linked material also discusses the consequences of oversubscribing a single device with multiple processes. A performance drop is an obvious possibility; depending on the workload, you might also experience deadlocks.
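
As for pyNVML: it can observe per-process memory usage but cannot enforce a limit. A sketch of the closest thing it offers, a watchdog that polls usage and flags processes over some budget (the 1 GiB budget and 5-second interval below are arbitrary placeholders):

```python
import time

import pynvml

BUDGET_BYTES = 1 << 30  # hypothetical per-process budget: 1 GiB

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    while True:
        for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
            # usedGpuMemory can be None when the driver cannot attribute it
            if proc.usedGpuMemory is not None and proc.usedGpuMemory > BUDGET_BYTES:
                print(f"PID {proc.pid} exceeds budget: {proc.usedGpuMemory} bytes")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()
```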