Closed · DiTo97 closed this issue 3 months ago
Sorry for the late response. As far as I know, there is no generic software tool that limits the amount of GPU memory allocatable per process. The closest options are MPS or MIG (link), which partition a GPU in certain ways. Using the stream-ordered memory allocator is another possibility, but every software framework and library you use must honor the driver mempool. I assume none of these is what you are asking for.
The linked material also discusses the consequences of oversubscribing a single device with multiple processes. A performance drop is an obvious possibility; depending on the workload, you might also run into deadlocks.
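To illustrate the framework-level caveat above: some libraries expose their own per-process pool limits, which cap only that library's allocations rather than the process as a whole. A minimal sketch using CuPy's default memory pool (assumes CuPy and a CUDA device are available; without them the helper simply reports failure):

```python
def cap_cupy_pool(limit_bytes: int) -> bool:
    """Try to cap CuPy's default memory pool for this process.

    Returns True if the limit was applied, False if CuPy or a CUDA
    device is unavailable. The cap governs only allocations made
    through CuPy's pool -- other frameworks in the same process
    are unaffected.
    """
    try:
        import cupy  # assumed installed; not part of the stdlib
        cupy.get_default_memory_pool().set_limit(size=limit_bytes)
        return True
    except Exception:
        return False

# Example: cap this process's CuPy allocations at 2 GiB.
capped = cap_cupy_pool(2 * 1024 ** 3)
print("limit applied:", capped)
```

This is a per-library cap, not the generic per-process limit asked about; allocations made outside CuPy's pool bypass it entirely.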
Hi all,
As the title suggests, is there a way to limit the total amount of memory that a process can allocate on a single CUDA device?
Perhaps even by using pyNVML?
This issue is related to the following discussions:
What are the cons of sharing the resources of a single CUDA device among different processes competing for access?
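On the pyNVML idea: as far as I know, NVML only reports per-process memory usage; it cannot enforce a cap. The closest workaround is to monitor usage and react (e.g. warn about, or terminate, a process that exceeds its budget). A hedged sketch, assuming the `pynvml` package is installed (`gpu_process_memory` is an illustrative helper name, not a pyNVML API):

```python
def gpu_process_memory(device_index: int = 0):
    """Return {pid: used_bytes} for compute processes on one device,
    or None when NVML (or a GPU) is unavailable."""
    try:
        import pynvml  # assumed installed; ships as the nvidia-ml-py package
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
            procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
            return {p.pid: p.usedGpuMemory for p in procs}
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        return None

# A watchdog loop could poll this and act when a pid exceeds its budget;
# that is enforcement after the fact, not a hard allocation limit.
usage = gpu_process_memory()
print(usage)
```

Polling like this is racy by nature: a process can burst past its budget between samples, so it is a mitigation rather than a guarantee.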