Limiting GPU Resource Usage per Docker Container with MPS Daemon

I’ve been using the MPS (Multi-Process Service) daemon to limit per-process resource usage through the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE and CUDA_MPS_PINNED_DEVICE_MEM_LIMIT environment variables, and it has been working well. However, I’ve run into a scenario I’m not sure how to handle: is there a way to apply these limits collectively to an entire Docker container?

For example, if we set CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=1000MB in the container’s environment variables, every process launched in the container gets its own 1000MB limit on device 0, so two processes can use up to 2000MB combined. Is there a mechanism or strategy to enforce the limit across the whole container, so that in my case the two applications together cannot exceed 1000MB?
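To make the arithmetic concrete, here is a rough sketch of the only workaround I can think of so far: dividing the container's budget statically among the processes and giving each one its own smaller limit at launch. This is an assumption on my side, not an MPS feature. The helper names (split_mps_mem_limit, launch_with_budget) are made up for illustration, and the approach only works when the number of processes is known in advance and an even split is acceptable.

```python
import os
import subprocess


def split_mps_mem_limit(total_mb: int, n_procs: int, device: int = 0) -> str:
    """Build a CUDA_MPS_PINNED_DEVICE_MEM_LIMIT value that gives each of
    n_procs processes an equal share of total_mb on the given device.

    Hypothetical helper: MPS itself only enforces the per-process value.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    per_proc_mb = total_mb // n_procs
    if per_proc_mb < 1:
        raise ValueError("budget too small to split across the processes")
    return f"{device}={per_proc_mb}MB"


def launch_with_budget(commands, total_mb: int, device: int = 0):
    """Start every command with the same per-process share of the budget.

    Hypothetical launcher: each child inherits the current environment
    plus the reduced per-process limit.
    """
    limit = split_mps_mem_limit(total_mb, len(commands), device)
    env = dict(os.environ, CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=limit)
    return [subprocess.Popen(cmd, env=env) for cmd in commands]
```

With a 1000MB budget and two applications, each process would be started with CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=500MB. The drawback is that an idle process cannot lend its unused share to the other one, which is why I am asking about a true container-wide limit.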

Has anyone tackled this before, or found a way to make the limit apply to the Docker container as a whole rather than to each process inside it?

NVIDIA / k8s-device-plugin, issue #594