awslabs / aws-virtual-gpu-device-plugin

AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads
https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/
Apache License 2.0

Are there any plans to support CUDA_MPS_PINNED_DEVICE_MEM_LIMIT? #23

Closed: t-ibayashi-safie closed this issue 1 year ago

t-ibayashi-safie commented 2 years ago

Present Status

I understand the current system configuration as follows:

My Suggestion

ghokun commented 2 years ago

pluginapi.AllocateRequest does not contain information about the current pod/container, so I think it is not trivial to add the CUDA_MPS_PINNED_DEVICE_MEM_LIMIT env variable to this plugin.
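
For reference, the relevant request types in the device plugin API look roughly like this (paraphrased from k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1); the kubelet hands the plugin only device IDs, with nothing identifying the requesting pod or container:

// Paraphrased from the v1beta1 device plugin API.
type AllocateRequest struct {
    ContainerRequests []*ContainerAllocateRequest
}

type ContainerAllocateRequest struct {
    // Only the granted device IDs are provided; there is no field
    // naming the pod or container that made the request.
    DevicesIDs []string
}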

Limitations:

I am working on a fork to support it, though (hopefully).

Currently I have this:

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-device-query
spec:
  hostIPC: true
  containers:
    - name: nvidia-device-query
      image: ghcr.io/kuartis/nvidia-device-query:1.0.0
      command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      env:
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: 0=2G
      resources:
        limits:
          k8s.kuartis.com/vgpu: '1'
      volumeMounts:
        - name: nvidia-mps
          mountPath: /tmp/nvidia-mps
  volumes:
    - name: nvidia-mps
      hostPath:
        path: /tmp/nvidia-mps
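
For context on the example above: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT takes a comma-separated list of device=limit pairs, so 0=2G caps device 0 at 2 GiB for this MPS client, and as far as I know it is only honored by CUDA 11.5 and later. hostIPC: true and the /tmp/nvidia-mps hostPath mount are what let the container's CUDA runtime reach the MPS control daemon running on the host.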

What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels (a rough sketch of that coordination follows the snippet below).

      resources:
        limits:
          k8s.kuartis.com/vgpu: '1'
          k8s.kuartis.com/vgpu-mem: '1024' # This will set the correct env variable for the container
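
To make that concrete, here is a minimal sketch of the channel-based coordination, assuming both plugins run in the same process and the kubelet issues the two Allocate calls for a pod back to back; the type names and the one-MiB-per-device convention are illustrative, not the actual fork code:

package main

import (
    "context"
    "fmt"

    pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// memLimits hands the requested memory amount from the vgpu-mem
// plugin's Allocate over to the vgpu plugin's Allocate.
var memLimits = make(chan string, 1)

// vgpuMemPlugin serves k8s.kuartis.com/vgpu-mem; each advertised
// "device" stands for one MiB, so the device count is the limit.
type vgpuMemPlugin struct{}

func (p *vgpuMemPlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
    resp := &pluginapi.AllocateResponse{}
    for _, cr := range req.ContainerRequests {
        memLimits <- fmt.Sprintf("%dM", len(cr.DevicesIDs)) // e.g. "1024M"
        resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{})
    }
    return resp, nil
}

// vgpuPlugin serves k8s.kuartis.com/vgpu and injects the MPS limit
// through the Envs field of its allocate response.
type vgpuPlugin struct{}

func (p *vgpuPlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
    resp := &pluginapi.AllocateResponse{}
    for range req.ContainerRequests {
        envs := map[string]string{}
        select {
        case limit := <-memLimits:
            envs["CUDA_MPS_PINNED_DEVICE_MEM_LIMIT"] = "0=" + limit
        default:
            // No paired vgpu-mem request; leave the limit unset.
        }
        resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{Envs: envs})
    }
    return resp, nil
}

func main() {
    // Toy run: a vgpu-mem request for 1024 one-MiB "devices",
    // then a vgpu request for a single virtual GPU.
    ctx := context.Background()
    memReq := &pluginapi.AllocateRequest{ContainerRequests: []*pluginapi.ContainerAllocateRequest{
        {DevicesIDs: make([]string, 1024)},
    }}
    (&vgpuMemPlugin{}).Allocate(ctx, memReq)

    gpuReq := &pluginapi.AllocateRequest{ContainerRequests: []*pluginapi.ContainerAllocateRequest{
        {DevicesIDs: []string{"vgpu-0"}},
    }}
    resp, _ := (&vgpuPlugin{}).Allocate(ctx, gpuReq)
    fmt.Println(resp.ContainerResponses[0].Envs) // map[CUDA_MPS_PINNED_DEVICE_MEM_LIMIT:0=1024M]
}

The fragile part is the pairing assumption: since the request carries no pod identity, nothing guarantees the two Allocate calls for the same pod arrive adjacently.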

Here is the link: https://github.com/kuartis/kuartis-virtual-gpu-device-plugin

t-ibayashi-safie commented 2 years ago

Thank you for providing an answer to my question.

If the CUDA version in each pod is 11.5 or higher, your repository can limit the memory without relying on TensorFlow, right?

> What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels.

Amazing. I'm looking forward to using this :)

ghokun commented 2 years ago

> Thank you for providing an answer to my question.
>
> If the CUDA version in each pod is 11.5 or higher, your repository can limit the memory without relying on TensorFlow, right?
>
> What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels.
>
> Amazing. I'm looking forward to using this :)

Yes, it does limit the memory usage of the container. It even OOMs if you set the limit too low.