awslabs / aws-virtual-gpu-device-plugin

AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads
https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/
Apache License 2.0

Are there any plans to support CUDA_MPS_PINNED_DEVICE_MEM_LIMIT? #23

Closed: t-ibayashi-safie closed this issue 1 year ago

t-ibayashi-safie commented 2 years ago

Present Status

I understand the current system configuration as follows:

My Suggestion

ghokun commented 2 years ago

pluginapi.AllocateRequest does not contain information about the current pod/container, so I think it is not trivial to add the CUDA_MPS_PINNED_DEVICE_MEM_LIMIT env variable to this plugin.
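
For reference, the relevant request types in the device plugin API look roughly like this (paraphrased from k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1); the kubelet hands the plugin only device IDs, with nothing identifying the requesting pod or container:

// Paraphrased from the v1beta1 device plugin API.
type AllocateRequest struct {
    ContainerRequests []*ContainerAllocateRequest
}

type ContainerAllocateRequest struct {
    // Only the granted device IDs are provided; there is no field
    // naming the pod or container that made the request.
    DevicesIDs []string
}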

Limitations:

I am working on a fork to support it, though (hopefully).

Currently I have this:

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-device-query
spec:
  hostIPC: true
  containers:
    - name: nvidia-device-query
      image: ghcr.io/kuartis/nvidia-device-query:1.0.0
      command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
      env:
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: 0=2G
      resources:
        limits:
          k8s.kuartis.com/vgpu: '1'
      volumeMounts:
        - name: nvidia-mps
          mountPath: /tmp/nvidia-mps
  volumes:
    - name: nvidia-mps
      hostPath:
        path: /tmp/nvidia-mps
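
For context on the example above: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT takes a comma-separated list of device=limit pairs, so 0=2G caps device 0 at 2 GiB for this MPS client, and as far as I know it is only honored by CUDA 11.5 and later. hostIPC: true and the /tmp/nvidia-mps hostPath mount are what let the container's CUDA runtime reach the MPS control daemon running on the host.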

What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels (a rough sketch of that coordination follows the snippet below).

      resources:
        limits:
          k8s.kuartis.com/vgpu: '1'
          k8s.kuartis.com/vgpu-mem: '1024' # This will set the correct env variable for the container
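
To make that concrete, here is a minimal sketch of the channel-based coordination, assuming both plugins run in the same process and the kubelet issues the two Allocate calls for a pod back to back; the type names and the one-MiB-per-device convention are illustrative, not the actual fork code:

package main

import (
    "context"
    "fmt"

    pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// memLimits hands the requested memory amount from the vgpu-mem
// plugin's Allocate over to the vgpu plugin's Allocate.
var memLimits = make(chan string, 1)

// vgpuMemPlugin serves k8s.kuartis.com/vgpu-mem; each advertised
// "device" stands for one MiB, so the device count is the limit.
type vgpuMemPlugin struct{}

func (p *vgpuMemPlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
    resp := &pluginapi.AllocateResponse{}
    for _, cr := range req.ContainerRequests {
        memLimits <- fmt.Sprintf("%dM", len(cr.DevicesIDs)) // e.g. "1024M"
        resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{})
    }
    return resp, nil
}

// vgpuPlugin serves k8s.kuartis.com/vgpu and injects the MPS limit
// through the Envs field of its allocate response.
type vgpuPlugin struct{}

func (p *vgpuPlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
    resp := &pluginapi.AllocateResponse{}
    for range req.ContainerRequests {
        envs := map[string]string{}
        select {
        case limit := <-memLimits:
            envs["CUDA_MPS_PINNED_DEVICE_MEM_LIMIT"] = "0=" + limit
        default:
            // No paired vgpu-mem request; leave the limit unset.
        }
        resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{Envs: envs})
    }
    return resp, nil
}

func main() {
    // Toy run: a vgpu-mem request for 1024 one-MiB "devices",
    // then a vgpu request for a single virtual GPU.
    ctx := context.Background()
    memReq := &pluginapi.AllocateRequest{ContainerRequests: []*pluginapi.ContainerAllocateRequest{
        {DevicesIDs: make([]string, 1024)},
    }}
    (&vgpuMemPlugin{}).Allocate(ctx, memReq)

    gpuReq := &pluginapi.AllocateRequest{ContainerRequests: []*pluginapi.ContainerAllocateRequest{
        {DevicesIDs: []string{"vgpu-0"}},
    }}
    resp, _ := (&vgpuPlugin{}).Allocate(ctx, gpuReq)
    fmt.Println(resp.ContainerResponses[0].Envs) // map[CUDA_MPS_PINNED_DEVICE_MEM_LIMIT:0=1024M]
}

The fragile part is the pairing assumption: since the request carries no pod identity, nothing guarantees the two Allocate calls for the same pod arrive adjacently.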

Here is the link: https://github.com/kuartis/kuartis-virtual-gpu-device-plugin

t-ibayashi-safie commented 2 years ago

Thank you for providing an answer to my question.

If the CUDA version in each pod is 11.5 or higher, your repository can limit the memory without relying on TensorFlow, right?

> What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels.

Amazing. I'm looking forward to using this :)

ghokun commented 2 years ago

> Thank you for providing an answer to my question.
>
> If the CUDA version in each pod is 11.5 or higher, your repository can limit the memory without relying on TensorFlow, right?
>
> What I plan is to create a new resource definition inside the same plugin and make both Allocate methods talk to each other via channels.
>
> Amazing. I'm looking forward to using this :)

Yes, it does limit the memory usage of the container. It even OOMs if you set the limit too low.