4paradigm / k8s-vgpu-scheduler

The OpenAIOS vGPU device plugin for Kubernetes originated from the OpenAIOS project. It virtualizes GPU device memory so that applications can access a larger memory space than the GPU's physical capacity, and it is designed to make extended device memory easy to use for AI workloads.
Apache License 2.0
489 stars 93 forks

Is there a way to monitor vGPU with DCGM? #18

Open rjanovski opened 2 years ago

rjanovski commented 2 years ago

DCGM exporter is not picking up the pods that use vGPU, which makes it hard to track the GPU utilization of those pods. Is there any workaround for monitoring GPU utilization with vGPU? Is there a way to get the mapping between the vGPUs and the actual GPU IDs?

archlitchi commented 2 years ago

Hello, and thanks for opening this issue. Unfortunately, we are not compatible with DCGM, but we have our own monitoring system. Please refer to this repo: https://github.com/4paradigm/k8s-vgpu-scheduler
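
For readers looking for the vGPU-to-physical-GPU mapping asked about above, here is a minimal sketch of how such a mapping could be derived if the monitoring system exposes per-container metrics in the Prometheus text exposition format. That is an assumption; the thread does not say what format the monitor uses. The metric name `example_vgpu_memory_used_bytes`, the label names `pod` and `gpu_uuid`, and the sample values are all hypothetical placeholders, so check the repo's documentation for the real names before adapting this.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format. The metric name,
# label names, and values are illustrative assumptions, NOT the project's
# real output.
SAMPLE = """\
# HELP example_vgpu_memory_used_bytes Hypothetical per-container vGPU memory usage
# TYPE example_vgpu_memory_used_bytes gauge
example_vgpu_memory_used_bytes{pod="train-a",gpu_uuid="GPU-1111"} 1048576
example_vgpu_memory_used_bytes{pod="train-b",gpu_uuid="GPU-1111"} 2097152
example_vgpu_memory_used_bytes{pod="infer-c",gpu_uuid="GPU-2222"} 524288
"""

# Matches lines of the form: name{label="value",...} number
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def pods_by_gpu(text, metric="example_vgpu_memory_used_bytes"):
    """Group pods by the physical GPU UUID they report against.

    Returns {gpu_uuid: {pod_name: metric_value}}.
    """
    mapping = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        # Ignore samples that lack the (hypothetical) labels we need.
        if "pod" not in labels or "gpu_uuid" not in labels:
            continue
        mapping[labels["gpu_uuid"]][labels["pod"]] = float(match.group(3))
    return dict(mapping)


if __name__ == "__main__":
    for gpu, pods in pods_by_gpu(SAMPLE).items():
        print(gpu, pods)
```

In practice the text would be fetched from whatever metrics endpoint the monitor exposes rather than from an inline sample; the grouping logic stays the same once the real metric and label names are substituted.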