The OpenAIOS vGPU device plugin for Kubernetes originates from the OpenAIOS project. It virtualizes GPU device memory so that applications can access a larger memory space than the GPU's physical capacity, and it is designed to make extended device memory easy to use for AI workloads.
The DCGM exporter is not picking up pods that use vGPU, which makes it hard to track per-pod utilization.
Is there any workaround to monitor GPU utilization with vGPU?
Is there a way to get the mapping between the vGPU and the actual GPU IDs?
Hello, and thanks for opening this issue. Unfortunately, we are not compatible with DCGM, but we have our own monitoring system. Please refer to this repo: https://github.com/4paradigm/k8s-vgpu-scheduler
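For the vGPU-to-physical-GPU mapping, here is a minimal sketch of one possible approach. It assumes the monitor exposes metrics in the Prometheus text format and that each sample carries a pod label and a physical device UUID label. The metric name `vgpu_memory_used_bytes`, the label names `podname` and `deviceuuid`, and the sample data are all hypothetical placeholders, not the monitor's documented output; check the repo above for the real names before using this.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format. The metric name
# and label names are illustrative assumptions, not the monitor's real output.
SAMPLE_METRICS = """\
# HELP vgpu_memory_used_bytes Hypothetical per-pod vGPU memory usage
# TYPE vgpu_memory_used_bytes gauge
vgpu_memory_used_bytes{podname="train-a",deviceuuid="GPU-1111"} 1048576
vgpu_memory_used_bytes{podname="train-b",deviceuuid="GPU-1111"} 2097152
vgpu_memory_used_bytes{podname="infer-c",deviceuuid="GPU-2222"} 524288
"""

# Matches lines of the form: name{label="value",...} number
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def pods_by_gpu(metrics_text, metric_name="vgpu_memory_used_bytes"):
    """Group samples of one metric by physical GPU UUID.

    Returns {device_uuid: {pod_name: value}}. Lines that are comments,
    belong to another metric, or lack the expected labels are skipped.
    """
    mapping = defaultdict(dict)
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if not match or match.group("name") != metric_name:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        uuid = labels.get("deviceuuid")
        pod = labels.get("podname")
        if uuid is None or pod is None:
            continue
        mapping[uuid][pod] = float(match.group("value"))
    return dict(mapping)


if __name__ == "__main__":
    for uuid, pods in pods_by_gpu(SAMPLE_METRICS).items():
        print(uuid, pods)
```

In practice you would replace `SAMPLE_METRICS` with the text fetched from the monitor's metrics endpoint and pass the real metric name as `metric_name`.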