The OpenAIOS vGPU device plugin for Kubernetes originated from the OpenAIOS project. It virtualizes GPU device memory so that applications can access more memory than the GPU physically provides, and it is designed to make extended device memory easy to use for AI workloads.
1. Issue or feature description
While using vGPU, the error "Handle_remap not found handle" occasionally occurs.
2. Steps to reproduce the issue
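The problem appears intermittently. When it does, running nvidia-smi inside the affected pod's container fails with an error, and recreating the pod restores normal operation.
Running nvidia-smi directly on the host works normally, and so does running it in other pods on the same host.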
3. Information to attach (optional if deemed irrelevant)
Error log:
error-in-container.log
Host output of nvidia-smi -a: nvidia-smi-host.txt