The OpenAIOS vGPU device plugin for Kubernetes originated from the OpenAIOS project. It virtualizes GPU device memory so that applications can access a larger memory space than the physical capacity of the device, and is designed to make extended device memory easy to use for AI workloads.
Apache License 2.0
Core dump when requesting 2 or more GPUs with Tesla T4 #24
1. Issue or feature description
Requesting 1 GPU in the pod YAML works fine, but when requesting more than one, `nvidia-smi` inside the container produces the output below. The output of `nvidia-smi` on the host machine is normal.
On another machine with a GeForce RTX 2070 SUPER, requesting 2 GPUs works fine, but when I run the application locally it aborts with:
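For reference, a minimal pod spec of the kind that triggers the issue might look like the following. The resource name (`nvidia.com/gpu`), pod name, and container image are assumptions; adjust them to match your actual deployment of the vGPU plugin:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test          # hypothetical pod name
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:11.6.2-base-ubuntu20.04   # assumed image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 2   # requesting 2 GPUs triggers the crash; 1 works
```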
2. Steps to reproduce the issue
Ubuntu 20.04 + microk8s + Tesla T4 GPU + NVIDIA driver 510
3. Information to attach (optional if deemed irrelevant)
Common error checking:
- The output of `nvidia-smi -a` on your host
- Your docker configuration file (`/etc/docker/daemon.json`):

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

Additional information that might help better understand your environment and reproduce the bug:
- `dmesg`