The OpenAIOS vGPU device plugin for Kubernetes originated from the OpenAIOS project. It virtualizes GPU device memory so that applications can access a larger memory space than the GPU's physical capacity, and it is designed to make extended device memory easy to use for AI workloads.
Hi, I'm new to Kubernetes and have a question. I'd appreciate any guidance.
1. Issue or feature description
After enabling vGPU, running nvidia-smi inside the pod fails with Segmentation fault (core dumped).
2. Steps to reproduce the issue
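The master node is an 8-core, 16 GB Tencent Cloud VM. The worker node is a 20-core, 80 GB Tencent Cloud VM with one NVIDIA T4 GPU. The operating system is Ubuntu Server 18.04. On the worker node I installed docker and nvidia-docker2 as follows and enabled vGPU. The image used is latest, and all parameters are the defaults (I tried changing the parameters, but the result was the same).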
I ran the following steps. Entering the pod and running nvidia-smi gives the following result:
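Running nvidia-smi directly on the host with the GPU works without problems. The Docker version is 20.10. The Kubernetes version is 1.19.0, installed with kubeadm, and the kubelet version is also 1.19.0. The docker info output is below. daemon.json is also already configured with nvidia as both a runtime and the default-runtime.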
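The daemon.json setting mentioned above can be double-checked with a short script. This is only a minimal sketch: it assumes the standard Docker keys default-runtime and runtimes, the function name nvidia_is_default_runtime is hypothetical, and the EXAMPLE config is illustrative rather than copied from the affected node.

```python
import json


def nvidia_is_default_runtime(daemon_json_text: str) -> bool:
    """Return True if the config registers an 'nvidia' runtime and sets it as the default."""
    config = json.loads(daemon_json_text)
    runtimes = config.get("runtimes", {})
    return config.get("default-runtime") == "nvidia" and "nvidia" in runtimes


# Illustrative config only (assumption): the runtime path shown is the usual
# install location for nvidia-container-runtime, not taken from the affected node.
EXAMPLE = """
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
"""

if __name__ == "__main__":
    # Replace EXAMPLE with the contents of the node's own daemon.json to check a real setup.
    print(nvidia_is_default_runtime(EXAMPLE))
```

To check a real node, read the file contents (commonly /etc/docker/daemon.json, which is an assumption about the install) and pass them to the function in place of EXAMPLE.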