The OpenAIOS vGPU device plugin for Kubernetes originated in the OpenAIOS project. It virtualizes GPU device memory so that applications can access a larger memory space than the device's physical capacity, and it is designed to make extended device memory easy to use for AI workloads.
1. Issue or feature description
When I run the example, it fails with Segmentation fault (core dumped). The card is an NVIDIA Corporation GP104GL [Tesla P4] (rev a1).
2. Steps to reproduce the issue
1. Modify the file https://raw.githubusercontent.com/4paradigm/k8s-device-plugin/master/nvidia-device-plugin.yml, setting "--device-split-count=3", "--device-memory-scaling=1", "--device-cores-scaling=1"
2. kubectl apply -f nvidia-device-plugin.yml
3. Deploy the pod:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
[root@node1 4p]# kubectl exec -it gpu-pod /bin/sh
nvidia-smi
Segmentation fault (core dumped)
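For reference, the three flags from step 1 would typically be passed through the `args` list of the plugin container in nvidia-device-plugin.yml. The fragment below is only a sketch: the flag values are the ones from this report, while the assumption that the manifest passes them through an `args` list (rather than `command` or environment variables) is not confirmed here.

```yaml
# Fragment of the device plugin container spec (sketch, not the full manifest).
# Only the three flag values come from this report; the surrounding layout
# is an assumption about how the manifest passes them to the plugin.
args:
  - "--device-split-count=3"      # each physical GPU is split into 3 vGPUs
  - "--device-memory-scaling=1"   # no device memory oversubscription
  - "--device-cores-scaling=1"    # no core oversubscription
```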
Does the GPU need some additional configuration? Or is this caused by the operating system itself, which is CentOS 7.6?
3. What I tried
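I don't know whether there is a deeper cause, for example the .so file having a problem handling the case where one pod is allocated 2 vGPUs. But making this change at the device plugin level resolves the problem.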
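To make the "one pod allocated 2 vGPUs" case concrete, a pod of that shape might look like the sketch below. This is a hypothetical manifest, not the one used in this report: the pod name, container name, image, and command are placeholders, and the resource name `nvidia.com/gpu` is assumed to be the one the plugin advertises on the node.

```yaml
# Hypothetical pod in which a single container requests two vGPUs.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-2vgpu                 # placeholder name
spec:
  containers:
    - name: cuda-test                 # placeholder name
      image: your-cuda-image:tag      # placeholder; substitute a real CUDA image
      command: ["sleep", "3600"]      # keep the container alive for kubectl exec
      resources:
        limits:
          nvidia.com/gpu: 2           # assumed resource name; requests 2 vGPUs
```

Comparing `nvidia-smi` inside this pod with the same pod requesting `nvidia.com/gpu: 1` could help narrow down whether the crash depends on the number of vGPUs allocated.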