Open ak47947 opened 1 week ago
Have you uninstalled nvidia-k8s-device-plugin before installing HAMi?
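It is already installed. Does it need to be uninstalled?
I solved this by uninstalling with `helm uninstall hami -n kube-system` and then reinstalling HAMi. The GPU information is now visible.
The isolation information is also visible inside the container.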
I found a new problem. It may be caused by adding and removing GPUs on the host across power cycles: after a GPU is added or removed, HAMi stops working.
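One way to test this hypothesis is to record the set of GPU UUIDs before and after a power cycle and compare them. The sketch below assumes the snapshots come from `nvidia-smi --query-gpu=uuid --format=csv,noheader`, and the helper names `parse_gpu_uuids` and `gpu_changes` are hypothetical. The thread does not confirm that a changed GPU set is the actual cause, so a difference here only tells you that a HAMi reinstall may be worth trying.

```python
from __future__ import annotations


def parse_gpu_uuids(nvidia_smi_output: str) -> set[str]:
    """Parse one UUID per line, as printed by
    `nvidia-smi --query-gpu=uuid --format=csv,noheader` (assumed input)."""
    return {line.strip() for line in nvidia_smi_output.splitlines() if line.strip()}


def gpu_changes(before: str, after: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) GPU UUIDs between two snapshots."""
    old, new = parse_gpu_uuids(before), parse_gpu_uuids(after)
    return new - old, old - new


if __name__ == "__main__":
    # Made-up placeholder UUIDs, not taken from the reporter's machine.
    before = "GPU-aaaa\nGPU-bbbb\n"
    after = "GPU-aaaa\nGPU-cccc\n"
    added, removed = gpu_changes(before, after)
    print("added:", sorted(added))
    print("removed:", sorted(removed))
```

If either set is non-empty, the host's GPUs changed between the two snapshots.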
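What happened: I set up the Kubernetes GPU environment using GPU Operator and then installed the HAMi plugin. The services installed normally, but the GPU count still shows 1, and the GPU is not split inside the container.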
What you expected to happen: The GPU count shows 10 shares (the default), and resources are limited inside the container.
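The expectation above can be checked against what the node actually advertises. The following is a minimal sketch, assuming the shares appear under the resource name `nvidia.com/gpu` in the node's `status.allocatable` and that the split count is 10 per physical GPU, as described in this report. The input is the JSON printed by `kubectl get node <name> -o json`, and the function names are hypothetical.

```python
import json

# Assumptions (not confirmed in this thread): the resource name used for
# GPU shares, and a split count of 10 per physical GPU.
RESOURCE_NAME = "nvidia.com/gpu"
DEFAULT_SPLIT_COUNT = 10


def advertised_gpus(node_json: str, resource: str = RESOURCE_NAME) -> int:
    """Return the allocatable count for `resource` from node JSON."""
    node = json.loads(node_json)
    allocatable = node.get("status", {}).get("allocatable", {})
    return int(allocatable.get(resource, "0"))


def is_split(node_json: str, physical_gpus: int,
             split_count: int = DEFAULT_SPLIT_COUNT) -> bool:
    """True when the node advertises physical_gpus * split_count shares."""
    return advertised_gpus(node_json) == physical_gpus * split_count


if __name__ == "__main__":
    # Hand-written sample mimicking the reported symptom: one GPU, count still 1.
    sample = json.dumps({"status": {"allocatable": {"nvidia.com/gpu": "1"}}})
    print(advertised_gpus(sample))            # prints 1
    print(is_split(sample, physical_gpus=1))  # prints False
```

A result of `False` with one physical GPU matches the symptom reported here: the node still advertises the unsplit count.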
How to reproduce it (as minimally and precisely as possible): Set up the Kubernetes GPU environment using GPU Operator
Anything else we need to know?:
After installation, the services run normally
The GPU is not split
- The output of `nvidia-smi -a` on your host
- Your docker or containerd configuration file (e.g: `/etc/docker/daemon.json`): the configuration has no problems
- The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g: `sudo journalctl -r -u kubelet`)
- Any relevant kernel output lines from `dmesg`
Environment:
- `docker version`: Docker version 20.10.24, build 297e128
- `uname -a`: ubuntu 5.15.0-124-generic