At the moment it looks like the API server is failing when it calls the webhook, so the Pod gets scheduled by default-scheduler instead of hami-scheduler, which makes scheduling fail. The webhook is served by the hami-scheduler Pod itself, and that Pod appears to be Running, so it should be fine. The current line of investigation is therefore a network-level problem: we need to figure out why the API server cannot reach the webhook.
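For example, whether the mutating webhook actually ran can be checked from the scheduler the Pod ended up with and from the webhook registration; the object names, and the assumption that the webhook rewrites spec.schedulerName, are based on a default HAMi install:

```sh
# If the webhook ran, the Pod's schedulerName should have been rewritten to hami-scheduler;
# an empty or "default-scheduler" value means the mutation never happened.
kubectl get pod gpu-pod01 -o jsonpath='{.spec.schedulerName}{"\n"}'

# Inspect the webhook registration: the target service, port, and failurePolicy show
# whether a failed webhook call is ignored (the Pod falls through to default-scheduler) or rejected.
kubectl get mutatingwebhookconfigurations -o yaml | grep -B2 -A10 hami
```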
I hope you can give me some suggestions. Thank you
What happened: the GPU pod fails to schedule (FailedScheduling).
What you expected to happen: the GPU pod reaches the Running state.
How to reproduce it (as minimally and precisely as possible): install HAMi following the install steps, then create gpu-pod01.yaml (a sketch of such a manifest follows the commands below) and run:
kubectl apply -f gpu-pod01.yaml
kubectl describe pod gpu-pod01
If I don't request GPU memory, create gpu-pod02.yaml instead:
kubectl describe pod gpu-pod02
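The manifests themselves were not captured in this report. As a rough sketch, a gpu-pod01.yaml requesting GPU memory through HAMi could look like the following; the image and the nvidia.com/gpumem resource name are assumptions based on HAMi's documented defaults, and gpu-pod02.yaml would simply omit the gpumem limit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod01
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04  # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1        # number of vGPUs requested
          nvidia.com/gpumem: 3000  # GPU memory in MiB; dropped in gpu-pod02.yaml
```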
Anything else we need to know?:
- The output of nvidia-smi -a on your host
- Your docker or containerd configuration file (e.g. /etc/docker/daemon.json): Master /etc/docker/daemon.json, node1 /etc/docker/daemon.json
- The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g. sudo journalctl -r -u kubelet)
- Any relevant kernel output lines from dmesg
Environment:
- HAMi version: 2.4.1
- nvidia driver or other AI device driver version:
- Docker version from docker version
- Docker command, image and tag used
- Kernel version from uname -a
- Others: kubectl logs kube-apiserver-k8s-master -n kube-system; curl <scheduler-node-ip>:31993/metrics
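In addition to the metrics check on NodePort 31993, a reachability test against the webhook endpoint itself, run from the master node where kube-apiserver lives, would help confirm or rule out the network theory. The service name, namespace, port, and path below are placeholders; the real values can be read from the MutatingWebhookConfiguration:

```sh
# Resolve the ClusterIP of the scheduler service that backs the webhook (name/namespace assumed).
kubectl -n kube-system get svc hami-scheduler -o wide

# From the master node, probe the webhook endpoint directly.
# -k skips certificate verification; the goal is only to see whether the TLS connection is established.
curl -vk https://<hami-scheduler-cluster-ip>:<webhook-port><webhook-path>
```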