echo3215987 closed this issue 2 years ago
I fixed the issue.
I created a test job that does not request a GPU.
The key line in the describe pod output came from default-scheduler: 0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
So I removed that taint from the node:
kubectl taint nodes tpeaa03-precision-7920-tower node-role.kubernetes.io/master-
That worked, and the gpushare-device-plugin pod now shows up:
gpushare-device-plugin-ds-hm52c
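For anyone hitting the same scheduler message, here is a rough sketch of how the taint key can be read out of the event text and turned into the removal command. The helper name taint_key_from_event is made up for illustration and is not part of kubectl; the event text and node name are the ones from this thread. The script only prints the command, so nothing changes on the cluster until you run it yourself.

```shell
#!/bin/sh
# Sketch: derive the taint-removal command from a FailedScheduling message.
# taint_key_from_event is a hypothetical helper, not a kubectl feature.
taint_key_from_event() {
    # Print the key found inside "had taint {KEY: VALUE}".
    printf '%s\n' "$1" | sed -n 's/.*had taint {\([^:}]*\):.*/\1/p'
}

# Event text and node name as reported in this thread.
event="0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate."
node="tpeaa03-precision-7920-tower"
key=$(taint_key_from_event "$event")

# A trailing "-" after the taint key tells kubectl to remove that taint.
echo "kubectl taint nodes $node ${key}-"
```

Note that removing the master taint lets ordinary workloads schedule onto that node, which is usually only what you want on a single-node or test cluster.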
When I run "kubectl create -f device-plugin-ds.yaml", the DaemonSet does not create the device-plugin-ds pod.
I listed all pods in kube-system.
Listing /var/lib/kubelet/device-plugins shows no aliyungpushare.sock file.
Kubernetes version: 1.23.0
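The checks described above could be scripted roughly like this. It is only a sketch: has_plugin_socket is a made-up helper name, the directory and socket file name are the ones mentioned in this report, and the kubectl queries run only if kubectl is installed on the machine.

```shell
#!/bin/sh
# Sketch: check whether the device plugin has placed its socket in the kubelet directory.
# has_plugin_socket is a hypothetical helper; path and file name come from this report.
has_plugin_socket() {
    dir=${1:-/var/lib/kubelet/device-plugins}
    [ -e "$dir/aliyungpushare.sock" ]
}

if has_plugin_socket; then
    echo "aliyungpushare.sock present"
else
    echo "aliyungpushare.sock missing"
    # Only query the cluster when kubectl is available on this machine.
    if command -v kubectl >/dev/null 2>&1; then
        kubectl get daemonset -n kube-system || echo "kubectl query failed"
        kubectl get pods -n kube-system -o wide || echo "kubectl query failed"
    fi
fi
```

If the DaemonSet is listed but its desired pod count is 0, the describe output of a test pod on the same node may show why nothing can be scheduled there.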
Has anyone hit the same issue and fixed it? Could you give me a clue? Thanks.