rena-ganba opened this issue 5 years ago
Hello, I also ran into a similar problem when creating the resource.
I have three nodes, and at present only one of them is configured for GPU (driver, nvidia-docker, etc. installed). `kubectl describe` reports:
`0/3 nodes are available: 1 Insufficient GPU Memory in one device, 2 Insufficient aliyun.com/gpu-mem`
To compare with your problem, first make sure you have followed the Prerequisites.
After the deployment is complete, `kubectl inspect gpushare` should display the GPU configuration. As for the problem itself: when I noticed the "default-scheduler" field, I guessed that the previously deployed gpushare-scheduler-extender and the kube-scheduler.yaml changes made according to the user guide had not yet taken effect, so the default scheduler was being used. When I looked at the kubelet service, I found the error `unknown container "/system.slice/docker.service"`. I am just a beginner and not sure whether it is related (see this reference); kubelet itself works fine. In the end I rebooted the machine, and now everything works fine.
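One way to check the situation behind `2 Insufficient aliyun.com/gpu-mem` is to see which nodes actually advertise that extended resource. The following is a minimal Python sketch, assuming you feed it the JSON printed by `kubectl get nodes -o json` and that the resource is reported as a plain integer string under `status.allocatable`; the function name `gpu_mem_by_node` is hypothetical. Nodes that were never set up with the GPU-share device plugin should report 0, which would fit the two non-GPU nodes in the three-node setup above.

```python
import json
from typing import Dict

# Extended resource name used in the scheduling messages above.
GPU_MEM = "aliyun.com/gpu-mem"


def gpu_mem_by_node(node_list_json: str) -> Dict[str, int]:
    """Hypothetical helper: map node name -> allocatable aliyun.com/gpu-mem.

    Assumes the input is the JSON from `kubectl get nodes -o json`.
    Nodes that do not advertise the resource at all are reported as 0.
    """
    data = json.loads(node_list_json)
    result: Dict[str, int] = {}
    for node in data.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Assumption: the quantity is a plain integer string such as "15".
        result[name] = int(allocatable.get(GPU_MEM, "0"))
    return result
```

For example, pass it `subprocess.run(["kubectl", "get", "nodes", "-o", "json"], capture_output=True, text=True, check=True).stdout`. If every node comes back as 0, the scheduler has nowhere to place a pod that requests `aliyun.com/gpu-mem`, regardless of which scheduler is in use.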
Hi,
I tried the example and got this error:
```
Events:
  Type     Reason            Age                From               Message
  Warning  FailedScheduling  6s (x5 over 3m1s)  default-scheduler  0/2 nodes are available: 2 Insufficient aliyun.com/gpu-mem
```
What does it mean?
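In broad terms, the message reads as "0 of 2 nodes can run this pod; on 2 nodes the reason was that not enough unallocated `aliyun.com/gpu-mem` was left for the pod's request" (a node that does not advertise the resource at all counts as having none). The following is a minimal Python sketch that splits such a message into its parts, assuming the format `A/B nodes are available: <count> <reason>, <count> <reason>`; the function name `parse_scheduling_message` is hypothetical and the format is inferred only from the two messages quoted in this thread.

```python
import re
from typing import Dict, Tuple


def parse_scheduling_message(msg: str) -> Tuple[int, int, Dict[str, int]]:
    """Hypothetical helper: split a FailedScheduling message into
    (available_nodes, total_nodes, {reason: number_of_nodes})."""
    # Everything before the first ':' holds the "A/B nodes are available" part.
    head, _, tail = msg.partition(":")
    m = re.search(r"(\d+)\s*/\s*(\d+) nodes are available", head)
    if m is None:
        raise ValueError("unrecognised message: " + msg)
    available, total = int(m.group(1)), int(m.group(2))

    reasons: Dict[str, int] = {}
    # Each comma-separated entry looks like "<count> <reason>".
    for part in tail.split(","):
        part = part.strip().rstrip(".")
        if not part:
            continue
        count, _, reason = part.partition(" ")
        reasons[reason] = int(count)
    return available, total, reasons
```

Applied to the three-node message earlier in the thread, it separates the one GPU node (present, but no single device had enough free GPU memory) from the two nodes that lack `aliyun.com/gpu-mem` entirely.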