AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0
1.4k stars 308 forks

Failed to create GPU deployment #59

Open rena-ganba opened 5 years ago

rena-ganba commented 5 years ago

Hi,

I tried to run the example and got this error:

`Events: Type Reason Age From Message`

`Warning FailedScheduling 6s (x5 over 3m1s) default-scheduler 0/2 nodes are available: 2 Insufficient aliyun.com/gpu-mem`

What does it mean?

Vae1997 commented 5 years ago

Hello, I ran into a similar problem when creating the resource. I have three nodes, and so far only one of them is configured for GPU (driver, nvidia-docker, etc.). The describe command shows: `0/3 nodes are available: 1 Insufficient GPU Memory in one device, 2 Insufficient aliyun.com/gpu-mem`

For your problem, first make sure you have followed the Prerequisites. Once the deployment is complete, "kubectl inspect gpushare" should display the GPU configuration.

Second, regarding the error itself: the "default-scheduler" field in your event makes me guess that the gpushare-scheduler-extender you deployed, and the kube-scheduler.yaml modified according to the user guide, have not taken effect yet, so the default scheduler is still being used.

When I looked at the kubelet service, I found this error: unknown container "/system.slice/docker.service". I am just a beginner and not sure whether it is related. After following this reference, kubelet worked fine. Finally, I rebooted the machine, and now everything works.
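
As a rough aid for reading the two FailedScheduling messages quoted in this thread, here is a minimal Python sketch that splits such a message into the node counts and the per-reason tallies. The function name `parse_scheduling_message` is hypothetical, and the message layout is assumed only from the two examples above; it is not a stable Kubernetes format. As far as I understand, each number before a reason is the count of nodes rejected for that reason, and "Insufficient aliyun.com/gpu-mem" means the node does not have enough of that extended resource available, which would also be the case for a node that does not advertise it at all.

```python
import re


def parse_scheduling_message(message: str) -> dict:
    """Split a FailedScheduling message into node counts and per-reason counts.

    Assumes the layout seen in this thread:
    "<available>/<total> nodes are available: <n> <reason>, <n> <reason>"
    """
    head, _, tail = message.partition(":")
    match = re.match(r"\s*(\d+)\s*/\s*(\d+) nodes are available", head)
    if match is None:
        raise ValueError("unrecognized scheduling message: %r" % message)

    reasons = {}
    for part in tail.split(","):
        part = part.strip().rstrip(".")
        if not part:
            continue
        reason_match = re.match(r"(\d+)\s+(.*)", part)
        if reason_match:
            # Number of nodes rejected for this particular reason.
            reasons[reason_match.group(2)] = int(reason_match.group(1))

    return {
        "available": int(match.group(1)),
        "total": int(match.group(2)),
        "reasons": reasons,
    }


if __name__ == "__main__":
    msg = "0/2 nodes are available: 2 Insufficient aliyun.com/gpu-mem"
    print(parse_scheduling_message(msg))
```

Read this way, the first message says that neither of the two nodes offers enough `aliyun.com/gpu-mem`, while in the three-node case two nodes lack the resource and the one GPU node does not have enough free memory on a single device.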