AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0

GPU pods stay in Pending state despite enough GPU resources #188

Closed mf-giwoong-lee closed 1 year ago

mf-giwoong-lee commented 1 year ago

I have a GPU machine with 4 GPUs.

All GPUs have the same memory capacity (23 GiB).

I run 5 GPU pods, each of which uses 1 GPU and 10 GiB of GPU memory.

However, Kubernetes launches only 4 of the pods, on GPU 0 and GPU 1, and the remaining pod stays in the Pending state.

With gpushare-scheduler, all 5 GPU pods should normally fit on this machine, yet only 4 are launched.
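To make the expected arithmetic concrete, here is a rough sanity check in Python. It is only a sketch: the first-fit placement below is an assumption for illustration and is not taken from the extender's actual scheduling code, and the function name `first_fit` is hypothetical. It shows that 4 GPUs with 23 GiB each can hold 8 pods of 10 GiB, and that a simple per-GPU packing would put the fifth pod on GPU 2.

```python
def first_fit(gpu_mems_gib, pod_requests_gib):
    """Place each pod on the first GPU with enough free memory.

    Hypothetical helper: an assumed first-fit policy, NOT the
    gpushare-scheduler-extender's real algorithm.
    Returns one GPU index per pod, or None if the pod does not fit.
    """
    free = list(gpu_mems_gib)  # remaining memory per GPU, in GiB
    placement = []
    for request in pod_requests_gib:
        for idx, available in enumerate(free):
            if available >= request:
                free[idx] -= request
                placement.append(idx)
                break
        else:
            # No single GPU has enough free memory for this pod.
            placement.append(None)
    return placement


gpus = [23, 23, 23, 23]   # 4 GPUs, 23 GiB each (from the report above)
pods = [10] * 5           # 5 pods, 10 GiB each

# A pod must fit on one GPU, so capacity is counted per GPU.
max_pods = sum(mem // 10 for mem in gpus)
placement = first_fit(gpus, pods)

print(max_pods)    # 8
print(placement)   # [0, 0, 1, 1, 2]
```

Under this assumption the first four pods land on GPU 0 and GPU 1, which matches what I see, and the fifth should go to GPU 2 instead of staying Pending.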

Why does this happen?