AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0

Scheduling a Pod that requests the aliyun.com/gpu-mem resource fails: http://127.0.0.1:32766/gpushare-scheduler/filter context deadline exceeded #128

Open · zhangshuiyong opened this issue 3 years ago

zhangshuiyong commented 3 years ago

kubectl describe pod $pod

Events:
  Type     Reason            Age  From  Message
  ----     ------            ---  ----  -------
  Warning  FailedScheduling  64s        Post "http://127.0.0.1:32766/gpushare-scheduler/filter": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  FailedScheduling  58s        Post "http://127.0.0.1:32766/gpushare-scheduler/filter": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

heumsi commented 2 years ago

Same here. Are there any solutions?

heumsi commented 2 years ago

I found my problem: CoreDNS on the node where the scheduler was deployed was not running properly. After I got CoreDNS running correctly, the problem was resolved.

wanghaowish commented 1 year ago

Is there a solution yet? After finishing the deployment, scheduling also fails for me with: Post "http://127.0.0.1:32766/gpushare-scheduler/filter": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

wanghaowish commented 1 year ago

I found my own solution. The problem was extenders.urlPrefix in the project's scheduler-policy-config.yaml. Changing urlPrefix: "http://127.0.0.1:32766/gpushare-scheduler" to urlPrefix: "http://<gpushare-schd-extender-svc-clusterip>:32766/gpushare-scheduler" fixed it.
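A minimal sketch of what the extender entry might look like after this change, assuming scheduler-policy-config.yaml uses the KubeSchedulerConfiguration `extenders` format. Only the two urlPrefix values come from the comment above; the apiVersion and every other field are assumptions for illustration, not copied from the repository, so compare against your own file. `<gpushare-schd-extender-svc-clusterip>` is a placeholder for the ClusterIP of the extender's Service.

```yaml
# Sketch only: apiVersion and all fields other than urlPrefix are assumptions.
# Adjust apiVersion to whatever your Kubernetes version supports.
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
extenders:
  # Was: "http://127.0.0.1:32766/gpushare-scheduler"
  - urlPrefix: "http://<gpushare-schd-extender-svc-clusterip>:32766/gpushare-scheduler"
    # With filterVerb "filter", the scheduler POSTs to <urlPrefix>/filter
    filterVerb: filter
    bindVerb: bind
    enableHTTPS: false
    nodeCacheCapable: true
    managedResources:
      - name: aliyun.com/gpu-mem
        ignoredByScheduler: false
    ignorable: false
```

The scheduler appends the filter verb to urlPrefix, which produces the `.../gpushare-scheduler/filter` URL seen in the FailedScheduling events. kube-scheduler reads this file at startup, so it typically needs a restart before the new urlPrefix takes effect.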