AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0
1.39k stars 308 forks

How to modify scheduler configuration when I start k8s with binary system #142

Open huangleilz opened 3 years ago

huangleilz commented 3 years ago

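I deployed Kubernetes from binaries, so there is no kube-scheduler.yaml configuration file. How should I modify the kube-scheduler configuration in that case? I added --policy-config-file=/etc/kubernetes/scheduler-policy-config.json to the kube-scheduler.service file. After deploying the extender and the plugin, the application pod shows device error: no-gpu-has-2MiB-to-run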

wenxinax commented 3 years ago

Modifying the startup parameters in kube-scheduler.service is the correct approach. I just solved this unknown device problem myself. The root cause is that the scheduler is not properly connected to the extender. If you have multiple masters, check whether the kube-scheduler you modified has actually been elected leader, or apply the change to the kube-scheduler on every master. You can also check whether you have hit the same error I did: the local address 127.0.0.1:32766 in scheduler-policy-config.json could not reach the extender. I fixed it by replacing that address with the intranet ip:32766.
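To check the connectivity point above, here is a minimal Python sketch that reads the extender addresses out of a scheduler policy file and tries a plain TCP connection to each one. The helper names (`extender_endpoints`, `is_reachable`) are made up for illustration, and the sample policy is an assumption about what scheduler-policy-config.json looks like (in particular the `/gpushare-scheduler` path); only the 127.0.0.1:32766 address comes from this thread. A successful TCP connect only shows that something is listening there, not that the extender itself is healthy.

```python
import json
import socket
from urllib.parse import urlparse


def extender_endpoints(policy_text):
    """Return (host, port) pairs for every extender urlPrefix in a scheduler policy."""
    policy = json.loads(policy_text)
    endpoints = []
    for extender in policy.get("extenders", []):
        url = urlparse(extender["urlPrefix"])
        # Fall back to the scheme's default port when none is given.
        port = url.port or (443 if url.scheme == "https" else 80)
        endpoints.append((url.hostname, port))
    return endpoints


def is_reachable(host, port, timeout=2.0):
    """Try a plain TCP connect; True means something accepted the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Illustrative policy content (assumed layout, not copied from the issue).
SAMPLE_POLICY = """
{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {"urlPrefix": "http://127.0.0.1:32766/gpushare-scheduler",
     "filterVerb": "filter",
     "bindVerb": "bind"}
  ]
}
"""

if __name__ == "__main__":
    # On a real master, read /etc/kubernetes/scheduler-policy-config.json instead.
    for host, port in extender_endpoints(SAMPLE_POLICY):
        state = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

Run it on the same machine as the kube-scheduler process. If 127.0.0.1:32766 is reported as not reachable while the node's intranet IP on port 32766 is, that matches the failure described above, and changing the address in the policy file (then restarting kube-scheduler) is the fix to try.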