AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0
1.4k stars, 308 forks

Fail to create kube-scheduler #57

Open ylhsiehitri opened 5 years ago

ylhsiehitri commented 5 years ago

Hi,

I tried to create the kube-scheduler with kubectl create -f https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/config/kube-scheduler.yaml, but it failed.

$ kubectl get pods -n kube-system
NAME                                     READY   STATUS
kube-scheduler                           0/1     CrashLoopBackOff
kube-scheduler-leadtek-gs4820            1/1     Running
gpushare-device-plugin-ds-hs4kt          1/1     Running
gpushare-schd-extender-978bd945b-sqhzj   1/1     Running
...
$ kubectl logs -n kube-system kube-scheduler
failed to create listener: failed to listen on 127.0.0.1:10251: listen tcp 127.0.0.1:10251: bind: address already in use

Even if I remove the livenessProbe section from the aforementioned kube-scheduler.yaml, kubectl logs still shows the same error.

What's going wrong...?

Thanks!
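
One way to read the log above: the message comes from the operating system refusing a second listener on 127.0.0.1:10251, which would also explain why removing the livenessProbe changes nothing. The sketch below reproduces that kind of failure in plain Python, outside Kubernetes. It assumes the new kube-scheduler pod shares the host network with the already running kube-scheduler-leadtek-gs4820 pod, which the thread does not confirm. The helper name try_listen is made up for illustration.

```python
import errno
import socket


def try_listen(host, port):
    """Try to bind and listen on host:port; return (socket, error)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc


# Stand-in for the scheduler that is already running on the node.
# Port 0 lets the OS pick a free port, so the sketch does not need 10251 itself.
first, first_err = try_listen("127.0.0.1", 0)
port = first.getsockname()[1]

# Stand-in for the second scheduler pod trying to use the same address and port.
second, second_err = try_listen("127.0.0.1", port)

if second_err is not None and second_err.errno == errno.EADDRINUSE:
    print("second listener failed: address already in use")

first.close()
```

If this reading is right, the second listener fails for as long as the first one holds the port, regardless of any probe settings in the pod spec.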

ylhsiehitri commented 5 years ago

It seems I misunderstood. According to section 2 of the installation guide [1], I am supposed to add the settings from 2.1 and 2.2 to the default kube-scheduler, replacing its configuration rather than creating a new scheduler, right?

If so, I tried modifying /etc/kubernetes/manifests/kube-scheduler.yaml. After that, "kubectl get po -n kube-system" shows the pod "kube-scheduler-leadtek-gs4820" restarting ("leadtek-gs4820" is the node name), but when I examine the running config with "kubectl edit po -n kube-system kube-scheduler-leadtek-gs4820", the changes are not there.

[1] https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md

Vae1997 commented 4 years ago

Hello, you do indeed need to modify the default kube-scheduler file. However, when I deployed a GPU-sharing pod after finishing the configuration, it had no effect. I then restarted the machine, and that gave the desired result.

ide8 commented 4 years ago

@Vae1997, why can't we create one more scheduler instead of modifying the default one?

Vae1997 commented 4 years ago

@ide8 Hello, I am just a beginner and don't fully understand the underlying mechanism, so I can only share my thoughts:

If you modify the default scheduler, k8s should redeploy the modified scheduler through its own internal mechanisms, without affecting the rest of the cluster, so that your changes take effect.

Conversely, if you deploy a scheduler yourself, it may conflict with the default scheduler. In that case your changes will not take effect, and in the worst case the cluster's original scheduling will stop working.

Of course, there should be a way to delete the default scheduler and redeploy it as needed, much as you would when building a cluster from binaries. (I haven't tried building a k8s cluster from binaries, though, so I'm not sure whether this would affect an existing cluster.)

ylhsiehitri commented 4 years ago

Hello, you do indeed need to modify the default kube-scheduler file. However, when I deployed a GPU-sharing pod after finishing the configuration, it had no effect. I then restarted the machine, and that gave the desired result.

Similar to @Vae1997 restarting the machine, I happened to reinstall the OS for an unrelated reason, and the problem just disappeared.