gocrane / crane-scheduler

Crane scheduler is a Kubernetes scheduler that schedules pods based on actual node load.
Apache License 2.0

binding rejected: running Bind plugin "DefaultBinder": Operation cannot be fulfilled on pods/binding #55

Open LiaoSirui opened 10 months ago

LiaoSirui commented 10 months ago

Scheduling failed:

I1121 03:06:27.345666 1 plugins.go:92] [crane] Node[dev-monitoring]'s finalscore is 69, while score is 69 and hotvalue is 0.000000
I1121 03:06:27.345752 1 plugins.go:92] [crane] Node[dev-qchen]'s finalscore is 81, while score is 81 and hotvalue is 0.000000
I1121 03:06:27.345751 1 plugins.go:92] [crane] Node[bqdev02]'s finalscore is 72, while score is 72 and hotvalue is 0.000000
I1121 03:06:27.345775 1 plugins.go:92] [crane] Node[bqdev01]'s finalscore is 85, while score is 85 and hotvalue is 0.000000
I1121 03:06:27.345780 1 plugins.go:92] [crane] Node[bqdev03]'s finalscore is 74, while score is 74 and hotvalue is 0.000000
I1121 03:06:27.345787 1 plugins.go:92] [crane] Node[dev-node4]'s finalscore is 67, while score is 67 and hotvalue is 0.000000
I1121 03:06:27.345797 1 plugins.go:92] [crane] Node[dev-xyli]'s finalscore is 67, while score is 67 and hotvalue is 0.000000
I1121 03:06:27.345790 1 plugins.go:92] [crane] Node[dev-master3]'s finalscore is 79, while score is 79 and hotvalue is 0.000000
I1121 03:06:27.345810 1 plugins.go:92] [crane] Node[dev-node5]'s finalscore is 83, while score is 83 and hotvalue is 0.000000
I1121 03:06:27.345821 1 plugins.go:92] [crane] Node[dev-whliao]'s finalscore is 73, while score is 73 and hotvalue is 0.000000
E1121 03:06:27.358217 1 framework.go:1000] "Failed running Bind plugin" err="Operation cannot be fulfilled on pods/binding \"cpu-stress-59f8597545-7bdrq\": pod cpu-stress-59f8597545-7bdrq is already assigned to node \"dev-node5\"" plugin="DefaultBinder" pod="crane-system/cpu-stress-59f8597545-7bdrq"
E1121 03:06:27.358235 1 scheduler.go:610] "scheduler cache ForgetPod failed" err="pod c2cae006-2ae2-4ca6-b2f6-6af43faaa972 wasn't assumed so cannot be forgotten"
E1121 03:06:27.358250 1 factory.go:225] "Error scheduling pod; retrying" err="binding rejected: running Bind plugin \"DefaultBinder\": Operation cannot be fulfilled on pods/binding \"cpu-stress-59f8597545-7bdrq\": pod cpu-stress-59f8597545-7bdrq is already assigned to node \"dev-node5\"" pod="crane-system/cpu-stress-59f8597545-7bdrq"
I1121 03:06:27.358258 1 factory.go:238] "Pod has been assigned to node. Abort adding it back to queue." pod="crane-system/cpu-stress-59f8597545-7bdrq" node="dev-node5"

LiaoSirui commented 10 months ago

I'm running into scheduling failures. Could anyone suggest how to troubleshoot this?

Environment information:

# helm list -n crane-system
NAME        NAMESPACE       REVISION    UPDATED                                 STATUS      CHART           APP VERSION
scheduler   crane-system    3           2023-11-21 10:12:59.63159177 +0800 CST  deployed    scheduler-0.2.2 0.2.2

# helm get values -n crane-system scheduler
USER-SUPPLIED VALUES:
controller:
  enable: true
  image:
    repository: dockerhub.bigquant.ai:5000/aipaas-devops/3rdparty/docker.io/gocrane/crane-scheduler-controller
    tag: 0.0.24
  name: crane-scheduler-controller
  replicaCount: 3
global:
  prometheusAddr: http://kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local:9090
scheduler:
  enable: true
  image:
    repository: dockerhub.bigquant.ai:5000/aipaas-devops/3rdparty/docker.io/gocrane/crane-scheduler
    tag: 0.0.23
  name: crane-scheduler
  replicaCount: 3

# kubectl version
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.10", GitCommit:"e770bdbb87cccdc2daa790ecd69f40cf4df3cc9d", GitTreeState:"clean", BuildDate:"2023-05-17T14:12:20Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.10", GitCommit:"e770bdbb87cccdc2daa790ecd69f40cf4df3cc9d", GitTreeState:"clean", BuildDate:"2023-05-17T14:06:35Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}

LiaoSirui commented 10 months ago

The scheduler does not support multiple replicas.

Same issue:

https://github.com/gocrane/crane-scheduler/issues/28
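
Given that finding, one possible stopgap is to run a single scheduler replica. The fragment below is only a sketch: it reuses the `scheduler.replicaCount` key from the values shown earlier and assumes the chart maps that key to the Deployment's replica count, which is not confirmed in this thread. All other values stay as they are.

```yaml
# Hypothetical values override: run one crane-scheduler replica so that
# two instances cannot try to bind the same pod.
# Assumption: the chart renders scheduler.replicaCount into the
# Deployment's spec.replicas.
scheduler:
  replicaCount: 1
```

This trades scheduler redundancy for correctness; running several replicas safely would need leader election so that only one instance schedules at a time.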

mtt0 commented 6 months ago

If crane-scheduler was installed by replacing the default kube-scheduler, set the scheduler command-line flag --leader-elect=true in /etc/kubernetes/manifests/kube-scheduler.yaml. For reference, see https://github.com/gocrane/crane-scheduler/blob/c2c05338a5d75c0a6d92bd16a1cf257b48b30ef8/deploy/scheduler/deployment.yaml#L33
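
As a rough illustration of where that flag goes, here is a trimmed static pod manifest. The layout is assumed to follow a typical kubeadm-style `kube-scheduler.yaml`; the binary name in `command` and any other flags depend on the installation and are not taken from this thread, so leave the existing entries in place and only add or change the `--leader-elect` line.

```yaml
# Sketch of /etc/kubernetes/manifests/kube-scheduler.yaml (trimmed).
# Assumption: kubeadm-style static pod; image, volumes and the
# remaining flags are omitted here and should be kept unchanged.
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - name: kube-scheduler
    command:
    - kube-scheduler          # binary name may differ in a replaced install
    - --leader-elect=true     # only the elected leader schedules and binds pods
```

With leader election enabled, the other scheduler instances stay on standby until the leader's lease expires, which should avoid the "already assigned to node" bind conflict seen in the log.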