Open klvchen opened 2 weeks ago
We run a self-built Kubernetes cluster, version v1.24.6, and we modified the certificates ourselves to be valid for 10 years. The gpushare-device-plugin image is registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-plugin:v2-1.11-aff8a23 and the k8s-gpushare-schd-extender image is registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a.
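Today, while updating a service, I found that pods could no longer be created. I tried the official test example and it fails with the same error: `binding rejected: failed bind with extender at URL http://127.0.0.1:32766/gpushare-scheduler/bind, code 500`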
```
# YAML of the test example used
cat test.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 2
  serviceName: "binpack-1"
  podManagementPolicy: "Parallel"
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-1
  template: # define the pods specifications
    metadata:
      labels:
        app: binpack-1
    spec:
      containers:
      - name: binpack-1
        image: cheyang/gpu-player:v2
        resources:
          limits:
            # GiB
            aliyun.com/gpu-mem: 1

# After the pod fails to start, inspect it
kubectl describe pod binpack-1-0
```
I then checked `kubectl -n kube-system get pod`. The logs of the gpushare-schd-extender-6cf7d6cdd9-nb4ph pod contain many "Unauthorized" messages, and I'm not sure whether that is related:
```
[ warn ] 2024/08/26 09:39:13 gpushare-bind.go:25: Failed to handle pod binpack-1-0 in ns default due to error Unauthorized
[ info ] 2024/08/26 09:39:13 routes.go:137: extenderBindingResult = {"Error":"Unauthorized"}
[ debug ] 2024/08/26 09:39:13 routes.go:162: /gpushare-scheduler/bind response=&{0xc420198780 0xc420395800 0xc42089f400 0x565b70 true false false false 0xc420d72740 {0xc420e0e540 map[Content-Type:[application/json]] false false} map[Content-Type:[application/json]] true 24 -1 500 false false [] 0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0 0 0] [0 0 0] 0xc4203699d0 0}
E0826 09:39:14.488506 1 reflector.go:205] github.com/AliyunContainerService/gpushare-scheduler-extender/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod: Unauthorized
E0826 09:39:14.489290 1 reflector.go:205] github.com/AliyunContainerService/gpushare-scheduler-extender/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Node: Unauthorized
[ debug ] 2024/08/26 09:39:14 routes.go:160: /gpushare-scheduler/filter request body = &{0xc420627940 <nil> <nil> false true {0 0} false false false 0x69bfd0}
```
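"Unauthorized" corresponds to HTTP 401, meaning the API server rejected the extender's credentials outright; an RBAC denial would show up as "Forbidden" (403) instead. Because the bind call and the Pod/Node informer lists all fail this way, one possible cause worth ruling out is the service account token the extender pod mounts (normally at /var/run/secrets/kubernetes.io/serviceaccount/token) having expired, or having been signed with a key the API server no longer accepts. The following is a minimal diagnostic sketch, not part of gpushare: it assumes the token is a standard JWT and only decodes its payload to report the `exp` claim. It does not verify the signature, so a token signed with an old key would still look valid here.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected 3 dot-separated segments")
    payload = parts[1]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def describe_token(token: str, now: Optional[float] = None) -> str:
    """Report whether the token's exp claim has passed."""
    claims = decode_jwt_payload(token)
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if exp is None:
        # Legacy secret-based service account tokens typically carry no exp claim.
        return "no exp claim (token does not expire)"
    remaining = exp - now
    if remaining <= 0:
        return f"expired {int(-remaining)} seconds ago"
    return f"valid for another {int(remaining)} seconds"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_sa_token.py <token-file>
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            print(describe_token(f.read()))
```

For example, the token could be copied out with `kubectl -n kube-system exec gpushare-schd-extender-6cf7d6cdd9-nb4ph -- cat /var/run/secrets/kubernetes.io/serviceaccount/token > token` (assuming the image ships `cat`) and then checked with `python check_sa_token.py token`. If the token turns out to be expired, note that an extender built on an older client-go may read the token only once at startup. In that case a pod restart, which mounts a fresh token, would be the first thing to try.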
How should I go about fixing this? Thanks!