Open m1nish1208 opened 2 years ago
I have the same problem using minikube. The scheduler remains in the Pending state, so nothing gets scheduled.
Yeah... can someone suggest what I've missed?
I was able to make some progress on this; all previously mentioned problems are fixed. I'm now getting the following error while running a test pod (binpack). Here are the logs from the gpushare-schd-extender pod:
[ debug ] 2022/02/09 17:06:31 controller.go:176: begin to sync gpushare pod binpack-1-5fb868d569-v6hp5 in ns default
[ debug ] 2022/02/09 17:06:31 cache.go:90: Add or update pod info: &Pod{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:binpack-1-5fb868d569-v6hp5,GenerateName:binpack-1-5fb868d569-,Namespace:default,SelfLink:/api/v1/namespaces/default/pods/binpack-1-5fb868d569-v6hp5,UID:dd5d948e-a702-4f2c-ac23-7fc89fe1250e,ResourceVersion:11119,Generation:0,CreationTimestamp:2022-02-09 17:06:31 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{app: binpack-1,pod-template-hash: 5fb868d569,},Annotations:map[string]string{},OwnerReferences:[{apps/v1 ReplicaSet binpack-1-5fb868d569 4bfe63bf-57e2-4d3c-a33c-6d51436fbbfc 0xc42004140a 0xc42004140b}],Finalizers:[],ClusterName:,Initializers:nil,},Spec:PodSpec{Volumes:[{kube-api-access-rkhwn {nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil ProjectedVolumeSource{Sources:[{nil nil nil ServiceAccountTokenProjection{Audience:,ExpirationSeconds:*3607,Path:token,}} {nil nil &ConfigMapProjection{LocalObjectReference:LocalObjectReference{Name:kube-root-ca.crt,},Items:[{ca.crt ca.crt <nil>}],Optional:nil,} nil} {nil &DownwardAPIProjection{Items:[{namespace ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,} nil <nil>}],} nil nil}],DefaultMode:*420,} nil nil nil}}],Containers:[{binpack-1 localhost:32000/cheyang/gpu-player:v2 [] [] [] [] [] {map[aliyun.com/gpu-mem:{{8192 0} {<nil>} 8192 DecimalSI}] map[aliyun.com/gpu-mem:{{8192 0} {<nil>} 8192 DecimalSI}]} [{kube-api-access-rkhwn true /var/run/secrets/kubernetes.io/serviceaccount <nil>}] [] nil nil nil /dev/termination-log File IfNotPresent nil false false 
false}],RestartPolicy:Always,TerminationGracePeriodSeconds:*30,ActiveDeadlineSeconds:nil,DNSPolicy:ClusterFirst,NodeSelector:map[string]string{},ServiceAccountName:default,DeprecatedServiceAccount:default,NodeName:,HostNetwork:false,HostPID:false,HostIPC:false,SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,RunAsGroup:nil,Sysctls:[],},ImagePullSecrets:[],Hostname:,Subdomain:,Affinity:nil,SchedulerName:default-scheduler,InitContainers:[],AutomountServiceAccountToken:nil,Tolerations:[{node.kubernetes.io/not-ready Exists NoExecute 0xc4200418b0} {node.kubernetes.io/unreachable Exists NoExecute 0xc4200418d0}],HostAliases:[],PriorityClassName:,Priority:*0,DNSConfig:nil,ShareProcessNamespace:nil,ReadinessGates:[],},Status:PodStatus{Phase:Pending,Conditions:[],Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[],QOSClass:BestEffort,InitContainerStatuses:[],NominatedNodeName:,},}
[ debug ] 2022/02/09 17:06:31 cache.go:91: Node map[]
[ debug ] 2022/02/09 17:06:31 cache.go:93: pod binpack-1-5fb868d569-v6hp5 in ns default is not assigned to any node, skip
[ info ] 2022/02/09 17:06:31 controller.go:223: end processNextWorkItem()
[ debug ] 2022/02/09 17:06:31 controller.go:295: No need to update pod name binpack-1-5fb868d569-v6hp5 in ns default and old status is Pending, new status is Pending; its old annotation map[] and new annotation map[]
[ info ] 2022/02/09 17:06:32 controller.go:210: begin processNextWorkItem()
[ debug ] 2022/02/09 17:06:46 controller.go:295: No need to update pod name binpack-1-5fb868d569-v6hp5 in ns default and old status is Pending, new status is Pending; its old annotation map[] and new annotation map[]
[ debug ] 2022/02/09 17:07:16 controller.go:295: No need to update pod name binpack-1-5fb868d569-v6hp5 in ns default and old status is Pending, new status is Pending; its old annotation map[] and new annotation map[]
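The extender log above only shows that the pod was never assigned to a node; the actual reason a pod stays Pending is usually recorded in its events. A quick diagnostic sketch, assuming kubectl access to the cluster (pod name and namespace taken from the logs above):

```shell
# Show why the scheduler has not placed the pod — look for
# FailedScheduling events and the per-node reasons in the Events section
kubectl -n default describe pod binpack-1-5fb868d569-v6hp5

# List events attached to the pod directly
kubectl -n default get events \
  --field-selector involvedObject.name=binpack-1-5fb868d569-v6hp5
```

If the events mention an unsatisfiable node selector or missing extended resource (here `aliyun.com/gpu-mem`), that points at the node labels or the device plugin rather than at the extender itself.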
What am I missing now? Can anyone suggest?
How do I fix the Pending state?
Just delete the nodeSelector in gpushare-schd-extender.yaml.
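For reference, this refers to the nodeSelector in the extender's Deployment spec, which pins the pod to a labeled node; if no node carries that label (common on single-node minikube/microk8s), the pod stays Pending. A sketch of the block to remove — the exact label key is an assumption and may differ by version of the manifest:

```yaml
# In gpushare-schd-extender.yaml, under spec.template.spec of the
# Deployment — delete these lines (or label a node to match instead):
nodeSelector:
  node-role.kubernetes.io/master: ""
```

Removing the selector lets the scheduler place the extender on any node; alternatively, keeping it and labeling your node preserves the upstream intent of running the extender on the control plane.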
How did you set this up on microk8s?
Does removing nodeSelector affect usage, and on which node should this component run?
Hi,
I'm trying the GPU share scheduler; however, the pod gpushare-schd-extender is stuck in the Pending state. My environment:
Output of POD describe:
Node Labels:
My docker configuration:
The device plugin pod doesn't seem to be functioning correctly; when I tried to get its logs I got the following error:
What am I missing? Kindly suggest.