pen-pal opened this issue 3 years ago
The GPU Share Scheduler Extender needs to change the scheduler configuration, and the pod "gpushare-installer*" is used to make that change; the schedulers are usually hosted on master nodes. The pods are Pending because no master node was found in this cluster. You can confirm this with the following command:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cn-beijing.192.168.8.44 Ready <none> 12d v1.16.9-aliyun.1
cn-beijing.192.168.8.45 Ready <none> 12d v1.16.9-aliyun.1
cn-beijing.192.168.8.46 Ready <none> 12d v1.16.9-aliyun.1
cn-beijing.192.168.9.159 Ready master 12d v1.16.9-aliyun.1
cn-beijing.192.168.9.160 Ready master 12d v1.16.9-aliyun.1
cn-beijing.192.168.9.161 Ready master 12d v1.16.9-aliyun.1
As you can see, this cluster has nodes whose role is "master". If you cannot find any master nodes, the master nodes of your cluster are probably hosted in another cluster; in Alibaba Cloud, a cluster whose master nodes are hosted elsewhere is called a Managed Kubernetes Cluster.
You can get help from EKS support and ask them how to enable a scheduler extender configuration for the scheduler.
@M-A-N-I-S-H-K, did you manage to solve it? I have the same issue with AKS.
Do the pods really need access to master nodes? It looks like a scheduler extender should work on EKS without access to a master node; this project does it: https://github.com/marccampbell/graviton-scheduler-extender
The scheduler can be deployed as a separate scheduler instead of modifying the default scheduler as done in https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/config/kube-scheduler.yaml#L18.
Instead of adding the config file to the master node, specify the scheduler configuration using a config map.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpushare-schd-extender-config
  namespace: kube-system
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1alpha1
    kind: KubeSchedulerConfiguration
    algorithmSource:
      policy:
        configMap:
          namespace: kube-system
          name: gpushare-schd-extender-policy
    leaderElection:
      leaderElect: true
      lockObjectName: gpushare-schd-extender
      lockObjectNamespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpushare-schd-extender-policy
  namespace: kube-system
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "extenders": [{
        "urlPrefix": "http://127.0.0.1:32766/gpushare-scheduler",
        "filterVerb": "filter",
        "bindVerb": "bind",
        "enableHttps": false,
        "nodeCacheCapable": true,
        "managedResources": [
          {
            "name": "aliyun.com/gpu-mem",
            "ignoredByScheduler": false
          }
        ],
        "ignorable": false
      }],
      "hardPodAffinitySymmetricWeight": 10
    }
Mount the config map using volumes and deploy the new scheduler
spec:
  volumes:
    - name: gpushare-schd-extender-config
      configMap:
        name: gpushare-schd-extender-config
  containers:
    - name: connector
      image: gcr.io/google-containers/kube-scheduler:v1.18.0
      args:
        - kube-scheduler
        - --config=/gpushare-schd-extender/config.yaml
      volumeMounts:
        - name: gpushare-schd-extender-config
          mountPath: /gpushare-schd-extender
Finally, specify the new scheduler in the pod manifest by setting spec.schedulerName: gpushare-schd-extender
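To illustrate, a minimal sketch of a pod manifest that uses the new scheduler (the pod name and image are placeholders, and this assumes the scheduler configured above is registered under the name gpushare-schd-extender):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo                          # placeholder name
spec:
  schedulerName: gpushare-schd-extender         # hand this pod to the new scheduler
  containers:
    - name: demo
      image: tensorflow/tensorflow:latest-gpu   # placeholder image
      resources:
        limits:
          aliyun.com/gpu-mem: 2                 # GPU memory request handled by the extender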
Hi @animesh-agarwal, thank you very much for your reply and suggestion, it helps a lot. But I still cannot successfully use your method. Where should we define the ConfigMap? Is it the path /gpushare-schd-extender/config.yaml in your example? And where should we set schedulerName: gpushare-schd-extender?
@2811299
Please find below the complete manifest to add a new scheduler. Please note that I have used the cluster-admin cluster role for simplicity; you may choose to create a more specific role.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpushare-schd-extender-kube-scheduler
subjects:
  - kind: ServiceAccount
    name: gpushare-schd-extender
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpushare-schd-extender-as-volume-scheduler
subjects:
  - kind: ServiceAccount
    name: gpushare-schd-extender
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpushare-schd-extender-config
  namespace: kube-system
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1alpha1
    kind: KubeSchedulerConfiguration
    algorithmSource:
      policy:
        configMap:
          namespace: kube-system
          name: gpushare-schd-extender-policy
    leaderElection:
      leaderElect: true
      lockObjectName: gpushare-schd-extender
      lockObjectNamespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpushare-schd-extender-policy
  namespace: kube-system
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "extenders": [{
        "urlPrefix": "http://127.0.0.1:32766/gpushare-scheduler",
        "filterVerb": "filter",
        "bindVerb": "bind",
        "enableHttps": false,
        "nodeCacheCapable": true,
        "managedResources": [
          {
            "name": "aliyun.com/gpu-mem",
            "ignoredByScheduler": false
          }
        ],
        "ignorable": false
      }],
      "hardPodAffinitySymmetricWeight": 10
    }
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpushare
      component: gpushare-schd-extender
  template:
    metadata:
      labels:
        app: gpushare
        component: gpushare-schd-extender
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: gpushare-schd-extender
      volumes:
        - name: gpushare-schd-extender-config
          configMap:
            name: gpushare-schd-extender-config
      containers:
        - name: gpushare-schd-extender
          image: registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a
          env:
            - name: LOG_LEVEL
              value: debug
            - name: PORT
              value: "12345"
        - name: connector
          image: gcr.io/google-containers/kube-scheduler:v1.18.0
          args:
            - kube-scheduler
            - --config=/gpushare-schd-extender/config.yaml
          volumeMounts:
            - name: gpushare-schd-extender-config
              mountPath: /gpushare-schd-extender
# service.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: gpushare-schd-extender
  namespace: kube-system
  labels:
    app: gpushare
    component: gpushare-schd-extender
spec:
  type: NodePort
  ports:
    - port: 12345
      name: http
      targetPort: 12345
      nodePort: 32766
  selector:
    # select the gpushare-schd-extender pods
    app: gpushare
    component: gpushare-schd-extender
Please note that the scheduler will be created inside the kube-system namespace. You can verify that the scheduler pod is running using kubectl get pods --namespace=kube-system.
Please follow this to understand how to use the newly deployed scheduler in your pods.
@fernandocamargoai Hi, does this method work for you on AKS?
I'm not actively working on that project anymore, but I sent them the link to this issue for them to try it in the future. When they try it and let me know, I'll comment here.
Is anyone able to verify that animesh's method works for AKS?
Confirmed this works in EKS.
Hello @mm-e1,
I tried using the piece of YAML you mentioned on EKS with the default plugin deployed, without any success:
I tried port 32766 without any luck and then switched over to 12345.
Using the custom scheduler within my pods, they would end up in Pending state forever.
Could you give me a bit more detail on how you proceeded with the installation?
Thanks, Marius
@mariusehr1 The scheduler extender worked for me. Did you prep your nodes correctly by labeling them with gpushare=true?
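For anyone who missed that step, the label can be applied with something like the following (the node name is a placeholder):

kubectl label node <your-gpu-node> gpushare=true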
What is the name of the scheduler created, @animesh-agarwal? Is there anywhere else besides the pod manifest where I need to mention it? My pods stay in Pending state and do not come up when I set schedulerName: gpushare-schd-extender.
@animesh-agarwal Since Kubernetes v1.24, scheduling policies have been removed and are no longer supported; scheduler configurations should be used instead. Hence the configuration you provided is not working. Can you please help me set this up on Kubernetes v1.23+? I have tried using the new KubeSchedulerConfiguration by editing the ConfigMap. The image has changed as well, and the pods do not come up. Any help would be appreciated.
Hi! I have successfully deployed gpushare-scheduler-extender on Kubernetes v1.23 or above in EKS. I have published the detailed steps here, hoping they will be helpful to you!
kubectl create -f https://gist.githubusercontent.com/YuuinIH/71b025b7e63291e6a7d5f3cc43e76805/raw/a1e530e03cc985891a33e8fc2ed2f26307061b0b/gpushare-schd-extender.yaml
kubectl create -f https://gist.githubusercontent.com/YuuinIH/71b025b7e63291e6a7d5f3cc43e76805/raw/2c5d874b6061e0497274779ab59ac2c240c4817a/gpushare-scheduler.yaml
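Since scheduling Policies are gone in newer releases, the extender now has to be declared directly in a KubeSchedulerConfiguration rather than in a Policy ConfigMap. A minimal sketch of what such a configuration can look like (the API version, scheduler name, and in-cluster service URL below are illustrative assumptions, not copied from the gists above):

apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
profiles:
  - schedulerName: gpushare-scheduler   # the name pods reference via spec.schedulerName
extenders:
  - urlPrefix: "http://gpushare-schd-extender.kube-system:12345/gpushare-scheduler"   # assumed in-cluster service address
    filterVerb: filter
    bindVerb: bind
    enableHTTPS: false
    nodeCacheCapable: true
    managedResources:
      - name: aliyun.com/gpu-mem
        ignoredByScheduler: false
    ignorable: false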
Edit the system:kube-scheduler ClusterRole so it can also use the gpushare-scheduler lease and endpoints for leader election:
kubectl edit clusterrole system:kube-scheduler
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-scheduler
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - create
  - apiGroups:
      - coordination.k8s.io
    resourceNames:
      - kube-scheduler
      - gpushare-scheduler
    resources:
      - leases
    verbs:
      - get
      - update
  - apiGroups:
      - ""
    resourceNames:
      - kube-scheduler
      - gpushare-scheduler
    resources:
      - endpoints
    verbs:
      - delete
      - get
      - patch
      - update
The following steps are the same as in the official guide.
kubectl delete ds -n kube-system nvidia-device-plugin-daemonset
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-share-sample
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: gpu-share-sample
    spec:
      schedulerName: gpushare-scheduler   # important!!!!!
      containers:
        - name: gpu-share-sample
          image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
          command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
          resources:
            limits:
              aliyun.com/gpu-mem: 3
          workingDir: /root
      restartPolicy: Never
Then, run the inspector to show the GPU memory:
❯ kubectl inspect cgpu
NAME IPADDRESS GPU0(Allocated/Total) GPU Memory(GiB)
ip-192-168-80-151.cn-northwest-1.compute.internal 192.168.80.151 0/15 0/15
ip-192-168-87-86.cn-northwest-1.compute.internal 192.168.87.86 3/15 3/15
-----------------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
3/30 (10%)
❯ kubectl logs gpu-share-sample-vrpsj --tail 1
2023-03-23 09:51:02.301985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
Hi,
I tried to set this up in my EKS cluster, but I am observing that the pods are in Pending state and not running as expected.
I described the pods gpushare-installer-5s56q and gpushare-schd-extender-846977f446-s9bxh.
As per the documentation, and going through the files ./templates/schd-config-job.yaml and ./templates/gpushare-extender-deployment.yaml, I need to set the label node-role.kubernetes.io/master: "" as a node selector. Also, this step-by-step guide https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md asks me to update the kube-scheduler configuration.
On EKS, I am not sure where or how I can configure this, or on which nodes I should update this configuration. Guidance will be much appreciated.